Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45755: [C++][Python][Compute] Add winsorize function #45763

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Mar 13, 2025

Rationale for this change

Add a "winsorize" vector function as described here:
https://en.wikipedia.org/wiki/Winsorizing

and implemented in e.g. Scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html

Also make the "quantile" function supported on decimal32/decimal64.

Are these changes tested?

Yes.

Are there any user-facing changes?

No, only a new compute function.

@pitrou pitrou changed the title GH-45755: [C++][Compute] Add winsorize function GH-45755: [C++][Python][Compute] Add winsorize function Mar 13, 2025
@pitrou pitrou force-pushed the winsorize-function branch 2 times, most recently from 167656f to 32e1126 Compare March 13, 2025 14:27
@pitrou pitrou marked this pull request as ready for review March 13, 2025 15:18
@pitrou pitrou requested a review from zanmato1984 March 13, 2025 15:51
Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 14, 2025
@pitrou
Copy link
Member Author

pitrou commented Mar 14, 2025

Plus, no update to doc? https://github.com/apache/arrow/blob/ab4263476d9d5078cd0fa2cce6b922eb7d90c0af/docs/source/cpp/compute.rst

Oops, I had entirely forgotten.

@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

Ok, I've added the docs now. Do you want to take another look?

@pitrou pitrou requested a review from icexelloss March 17, 2025 13:32
@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

@github-actions crossbow submit -g cpp

This comment was marked as outdated.

@pitrou
Copy link
Member Author

pitrou commented Mar 17, 2025

@github-actions crossbow submit -g cpp

Copy link

Revision: be3f7aa

Submitted crossbow builds: ursacomputing/crossbow @ actions-fe0234df2b

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-meson GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

Copy link
Contributor

@zanmato1984 zanmato1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two nits.

@pitrou pitrou force-pushed the winsorize-function branch from be3f7aa to b203c3d Compare March 20, 2025 10:03
QuantileOptions::NEAREST);
ARROW_ASSIGN_OR_RAISE(
auto quantile,
CallFunction("quantile", {input}, &quantile_options, ctx->exec_context()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou Do you think There is any benefit to resolve and use the quantile kernel here directly as supposed to use CallFunction?

I suppose it is easier this way (using CallFunction), but I wonder, in general, when writing a kernel that uses other kernel/functions, whether it is better to use CallFunction or resolve it kernel and use kernel->Exec

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving the kernel could perhaps save some nanoseconds, but I'm not sure that's significant compared to the other costs.

Copy link
Contributor

@icexelloss icexelloss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a question. Otherwise LGTM.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants