GH-45755: [C++][Python][Compute] Add winsorize function #45763
Some first glance questions.
Plus, no update to doc? https://github.com/apache/arrow/blob/ab4263476d9d5078cd0fa2cce6b922eb7d90c0af/docs/source/cpp/compute.rst
Oops, I had entirely forgotten.
Ok, I've added the docs now. Do you want to take another look?
@github-actions crossbow submit -g cpp
@github-actions crossbow submit -g cpp
Revision: be3f7aa
Submitted crossbow builds: ursacomputing/crossbow @ actions-fe0234df2b
LGTM with two nits.
                                  QuantileOptions::NEAREST);
  ARROW_ASSIGN_OR_RAISE(
      auto quantile,
      CallFunction("quantile", {input}, &quantile_options, ctx->exec_context()));
@pitrou Do you think there is any benefit to resolving the quantile kernel and using it directly here, as opposed to going through CallFunction?
I suppose it is easier this way (using CallFunction), but I wonder, in general, when writing a kernel that uses other kernels/functions, whether it is better to use CallFunction or to resolve the kernel and call kernel->Exec.
Resolving the kernel could perhaps save some nanoseconds, but I'm not sure that's significant compared to the other costs.
Left a question. Otherwise LGTM.
Rationale for this change
Add a "winsorize" vector function as described here:
https://en.wikipedia.org/wiki/Winsorizing
and implemented in e.g. Scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mstats.winsorize.html
Also make the "quantile" function supported on decimal32/decimal64.
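For readers unfamiliar with the operation, here is a minimal pure-Python sketch of winsorizing: values below a lower quantile or above an upper quantile are clamped to those quantile values. It is only an illustration, not the kernel added in this PR. The names `nearest_quantile`, `winsorize`, `lower_limit` and `upper_limit` are assumptions made for the example, and the rounding rule used for the "nearest" quantile is a simple choice that may not match what `QuantileOptions::NEAREST` does in every edge case. Null handling is left out entirely.

```python
import math


def nearest_quantile(sorted_values, q):
    """Pick the element of an already sorted list closest to quantile q.

    Hypothetical helper: maps q in [0, 1] onto the index range 0..n-1
    and rounds half up to the nearest index.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    index = int(math.floor(q * (len(sorted_values) - 1) + 0.5))
    return sorted_values[index]


def winsorize(values, lower_limit, upper_limit):
    """Clamp values to the [lower_limit, upper_limit] quantile range.

    lower_limit and upper_limit are quantiles in [0, 1]; these parameter
    names are assumptions for this sketch.
    """
    if not 0.0 <= lower_limit <= upper_limit <= 1.0:
        raise ValueError("limits must satisfy 0 <= lower <= upper <= 1")
    if not values:
        return []
    ordered = sorted(values)
    low = nearest_quantile(ordered, lower_limit)
    high = nearest_quantile(ordered, upper_limit)
    # Keep the original order; only the extreme values are replaced.
    return [min(max(v, low), high) for v in values]
```

With ten values and limits of 0.1 and 0.9, only the smallest and largest values are replaced; everything in between is returned unchanged and in its original position.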
Are these changes tested?
Yes.
Are there any user-facing changes?
No, only a new compute function.