-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-46063: [C++][Compute] Fix the issue that MinMax kernel emits -inf/inf for all-NaN input #48459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
5d9050b to
48b787c
Compare
|
Hi @pitrou , would you take a look? Thanks. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?
That was close. The hash_min_max behaves differently. I'm fixing them now. Thanks for pointing this out! |
48b787c to
4894506
Compare
cpp/src/arrow/type_traits.h
Outdated
|
|
||
| /// @} | ||
|
|
||
| /// \addtogroup c-type-concepts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a nice cup of concepts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do that as a separate PR? #48590 is already open for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we revert all the C++20 stuff in this PR and redo it in a separate one? Or, we can define such concepts local in the cpp file, and only move them to public header in a coming PR for #48590 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can define them locally if that makes things easier for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Moved from public header to local cpp.
cpp/src/arrow/type_traits.h
Outdated
| // XXX: To be completed with more concepts as needed. | ||
|
|
||
| template <typename T> | ||
| concept CBooleanConcept = std::is_same_v<T, bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently std::same_as can be used, though I'm not sure whether it differs from std::is_same_v.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. C++20 std::same_as is preferred. Updated.
| template <CFloatingPointConcept CType> | ||
| struct MinMaxOp<CType> { | ||
| static constexpr CType min(CType a, CType b) { return std::fmin(a, b); } | ||
| static constexpr CType max(CType a, CType b) { return std::fmax(a, b); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do std::fmin/fmax actually accept util::Float16??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but we don't either specialize for Float16 so it compiles. Please see my other comment.
| field("argument1", float32()), | ||
| field("argument2", float64()), | ||
| field("key", int64()), | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also want a float16 column? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The min/max kernel for half-float is not implemented so we are not able to test it.
The current half-float is not intact in terms of: 1) whether the type is in floating point category is inconsistent: e.g. type trait is_floating_type<HalfFloatType> is true but g_floating_types doesn't include it (that's why floating point kernels don't register for half-float). 2) Some std functions don't work with our half-float representation: e.g. std::is_nan/fmin/fmax, it won't compile if we try to add certain half-float kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see. Thanks for the clarification.
| this->AssertMinMaxIs("[5, -Inf, 2, 3, 4]", -INFINITY, 5, options); | ||
| this->AssertMinMaxIsNull("[5, null, 2, 3, 4]", options); | ||
| this->AssertMinMaxIsNull("[5, -Inf, null, 3, 4]", options); | ||
| this->AssertMinMaxIsNull("[NaN, null]", options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these tests also executed with Float16 or is there a separate test for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as my other comment.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two small remarks, otherwise LGTM.
| // XXX: Consider making these concepts complete and moving to public header. | ||
|
|
||
| template <typename T> | ||
| concept CBooleanConcept = std::is_same_v<T, bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::same_as?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks! Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks! Updated.
| std::floating_point<T> || std::is_same_v<T, util::Float16>; | ||
|
|
||
| template <typename T> | ||
| concept CDecimalConcept = std::is_same_v<T, Decimal32> || std::is_same_v<T, Decimal64> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, it seems CTypeTraits<Decimal32> et al. aren't defined, perhaps open a separate issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, all decimal types are missing for CTypeTraits. #48740 filed.
|
@github-actions crossbow submit -g cpp -g python |
|
Revision: 461bd5c Submitted crossbow builds: ursacomputing/crossbow @ actions-e84e8aea1a |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit abbcd53. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
Rationale for this change
Our MinMax kernels emit -inf/inf for all-NaN input array, which doesn't make sense.
What changes are included in this PR?
Initialize the running min/max value from -inf/inf to NaN, so we can leverage the nice property that:
std::fmin/fmax(all-NaN) = NaNstd::fmin/fmax(NaN, non-NaN) = non-NaNAre these changes tested?
Test included.
Are there any user-facing changes?
None.