
Conversation

@zanmato1984
Contributor

@zanmato1984 zanmato1984 commented Dec 11, 2025

Rationale for this change

Our MinMax kernels emit -inf/inf for an all-NaN input array, which doesn't make sense.

What changes are included in this PR?

Initialize the running min/max value to NaN instead of -inf/inf, so we can leverage the nice properties that:
std::fmin/std::fmax over all-NaN operands yields NaN
std::fmin/std::fmax(NaN, non-NaN) yields the non-NaN operand

Are these changes tested?

Test included.

Are there any user-facing changes?

None.

@github-actions

⚠️ GitHub issue #46063 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 11, 2025
@zanmato1984
Contributor Author

cc @pitrou @felipecrv

@zanmato1984
Contributor Author

Hi @pitrou , would you take a look? Thanks.

Member

@pitrou pitrou left a comment

LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?

@zanmato1984
Contributor Author

> LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?

That was a close call. hash_min_max behaves differently; I'm fixing it now. Thanks for pointing this out!


/// @}

/// \addtogroup c-type-concepts
Contributor Author

How about a nice cup of concepts?

Member

Can we do that as a separate PR? #48590 is already open for it.

Contributor Author

Shall we revert all the C++20 stuff in this PR and redo it in a separate one? Or we could define such concepts locally in the .cpp file and only move them to the public header in an upcoming PR for #48590?

Member

You can define them locally if that makes things easier for you?

Contributor Author

Thanks. Moved from public header to local cpp.

// XXX: To be completed with more concepts as needed.

template <typename T>
concept CBooleanConcept = std::is_same_v<T, bool>;
Member

Apparently std::same_as can be used, though I'm not sure whether it differs from std::is_same_v.

Contributor Author

Yes. C++20 std::same_as is preferred. Updated.

template <CFloatingPointConcept CType>
struct MinMaxOp<CType> {
static constexpr CType min(CType a, CType b) { return std::fmin(a, b); }
static constexpr CType max(CType a, CType b) { return std::fmax(a, b); }
Member

Do std::fmin/fmax actually accept util::Float16??

Contributor Author

No, but we don't specialize for Float16 either, so it compiles. Please see my other comment.

field("argument1", float32()),
field("argument2", float64()),
field("key", int64()),
});
Member

I think we also want a float16 column? :)

Contributor Author

The min/max kernel for half-float is not implemented, so we are not able to test it.

The current half-float support is incomplete in two respects: 1) whether the type belongs to the floating-point category is inconsistent: the type trait is_floating_type<HalfFloatType> is true, but g_floating_types doesn't include it (which is why floating-point kernels aren't registered for half-float); 2) some std functions don't work with our half-float representation, e.g. std::isnan/std::fmin/std::fmax, so trying to add certain half-float kernels wouldn't compile.

Member

Hmm, I see. Thanks for the clarification.

this->AssertMinMaxIs("[5, -Inf, 2, 3, 4]", -INFINITY, 5, options);
this->AssertMinMaxIsNull("[5, null, 2, 3, 4]", options);
this->AssertMinMaxIsNull("[5, -Inf, null, 3, 4]", options);
this->AssertMinMaxIsNull("[NaN, null]", options);
Member

Are these tests also executed with Float16 or is there a separate test for it?

Contributor Author

Same as my other comment.

Member

@pitrou pitrou left a comment

Just two small remarks, otherwise LGTM.

// XXX: Consider making these concepts complete and moving to public header.

template <typename T>
concept CBooleanConcept = std::is_same_v<T, bool>;
Member

Use std::same_as?

Contributor Author

Right, thanks! Updated.


std::floating_point<T> || std::is_same_v<T, util::Float16>;

template <typename T>
concept CDecimalConcept = std::is_same_v<T, Decimal32> || std::is_same_v<T, Decimal64> ||
Member

By the way, it seems CTypeTraits<Decimal32> et al. aren't defined, perhaps open a separate issue for that.

Contributor Author

Hmm, all decimal types are missing from CTypeTraits. #48740 filed.

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp -g python

@github-actions

github-actions bot commented Jan 7, 2026

Revision: 461bd5c

Submitted crossbow builds: ursacomputing/crossbow @ actions-e84e8aea1a

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-cpp-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@zanmato1984 zanmato1984 merged commit abbcd53 into apache:main Jan 7, 2026
47 of 48 checks passed
@zanmato1984 zanmato1984 removed the awaiting committer review Awaiting committer review label Jan 7, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit abbcd53.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.
