
Conversation

@zanmato1984
Contributor

@zanmato1984 zanmato1984 commented Dec 11, 2025

Rationale for this change

Our MinMax kernels emit -inf/inf for an all-NaN input array, which doesn't make sense.

What changes are included in this PR?

Initialize the running min/max value to NaN instead of -inf/inf, so we can leverage the nice properties that:
std::fmin/std::fmax over all-NaN operands yields NaN
std::fmin/std::fmax(NaN, non-NaN) yields the non-NaN operand

Are these changes tested?

Test included.

Are there any user-facing changes?

None.

@github-actions

⚠️ GitHub issue #46063 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 11, 2025
@zanmato1984
Contributor Author

cc @pitrou @felipecrv

@zanmato1984
Contributor Author

Hi @pitrou , would you take a look? Thanks.

Member

@pitrou pitrou left a comment

LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?

@zanmato1984
Contributor Author

> LGTM on the principle, but can we ensure that the hash-aggregate min-max has the same semantics?

That was a close call. hash_min_max behaves differently; I'm fixing it now. Thanks for pointing this out!


/// @}

/// \addtogroup c-type-concepts
Contributor Author

How about a nice cup of concepts?

Member

Can we do that as a separate PR? #48590 is already open for it.

Contributor Author

Shall we revert all the C++20 stuff in this PR and redo it in a separate one? Or we could define such concepts locally in the .cpp file and only move them to the public header in an upcoming PR for #48590?

Member

You can define them locally if that makes things easier for you?

Contributor Author

Thanks. Moved from public header to local cpp.

// XXX: To be completed with more concepts as needed.

template <typename T>
concept CBooleanConcept = std::is_same_v<T, bool>;
Member

Apparently std::same_as can be used, though I'm not sure whether it differs from std::is_same_v.

Contributor Author

Yes. C++20 std::same_as is preferred. Updated.

template <CFloatingPointConcept CType>
struct MinMaxOp<CType> {
static constexpr CType min(CType a, CType b) { return std::fmin(a, b); }
static constexpr CType max(CType a, CType b) { return std::fmax(a, b); }
Member

Do std::fmin/fmax actually accept util::Float16??

Contributor Author

No, but we don't specialize for Float16 either, so it compiles. Please see my other comment.

field("argument1", float32()),
field("argument2", float64()),
field("key", int64()),
});
Member

I think we also want a float16 column? :)

Contributor Author

The min/max kernel for half-float is not implemented, so we are not able to test it.

The current half-float support is incomplete in two respects: 1) whether the type belongs to the floating-point category is inconsistent: the type trait is_floating_type<HalfFloatType> is true, but g_floating_types doesn't include it (which is why floating-point kernels aren't registered for half-float); 2) some std functions don't work with our half-float representation, e.g. std::isnan/std::fmin/std::fmax, so trying to add certain half-float kernels wouldn't compile.

Member

Hmm, I see. Thanks for the clarification.

this->AssertMinMaxIs("[5, -Inf, 2, 3, 4]", -INFINITY, 5, options);
this->AssertMinMaxIsNull("[5, null, 2, 3, 4]", options);
this->AssertMinMaxIsNull("[5, -Inf, null, 3, 4]", options);
this->AssertMinMaxIsNull("[NaN, null]", options);
Member

Are these tests also executed with Float16 or is there a separate test for it?

Contributor Author

Same as my other comment.

Member

@pitrou pitrou left a comment

Just two small remarks, otherwise LGTM.

// XXX: Consider making these concepts complete and moving to public header.

template <typename T>
concept CBooleanConcept = std::is_same_v<T, bool>;
Member

Use std::same_as?

Contributor Author

Right, thanks! Updated.


std::floating_point<T> || std::is_same_v<T, util::Float16>;

template <typename T>
concept CDecimalConcept = std::is_same_v<T, Decimal32> || std::is_same_v<T, Decimal64> ||
Member

By the way, it seems CTypeTraits<Decimal32> et al. aren't defined, perhaps open a separate issue for that.

Contributor Author

Hmm, all decimal types are missing from CTypeTraits. #48740 filed.

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp -g python

@github-actions

github-actions bot commented Jan 7, 2026

Revision: 461bd5c

Submitted crossbow builds: ursacomputing/crossbow @ actions-e84e8aea1a

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-cpp-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-24.04-cuda-13.0.2 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@zanmato1984 zanmato1984 merged commit abbcd53 into apache:main Jan 7, 2026
47 of 48 checks passed
@zanmato1984 zanmato1984 removed the awaiting committer review Awaiting committer review label Jan 7, 2026
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit abbcd53.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.
