feat: optional disable column stats by parisni · Pull Request #811 · apache/incubator-xtable

parisni · 2026-02-26T13:53:42Z

What is the purpose of the pull request

This PR makes sync faster and reduces memory footprint by adding a shared source-side option to skip column stats extraction:

xtable.source.skip_column_stats=true

The flag now works consistently for Hudi, Delta, and Iceberg sources.
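
As a minimal sketch of how a source might read the shared key, assuming the option arrives in a generic `java.util.Properties` bag and defaults to `false` (stats extracted as before). The helper class `SkipColumnStatsFlag` is hypothetical and is not taken from the PR's code:

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not XTable's actual API.
final class SkipColumnStatsFlag {
    // The shared key introduced by this PR.
    static final String KEY = "xtable.source.skip_column_stats";

    private SkipColumnStatsFlag() {}

    // Assumption: a missing key means column stats are still extracted.
    static boolean isEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }
}
```

Because the key is shared, the same lookup could serve the Hudi, Delta, and Iceberg sources without per-format variants.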

This is intended for large tables where stats extraction is the bottleneck.
Example from a large sync: a job that previously took around 6 hours and required more than 64 GB of heap (Xmx) can be reduced to around 1 hour with about 10 GB.

Brief change log

Added a shared source config (xtable.source.skip_column_stats) instead of per-format keys.
Wired skip-column-stats behavior into all three source implementations:
- Hudi source
- Delta source
- Iceberg source
Kept the mandatory row count computed even when column stats are skipped, so downstream sync logic remains correct.
Added handling for zero-row Parquet files in the Delta and Iceberg sources to avoid incorrect stats behavior.
Renamed “skip stats” to “skip column stats” for naming consistency.
Added integration coverage for source format × sync mode × skip flag combinations.
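
The behavior described in the change log can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `FileStats` and `StatsExtractor` are hypothetical types, column stats are reduced to simple strings, and treating a zero-row file as having no column stats is an assumption about what the zero-row handling does.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for per-file metadata; not an XTable class.
final class FileStats {
    final long recordCount;
    final Map<String, String> columnStats; // column name -> illustrative summary

    FileStats(long recordCount, Map<String, String> columnStats) {
        this.recordCount = recordCount;
        this.columnStats = columnStats;
    }
}

// Hypothetical extractor showing the skip path.
final class StatsExtractor {
    private StatsExtractor() {}

    static FileStats extract(
            long rowCount, Map<String, String> rawColumnStats, boolean skipColumnStats) {
        // The row count is always kept, since downstream sync logic needs it.
        // Assumption: a zero-row file has no meaningful min/max values,
        // so it is treated like the skip path.
        if (skipColumnStats || rowCount == 0) {
            return new FileStats(rowCount, Collections.emptyMap());
        }
        return new FileStats(rowCount, rawColumnStats);
    }
}
```

The point of the sketch is that only the per-column part is dropped; the record count survives on every path.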

Verify this pull request

This change added tests and can be verified as follows:

Added a parameterized integration test in ITConversionController covering:
- source format: Hudi / Delta / Iceberg
- sync mode: Incremental / Full
- xtable.source.skip_column_stats: true / false
Verified with:
- mvn -pl xtable-core -Dtest=ITConversionController#testSkipColumnStatsAcrossSources test
Additional compile validation:
- mvn -pl xtable-core -DskipTests compile
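
The test matrix above is the cartesian product of three dimensions, which gives 3 × 2 × 2 = 12 cases. A rough sketch of enumerating it in plain Java follows; the class `SkipColumnStatsMatrix` and the string labels are hypothetical and do not reflect how ITConversionController actually builds its arguments:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the test matrix; for illustration only.
final class SkipColumnStatsMatrix {
    private SkipColumnStatsMatrix() {}

    static List<String[]> combinations() {
        String[] formats = {"HUDI", "DELTA", "ICEBERG"};
        String[] syncModes = {"INCREMENTAL", "FULL"};
        boolean[] skipFlags = {true, false};

        List<String[]> cases = new ArrayList<>();
        for (String format : formats) {
            for (String mode : syncModes) {
                for (boolean skip : skipFlags) {
                    // One case per (source format, sync mode, skip flag) triple.
                    cases.add(new String[] {format, mode, String.valueOf(skip)});
                }
            }
        }
        return cases;
    }
}
```

In a JUnit 5 suite, a method of this shape could back a `@ParameterizedTest` through `@MethodSource`, though whether the PR's test is structured that way is not stated here.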

Trade-offs

When xtable.source.skip_column_stats=true, column stats are not extracted or propagated.
This reduces sync cost significantly, but column stats-dependent optimizations may be unavailable.
As a result, query performance may be reduced for some workloads.

parisni added 10 commits February 26, 2026 11:49

- 6cda60e feat: disable stats
- 3873c61 disable row count
- a43bf31 compute mandatory count
- 33d73ad fmt
- 371bba2 delta support 0 row parquet
- 1451769 iceberg support 0 row parquet
- 8852b84 rename skip stats to skip column stats
- e0d001d delta to skip column stats
- 9a18b79 implem for iceberg
- e9f994e add tests

parisni wants to merge 10 commits into apache:main from leboncoin:pr-feat-skip-stats
