@rluvaton
Member

Which issue does this PR close?

N/A

Rationale for this change

Make range and generate_series faster.

What changes are included in this PR?

Add a hand-written, specialized implementation of range and generate_series for Int64.

Are these changes tested?

Existing tests + added more tests for edge cases

Are there any user-facing changes?

Yes. The public trait SeriesValue gains a function called generate_array_for_series. It has a default implementation, so existing implementors are not broken.
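
To illustrate why a default implementation keeps this kind of change non-breaking, here is a minimal sketch. It is not the PR's code: the trait below is a simplified stand-in, and `advance`, `generate_batch`, `Day`, and all signatures are hypothetical names chosen for the example. An existing implementor (`Day`) compiles unchanged and inherits the generic loop, while `i64` overrides the new method with a specialized path.

```rust
/// Simplified, hypothetical stand-in for a series value trait.
trait SeriesValue: Copy + PartialOrd {
    /// Required method: the next value, or `None` on overflow.
    fn advance(self, step: i64) -> Option<Self>;

    /// Method added later. Because it has a default body, implementors
    /// written before it existed keep compiling without changes.
    /// This sketch only handles an ascending series with an inclusive end.
    fn generate_batch(start: Self, end: Self, step: i64, batch_size: usize) -> Vec<Self> {
        let mut out = Vec::new();
        let mut cur = Some(start);
        while let Some(v) = cur {
            if v > end || out.len() == batch_size {
                break;
            }
            out.push(v);
            cur = v.advance(step);
        }
        out
    }
}

/// An "existing" implementor that does not know about `generate_batch`.
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
struct Day(i32);

impl SeriesValue for Day {
    fn advance(self, step: i64) -> Option<Self> {
        let step = i32::try_from(step).ok()?;
        self.0.checked_add(step).map(Day)
    }
}

impl SeriesValue for i64 {
    fn advance(self, step: i64) -> Option<Self> {
        self.checked_add(step)
    }

    /// Specialized override: build the batch from a range iterator
    /// instead of advancing one value at a time.
    fn generate_batch(start: i64, end: i64, step: i64, batch_size: usize) -> Vec<i64> {
        if step <= 0 || start > end {
            return Vec::new();
        }
        (start..=end).step_by(step as usize).take(batch_size).collect()
    }
}
```

Calling `<Day as SeriesValue>::generate_batch(Day(1), Day(7), 3, 10)` goes through the default body, while the same call on `i64` takes the override.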

@github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Dec 20, 2025
@rluvaton
Member Author

run benchmark range_and_generate_series

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (67b2d04) to 4249e4e diff
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete

@rluvaton added the performance (Make DataFusion faster) label on Dec 20, 2025
@rluvaton
Member Author

@alamb I might have increased the benchmark size too much. Is it stuck?

@rluvaton
Member Author

show benchmark queue

}
} else {
// step < 0
let cur = series_state.current as i128;
Contributor

Can the same effect be achieved using a reverse iterator like:

(series_state.end..=series_state.current).rev()

Depending on whether end is exclusive or not, you may have to offset it by one.

Contributor

I gave it a go locally. This seems to do the trick:

if series_state.include_end {
    Int64Array::from_iter_values(
        (series_state.end..=series_state.current)
            .rev()
            .step_by(-series_state.step as usize)
            .take(series_state.batch_size),
    )
} else {
    Int64Array::from_iter_values(
        ((series_state.end + 1)..=series_state.current)
            .rev()
            .step_by(-series_state.step as usize)
            .take(series_state.batch_size),
    )
}
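
The suggestion above can be tried without Arrow. The following is a minimal sketch under stated assumptions, not the PR's code: it collects into a plain `Vec<i64>` instead of an `Int64Array`, and the function name `descending_batch` and its parameters are hypothetical stand-ins for the `series_state` fields. It also guards two edge cases that the snippet above does not cover: `end + 1` overflowing when `end` is `i64::MAX`, and negating a step of `i64::MIN`.

```rust
/// One batch of a descending series that starts at `current` and moves
/// towards `end` by `step` (which must be negative).
fn descending_batch(
    current: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> Vec<i64> {
    assert!(step < 0, "this sketch only covers negative steps");

    // Smallest value that may appear in the output.
    let lower = if include_end {
        end
    } else {
        match end.checked_add(1) {
            Some(v) => v,
            // `end` is i64::MAX and excluded, so nothing can be produced.
            None => return Vec::new(),
        }
    };

    // `unsigned_abs` is well defined even for i64::MIN, unlike `-step`.
    let stride = usize::try_from(step.unsigned_abs()).unwrap_or(usize::MAX);

    // Reversing the inclusive range makes iteration start at `current`.
    (lower..=current)
        .rev()
        .step_by(stride)
        .take(batch_size)
        .collect()
}
```

Because the range is reversed before `step_by`, the first element is always `current`, and the range is simply empty when `current` is already below the lower bound.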

Contributor

@pepijnve pepijnve left a comment

Added review comments

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (67b2d04) to 4249e4e diff
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete

@alamb
Contributor

alamb commented Dec 21, 2025

@alamb I might have increased the benchmark size too much. Is it stuck?

I don't know what happened -- the runner had died. I restarted t

@alamb-ghbot

🤖: Benchmark completed

Details

group                              improve-range-and-generate-series-for-int-64    main
-----                              --------------------------------------------    ----
generate_series(0, 1000000, 5)     1.86    838.9±8.69µs        ? ?/sec             1.00    449.9±6.31µs        ? ?/sec
generate_series(1000000)           2.25      3.5±0.05ms        ? ?/sec             1.00  1552.1±21.57µs        ? ?/sec
generate_series(1000000, 0, -5)    1.00    326.4±6.72µs        ? ?/sec             1.42    462.1±6.47µs        ? ?/sec
range(0, 1000000, 5)               1.00   393.8±12.61µs        ? ?/sec             1.15   453.1±10.31µs        ? ?/sec
range(1000000)                     1.00  1237.9±19.84µs        ? ?/sec             1.26  1559.6±24.79µs        ? ?/sec
range(1000000, 0, -5)              1.00    319.1±5.62µs        ? ?/sec             1.44   460.7±15.48µs        ? ?/sec

@alamb
Contributor

alamb commented Dec 29, 2025

These cases seem to have slowed down:

generate_series(0, 1000000, 5)     1.86    838.9±8.69µs        ? ?/sec             1.00    449.9±6.31µs        ? ?/sec
generate_series(1000000)           2.25      3.5±0.05ms        ? ?/sec             1.00  1552.1±21.57µs        ? ?/sec

Is that expected?
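
For readers unfamiliar with the table layout, the leading number in each column appears to be that side's time divided by the faster side's time, so 1.00 marks the winner. This reading is an assumption inferred from the figures themselves, and the helper name `relative_to_fastest` is hypothetical. A minimal sketch of the arithmetic:

```rust
/// Returns (branch_ratio, main_ratio): each timing divided by the
/// faster of the two, so the faster side is exactly 1.0.
/// Both timings must use the same unit (microseconds here).
fn relative_to_fastest(branch_us: f64, main_us: f64) -> (f64, f64) {
    let fastest = branch_us.min(main_us);
    (branch_us / fastest, main_us / fastest)
}
```

With the reported means, `relative_to_fastest(838.9, 449.9)` gives roughly 1.86 for the branch on `generate_series(0, 1000000, 5)`, and `relative_to_fastest(326.4, 462.1)` gives roughly 1.42 for main on `generate_series(1000000, 0, -5)`, matching the table.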

@alamb
Contributor

alamb commented Dec 29, 2025

run benchmark range_and_generate_series


# generate_series equal batch size * 2 with starting value and step (including end)
query I
SELECT * FROM generate_series(1, 39, 2)
Contributor

Should we also add a test for a starting value other than 1? Perhaps 100 and -100?
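
As a rough aid for writing such tests, the expected rows can be computed with a small reference helper. This is a sketch only: `inclusive_series` is a hypothetical name, and the end values 138 and -62 are assumptions chosen so that the suggested starts of 100 and -100 yield the same 20 rows as `generate_series(1, 39, 2)`.

```rust
/// Reference implementation of an ascending series with an inclusive end.
fn inclusive_series(start: i64, end: i64, step: i64) -> Vec<i64> {
    assert!(step > 0, "this sketch only covers positive steps");
    (start..=end).step_by(step as usize).collect()
}

/// Row counts and last values for the existing case and the two
/// suggested starting values.
fn suggested_cases() -> Vec<(i64, usize, i64)> {
    [(1, 39), (100, 138), (-100, -62)]
        .iter()
        .map(|&(start, end)| {
            let rows = inclusive_series(start, end, 2);
            (start, rows.len(), *rows.last().unwrap())
        })
        .collect()
}
```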

@alamb-ghbot

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (df3c1b9) to 83ed192 diff
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                              improve-range-and-generate-series-for-int-64    main
-----                              --------------------------------------------    ----
generate_series(0, 1000000, 5)     1.81   836.7±12.13µs        ? ?/sec             1.00    462.2±6.92µs        ? ?/sec
generate_series(1000000)           2.19      3.4±0.02ms        ? ?/sec             1.00  1552.7±14.22µs        ? ?/sec
generate_series(1000000, 0, -5)    1.00    327.6±7.64µs        ? ?/sec             1.39    454.4±6.16µs        ? ?/sec
range(0, 1000000, 5)               1.00    388.5±3.23µs        ? ?/sec             1.16    450.3±4.20µs        ? ?/sec
range(1000000)                     1.00   1214.4±7.87µs        ? ?/sec             1.28  1556.8±16.35µs        ? ?/sec
range(1000000, 0, -5)              1.00    323.2±3.67µs        ? ?/sec             1.40    452.1±8.87µs        ? ?/sec

@rluvaton
Member Author

Closing, since the existing implementation is fast enough and this change has some regressions.

@rluvaton closed this on Dec 29, 2025
@rluvaton deleted the improve-range-and-generate-series-for-int-64 branch on December 29, 2025 15:25
Labels

functions (Changes to functions implementation), performance (Make DataFusion faster), sqllogictest (SQL Logic Tests (.slt))

4 participants