[Test only] Vendor Calcite 1.40#35588

Closed
Abacn wants to merge 19 commits into apache:master from Abacn:vendorcalcite

@Abacn Abacn commented Jul 14, 2025

Test #35483 #26403

Needs #35586, because some dependencies ship multi-release JARs that contain Java 21 class files.

Abacn commented Jul 15, 2025

org.apache.beam.sdk.extensions.sql.BeamSqlDslSqlStdOperatorsTest > testArithmeticOperator failed due to breaking change in apache/calcite#3481

Solved: updated the test.

Abacn commented Jul 15, 2025

apache/calcite@a326bd2#diff-9cda2b29a1b9206e0daa6e6d722eb476575f83eadea1a6e302225afd08d9f0d2 made datetime a reserved keyword, which broke several Nexmark tests.

Solved: updated the tests.
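
For illustration only, one common way to keep using a column whose name has become a reserved word is to quote it. The sketch below assumes backtick quoting and a tiny hand-written keyword set; the class name, method name, keyword list, and the sample column and table names are all hypothetical, and this is not necessarily how the tests here were updated.

```java
import java.util.Locale;
import java.util.Set;

/** Sketch: quote identifiers that collide with reserved SQL keywords. */
final class IdentifierQuoting {
  // Hypothetical, deliberately tiny keyword set; a real parser reserves many more words.
  private static final Set<String> RESERVED = Set.of("DATETIME", "SELECT", "FROM", "WHERE");

  private IdentifierQuoting() {}

  /** Returns the identifier wrapped in backticks if it matches a reserved word, else unchanged. */
  static String quoteIfReserved(String identifier) {
    if (RESERVED.contains(identifier.toUpperCase(Locale.ROOT))) {
      // Double any embedded backtick so the quoted identifier stays well-formed.
      return "`" + identifier.replace("`", "``") + "`";
    }
    return identifier;
  }

  public static void main(String[] args) {
    // Column and table names are made up for the example.
    String query =
        "SELECT " + quoteIfReserved("dateTime") + ", " + quoteIfReserved("price") + " FROM bid";
    System.out.println(query);
  }
}
```

The comparison is case-insensitive because keyword matching in SQL parsers usually is; whether quoting also changes identifier case handling depends on the parser configuration.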

Abacn commented Jul 15, 2025

BeamSqlDslExistsTest failed because the query is now parsed differently.

Previously, NOT EXISTS was parsed to

LogicalProject(c_custkey=[$0], c_acctbal=[$1], c_city=[$2])
  LogicalFilter(condition=[IS NULL($4)])
    LogicalJoin(condition=[=($0, $3)], joinType=[left])
      BeamIOSourceRel(table=[[beam, CUSTOMER]])
      LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        LogicalProject(o_custkey=[$1], $f0=[true])
          BeamIOSourceRel(table=[[beam, ORDERS]])

which uses a logical join. Now it becomes

LogicalProject(c_custkey=[$0], c_acctbal=[$1], c_city=[$2])
  LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($1, $cor0.c_custkey)])
  BeamIOSourceRel(table=[[beam, ORDERS]])
}))], variablesSet=[[$cor0]])
    BeamIOSourceRel(table=[[beam, CUSTOMER]])

LogicalFilter(convention: None -> BEAM_LOGICAL) is not implemented by Beam SQL.

Abacn commented Jul 17, 2025

testProjectArrayFieldWithCoGBKJoin failed for a similar reason.

I did a bisect (using un-vendored Calcite: Abacn@d2aff89). Previously (Calcite 1.38 and below), the parsed query was:

LogicalProject(f_stringArr=[$2])
  LogicalJoin(condition=[=($1, $3)], joinType=[inner])
    BeamIOSourceRel(table=[[beam, PCOLLECTION]])
    LogicalAggregate(group=[{0}])
      LogicalValues(tuples=[[{ 'A' }, { 'B' }, ...]])

and the Beam planner successfully converted it to

BeamCalcRel(expr#0..3=[{inputs}], f_stringArr=[$t2])
  BeamCoGBKJoinRel(condition=[=($1, $3)], joinType=[inner])
    BeamIOSourceRel(table=[[beam, PCOLLECTION]])
    BeamValuesRel(tuples=[[{ 'A' }, { 'B' },...]])

In Calcite 1.39+, the parsed query becomes

LogicalProject(f_stringArr=[$2])
  LogicalFilter(condition=[IN($1, {
LogicalValues(tuples=[[{ 'A' }, { 'B' }, ...]])
})])
    BeamIOSourceRel(table=[[beam, PCOLLECTION]])

and it cannot be converted for the same reason: LogicalFilter(convention: None -> BEAM_LOGICAL) is not implemented.

Abacn commented Jul 17, 2025

Opened https://issues.apache.org/jira/browse/CALCITE-7101.

To reproduce, check out Abacn@bdd6440 and run

./gradlew :sdks:java:extensions:sql:test -PdisableSpotlessCheck=true -PdisableCheckerStyle --tests *BeamSqlDslExistsTest.testExistsSubquery

Abacn commented Jul 22, 2025

I'm able to get all tests (other than ZetaSQL) passing, except that the Iceberg SQL tests fail with

IcebergReadWriteIT.testSqlWriteAndRead

java.lang.NullPointerException: component type for root
	at java.base/java.util.Objects.requireNonNull(Objects.java:349)
	at org.apache.beam.vendor.calcite.v1_40_0.org.apache.calcite.linq4j.tree.IndexExpression.<init>(IndexExpression.java:37)
	at org.apache.beam.vendor.calcite.v1_40_0.org.apache.calcite.linq4j.tree.Expressions.arrayIndex(Expressions.java:237)
	at org.apache.beam.vendor.calcite.v1_40_0.org.apache.calcite.adapter.enumerable.RexToLixTranslator.getConvertExpression(RexToLixTranslator.java:344)
...
	at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$Transform.expand(BeamCalcRel.java:203)
	at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$Transform.expand(BeamCalcRel.java:154)
	at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:559)
	at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:507)
	at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:107)
	at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:81)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)

These tests were added in #34799. @ahmedabu98 @talatuyarer would you mind taking a look? They can be reproduced locally with ./gradlew :sdks:java:extensions:sql:integrationTest --tests *IcebergReadWriteIT.testSqlWriteAndRead

ahmedabu98 commented Jul 22, 2025

Looks like Calcite 1.40 is much stricter about nested row types. See the needed changes in 069f6b2:

Given a type:

c_arr_struct ARRAY<ROW<c_arr_struct_arr ARRAY<VARCHAR>, c_arr_struct_integer INTEGER>>

this insert no longer works:

ROW(ARRAY['abc', 'xyz'], 123)

it has to be cast like this:

CAST(ROW(ARRAY['abc', 'xyz'], 123) AS ROW(c_arr_struct_arr VARCHAR ARRAY, c_arr_struct_integer INTEGER))

Abacn commented Jul 22, 2025

Thanks @ahmedabu98 !

I was able to bisect the Calcite version: 1.33 passes and 1.34 fails all 3 integration tests. I used https://github.com/Abacn/beam/tree/unvendor-calcite-test for testing; it contains commits that use different versions of Apache Calcite.
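
The bisect itself is a binary search over an ordered version list. The sketch below is illustrative only: the class and method names are hypothetical, and the predicate is a hard-coded stand-in that mirrors the result reported in this comment rather than actually running the integration tests.

```java
import java.util.List;
import java.util.function.Predicate;

/** Sketch: find the first failing version by bisection. */
final class VersionBisect {
  private VersionBisect() {}

  /**
   * Returns the index of the first failing version. Assumes versions are ordered oldest to newest,
   * the first one passes, the last one fails, and a failure never heals in a later version.
   */
  static int firstFailing(List<String> versions, Predicate<String> passes) {
    int lo = 0;                   // known to pass
    int hi = versions.size() - 1; // known to fail
    while (hi - lo > 1) {
      int mid = lo + (hi - lo) / 2;
      if (passes.test(versions.get(mid))) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return hi;
  }

  public static void main(String[] args) {
    List<String> versions =
        List.of("1.33", "1.34", "1.35", "1.36", "1.37", "1.38", "1.39", "1.40");
    // Stand-in for "check out this version and run the tests": passes up to and including 1.33.
    Predicate<String> passes = v -> Integer.parseInt(v.substring(v.indexOf('.') + 1)) <= 33;
    System.out.println("First failing version: " + versions.get(firstFailing(versions, passes)));
  }
}
```

In practice the predicate would be replaced by a real build-and-test step per version, which is the expensive part that bisection keeps to a logarithmic number of runs.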

Abacn commented Jul 22, 2025

testSQLReadWithProjectAndFilterPushDown (org.apache.beam.sdk.extensions.sql.meta.provider.iceberg.IcebergReadWriteIT) failed

org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.iceberg.exceptions.ValidationException: Cannot find field 'C_BOOLEAN' in struct: struct<1: c_integer: optional int, 2: c_float: optional float, 3: c_boolean: optional boolean, 4: c_timestamp: optional timestamptz, 5: c_varchar: optional string>

Looks like a case-sensitivity issue?
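
To illustrate the suspected mismatch: an exact-match lookup of the upper-cased name against the lower-case field names in the struct finds nothing, while a case-insensitive lookup resolves it. This is a minimal sketch with hypothetical class and method names; it does not reflect how Beam or Iceberg actually resolve fields.

```java
import java.util.List;
import java.util.Optional;

/** Sketch: case-sensitive versus case-insensitive field lookup. */
final class FieldNameLookup {
  private FieldNameLookup() {}

  /** Exact, case-sensitive match. */
  static Optional<String> findExact(List<String> fields, String name) {
    return fields.stream().filter(f -> f.equals(name)).findFirst();
  }

  /** Case-insensitive match; returns the field name as declared in the schema. */
  static Optional<String> findIgnoreCase(List<String> fields, String name) {
    return fields.stream().filter(f -> f.equalsIgnoreCase(name)).findFirst();
  }

  public static void main(String[] args) {
    // Field names taken from the struct in the error message above.
    List<String> fields =
        List.of("c_integer", "c_float", "c_boolean", "c_timestamp", "c_varchar");
    System.out.println(findExact(fields, "C_BOOLEAN"));      // prints Optional.empty
    System.out.println(findIgnoreCase(fields, "C_BOOLEAN")); // prints Optional[c_boolean]
  }
}
```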

@ahmedabu98

that should fix it

This was referenced Jul 23, 2025
@Abacn Abacn closed this Jul 29, 2025
@Abacn Abacn deleted the vendorcalcite branch July 29, 2025 18:29