Skip to content

build: update substrait to v0.83.0#728

Merged
vbarua merged 14 commits intomainfrom
vbarua/antlr-updates
Mar 6, 2026
Merged

build: update substrait to v0.83.0#728
vbarua merged 14 commits intomainfrom
vbarua/antlr-updates

Conversation

@vbarua
Copy link
Member

@vbarua vbarua commented Mar 6, 2026

chore: update ANTLR parser
feat(POJO): add support for PrecisionTime

vbarua added 13 commits March 5, 2026 15:36
isnull is now a field on anyType itself (not the parent typeDef rule),
so access it directly. AnyVar tokens (ANY0-ANY9) are now distinguished
from Any and return their lowercase text as the parameter name (e.g.
"any3"), consistent with StringLiteral.isWildcard() which checks
startsWith("any").
New grammar entry point rule typeStatement: typeDef EOF is required
by the generated SubstraitTypeVisitor interface. Delegates to typeDef.
UserDefinedContext moved from scalarType to parameterizedType, so the
withNull(ScalarTypeContext) overload no longer applies. Switch to
withNull(boolean) using ctx.isnull directly.

Also wire up the new optional type parameters: each expr is visited and
converted to the appropriate Type.Parameter subtype (ParameterDataType
for Type instances, ParameterIntegerValue for integer literals,
ParameterStringValue for string literals). Unsupported expression types
throw UnsupportedOperationException.
New grammar alternative #precisionTime requires POJO support:
- TypeExpression.PrecisionTime class (mirrors IntervalDay pattern)
- visit(TypeExpression.PrecisionTime) in TypeExpressionVisitor interface
  with default throw in TypeExpressionThrowsVisitor
- precisionTimeE(String) in ParameterizedTypeCreator
- precisionTimeE(TypeExpression) in TypeExpressionCreator
- visitPrecisionTime in ParseToPojo following the visitPrecisionIntervalDay
  pattern (integer, string, and expression precision dispatch)

Also add stub implementations of visitFunc, visitSingleFuncParam, and
visitFuncParamsWithParens (new interface requirements) so the code
compiles pending full implementation.
@vbarua vbarua changed the title build: update substrait to v0.82.0 build: update substrait to v0.83.0 Mar 6, 2026
@vbarua vbarua force-pushed the vbarua/antlr-updates branch from 73ec51c to 917ef85 Compare March 6, 2026 03:54
Copy link
Member Author

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this PR is mechanical-ish ANTLR updates. I've gone ahead and annotated the PR with some comments to explain the changes.

The big issue is that we were using a very outdated version of the SubstraitType.g4. In this PR, I've copied in the grammars from v0.83.0. It would be better to use them directly from the submodule #729, or even more ideally we should generate the code as a standalone artifact in substrait-packaging.

I used claude to assist with this PR by having it analyze the original grammar, identity the differences, and then fix the deltas commit by commit. I also did some of it by hand at the start to make sure I understood what was going on.

: '0'..'9'
;

start: expr EOF;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start is now startRule

| AnyVar isnull=QMark?
;

type
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is now typeDef

anyType
: Any isnull=QMark?
| AnyVar isnull=QMark?
;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now distinguish between any and any0, any1, etc.

| VarChar isnull='?'? Lt len=numericParameter Gt #varChar
| FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary
| Decimal isnull='?'? Lt precision=numericParameter Comma scale=numericParameter Gt #decimal
| IntervalDay isnull='?'? Lt precision=numericParameter Gt #intervalDay
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#intervalDay is now #precisionIntervalDay

| FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary
| Decimal isnull='?'? Lt precision=numericParameter Comma scale=numericParameter Gt #decimal
| IntervalDay isnull='?'? Lt precision=numericParameter Gt #intervalDay
| IntervalCompound isnull='?'? Lt precision=numericParameter Gt #intervalCompound
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#intervalCompound is now #precisionIntervalCompound

parameterizedType
: FixedChar isnull='?'? Lt len=numericParameter Gt #fixedChar
| VarChar isnull='?'? Lt len=numericParameter Gt #varChar
| FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len params are now length

| Time #time
| IntervalYear #intervalYear
| UUID #uuid
| UserDefined Identifier #userDefined
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UserDefined is now a parameterizeType that can take type parameters

public TypeExpression visitFuncParamsWithParens(
final SubstraitTypeParser.FuncParamsWithParensContext ctx) {
throw new UnsupportedOperationException();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not supporting Func types for now.

SupportedType("INTERVAL_DAY", TypeMetadata("DayTimeIntervalType", true)),
SupportedType("PRECISION_TIMESTAMP", TypeMetadata("TimestampNTZType", true), max_precision = Some(9)),
SupportedType("PRECISION_TIMESTAMP_TZ", TypeMetadata("TimestampType", true), max_precision = Some(9)),
SupportedType("INTERVAL_DAY", TypeMetadata("DayTimeIntervalType", true), max_precision = Some(9)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dialect now requires that max_precision be set for these types, and validation fails otherwise.

@vbarua
Copy link
Member Author

vbarua commented Mar 6, 2026

cc: @nielspardon from your work in substrait-io/substrait#982 I suspect you may have already been taking a look at doing this work.

@vbarua vbarua requested a review from nielspardon March 6, 2026 03:58
Copy link
Member

@nielspardon nielspardon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I had played with switching to the official ANTLR gammar, after finding that the interval compound type was missing.

The only difference might be that I decided to use a symbolic link to the ANTLR grammar from the git submodule. This would allow us to pull in any changes to the grammar automatically while substrait-packaging does not yet build and distribute Java artifacts.

And I included the FuncTestCase grammar in the code generation which led me to find the redundant import issue which is why I would have tackled when 0.84.0 is being released.

@vbarua vbarua merged commit cbcd2ac into main Mar 6, 2026
12 checks passed
@vbarua vbarua deleted the vbarua/antlr-updates branch March 6, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants