Conversation
isnull is now a field on anyType itself (not the parent typeDef rule),
so access it directly. AnyVar tokens (ANY0-ANY9) are now distinguished
from Any and return their lowercase text as the parameter name (e.g.
"any3"), consistent with StringLiteral.isWildcard() which checks
startsWith("any").
New grammar entry point rule typeStatement: typeDef EOF is required by the generated SubstraitTypeVisitor interface. Delegates to typeDef.
UserDefinedContext moved from scalarType to parameterizedType, so the withNull(ScalarTypeContext) overload no longer applies. Switch to withNull(boolean) using ctx.isnull directly. Also wire up the new optional type parameters: each expr is visited and converted to the appropriate Type.Parameter subtype (ParameterDataType for Type instances, ParameterIntegerValue for integer literals, ParameterStringValue for string literals). Unsupported expression types throw UnsupportedOperationException.
New grammar alternative #precisionTime requires POJO support: - TypeExpression.PrecisionTime class (mirrors IntervalDay pattern) - visit(TypeExpression.PrecisionTime) in TypeExpressionVisitor interface with default throw in TypeExpressionThrowsVisitor - precisionTimeE(String) in ParameterizedTypeCreator - precisionTimeE(TypeExpression) in TypeExpressionCreator - visitPrecisionTime in ParseToPojo following the visitPrecisionIntervalDay pattern (integer, string, and expression precision dispatch) Also add stub implementations of visitFunc, visitSingleFuncParam, and visitFuncParamsWithParens (new interface requirements) so the code compiles pending full implementation.
73ec51c to
917ef85
Compare
There was a problem hiding this comment.
Most of this PR is mechanical-ish ANTLR updates. I've gone ahead and annotated the PR with some comments to explain the changes.
The big issue is that we were using a very outdated version of the SubstraitType.g4. In this PR, I've copied in the grammars from v0.83.0. It would be better to use them directly from the submodule #729, or even more ideally we should generate the code as a standalone artifact in substrait-packaging.
I used claude to assist with this PR by having it analyze the original grammar, identity the differences, and then fix the deltas commit by commit. I also did some of it by hand at the start to make sure I understood what was going on.
| : '0'..'9' | ||
| ; | ||
|
|
||
| start: expr EOF; |
| | AnyVar isnull=QMark? | ||
| ; | ||
|
|
||
| type |
| anyType | ||
| : Any isnull=QMark? | ||
| | AnyVar isnull=QMark? | ||
| ; |
There was a problem hiding this comment.
We now distinguish between any and any0, any1, etc.
| | VarChar isnull='?'? Lt len=numericParameter Gt #varChar | ||
| | FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary | ||
| | Decimal isnull='?'? Lt precision=numericParameter Comma scale=numericParameter Gt #decimal | ||
| | IntervalDay isnull='?'? Lt precision=numericParameter Gt #intervalDay |
There was a problem hiding this comment.
#intervalDay is now #precisionIntervalDay
| | FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary | ||
| | Decimal isnull='?'? Lt precision=numericParameter Comma scale=numericParameter Gt #decimal | ||
| | IntervalDay isnull='?'? Lt precision=numericParameter Gt #intervalDay | ||
| | IntervalCompound isnull='?'? Lt precision=numericParameter Gt #intervalCompound |
There was a problem hiding this comment.
#intervalCompound is now #precisionIntervalCompound
| parameterizedType | ||
| : FixedChar isnull='?'? Lt len=numericParameter Gt #fixedChar | ||
| | VarChar isnull='?'? Lt len=numericParameter Gt #varChar | ||
| | FixedBinary isnull='?'? Lt len=numericParameter Gt #fixedBinary |
| | Time #time | ||
| | IntervalYear #intervalYear | ||
| | UUID #uuid | ||
| | UserDefined Identifier #userDefined |
There was a problem hiding this comment.
UserDefined is now a parameterizeType that can take type parameters
| public TypeExpression visitFuncParamsWithParens( | ||
| final SubstraitTypeParser.FuncParamsWithParensContext ctx) { | ||
| throw new UnsupportedOperationException(); | ||
| } |
There was a problem hiding this comment.
Not supporting Func types for now.
| SupportedType("INTERVAL_DAY", TypeMetadata("DayTimeIntervalType", true)), | ||
| SupportedType("PRECISION_TIMESTAMP", TypeMetadata("TimestampNTZType", true), max_precision = Some(9)), | ||
| SupportedType("PRECISION_TIMESTAMP_TZ", TypeMetadata("TimestampType", true), max_precision = Some(9)), | ||
| SupportedType("INTERVAL_DAY", TypeMetadata("DayTimeIntervalType", true), max_precision = Some(9)), |
There was a problem hiding this comment.
The dialect now requires that max_precision be set for these types, and validation fails otherwise.
|
cc: @nielspardon from your work in substrait-io/substrait#982 I suspect you may have already been taking a look at doing this work. |
There was a problem hiding this comment.
LGTM, I had played with switching to the official ANTLR gammar, after finding that the interval compound type was missing.
The only difference might be that I decided to use a symbolic link to the ANTLR grammar from the git submodule. This would allow us to pull in any changes to the grammar automatically while substrait-packaging does not yet build and distribute Java artifacts.
And I included the FuncTestCase grammar in the code generation which led me to find the redundant import issue which is why I would have tackled when 0.84.0 is being released.
chore: update ANTLR parser
feat(POJO): add support for PrecisionTime