-
Notifications
You must be signed in to change notification settings - Fork 415
[Bug Fix] Allow HiveCatalog to create table with TimestamptzType #585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pyiceberg/catalog/hive.py
Outdated
| DateType: "date", | ||
| TimeType: "string", | ||
| TimestampType: "timestamp", | ||
| TimestamptzType: "timestamp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Java, we map TimestamptzType to "timestamp with local time zone" for Hive version >= 3 TimestamptzType.
case TIMESTAMP:
Types.TimestampType timestampType = (Types.TimestampType) type;
if (HiveVersion.min(HiveVersion.HIVE_3) && timestampType.shouldAdjustToUTC()) {
return "timestamp with local time zone";
}
return "timestamp";But in python it seems we do not have a way to check the version. So I put the "timestamp" here to ensure compatibility. Please correct me if I am wrong here. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just set arbitrary strings in there? If so, I think timestamp with local time zone is more accurate. It would be good to validate using an integration test as well, since we have those tests already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, the content here should be constants in serdeConstants since java use these to test the Hive schema util
Among these types, timestamp with local time zone is a new one added in Hive3: ref.
Interestingly, when doing integration tests (use pyiceberg to write and spark to read) I found that Hive server will raise error for timestampabc but work normally for timestamp arbitrary string. Seems as long as the first word is within serdeConstants there will be no error loading the table. Not sure if timestamp arbitrary string will cause error in other Hive use cases.
How about we default to "timestamp with local time zone" and add a HiveCatalog property (e.g. hive.hive2-compatible-mode) to use "timestamp" here when it set to true?
(Will add the integration test soon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context, has been a while since working with the Hive catalog.
How about we default to "timestamp with local time zone" and add a HiveCatalog property (e.g. hive.hive2-compatible-mode) to use "timestamp" here when it set to true?
I think that would be a great solution 👍
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @HonahX Thanks for fixing this 👍
Co-authored-by: Fokko Driesprong <[email protected]>
|
@Fokko Thanks for reviewing! |
Fixes: #583
cc @Fokko