forked from datafold/data-diff
audit: Trino driver for Presto SQL divergences #36
Open
Labels
P1-high (High priority, next 2-4 weeks), technical-debt (Technical debt items), triage
Description
Problem
data_diff/databases/trino.py is a 49-line stub that inherits everything from Presto. It overrides only normalize_timestamp and normalize_uuid. Trino and Presto have diverged significantly:
- Type precision semantics
- Timestamp semantics (instant vs. wall-clock)
- Function naming and behavior
- Connection parameter handling
Trino's __init__ calls super().__init__(), which resolves to Presto's, so any Presto-specific logic flows silently into Trino.
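As a starting point for the audit, the inherited surface can be listed mechanically. The following is a minimal sketch, not code from the repository: PrestoLike and TrinoLike are stand-in classes, and example_cast and example_hash are hypothetical method names used only for illustration. Only normalize_timestamp and normalize_uuid come from the description above.

```python
def inherited_callables(child, parent):
    """Map each method that `child` inherits unchanged to the class that defines it."""
    own = set(vars(child))  # names defined directly on the child class
    found = {}
    for klass in parent.__mro__:
        if klass is object:
            continue
        for name, member in vars(klass).items():
            is_method = callable(member) or isinstance(
                member, (classmethod, staticmethod, property)
            )
            if is_method and name not in own:
                # setdefault keeps the first (most derived) definition in MRO order
                found.setdefault(name, klass.__name__)
    return dict(sorted(found.items()))


# Stand-in classes (assumption): they mimic the shape described in the issue only.
class PrestoLike:
    def __init__(self):
        self.flavor = "presto"

    def normalize_timestamp(self, value, precision):
        return f"presto_ts({value}, {precision})"

    def normalize_uuid(self, value):
        return f"presto_uuid({value})"

    def example_cast(self, value):  # hypothetical method
        return f"presto_cast({value})"

    def example_hash(self, value):  # hypothetical method
        return f"presto_hash({value})"


class TrinoLike(PrestoLike):
    def __init__(self):
        super().__init__()  # parent setup runs unchanged

    def normalize_timestamp(self, value, precision):
        return f"trino_ts({value}, {precision})"

    def normalize_uuid(self, value):
        return f"trino_uuid({value})"


if __name__ == "__main__":
    for name, owner in inherited_callables(TrinoLike, PrestoLike).items():
        print(f"{name}: inherited from {owner}")
```

Pointing the same helper at the real Trino and Presto classes would presumably give the checklist of methods to review, though the result depends on how those classes are actually defined.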
Scope
This is an audit, not a rewrite:
- Document which inherited Presto methods produce different SQL than Trino expects
- Identify any methods that would produce silently wrong results
- Add targeted overrides for critical divergences
- File follow-up issues for anything requiring deeper work
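One way to record divergence points is a small table of probes, each pairing a method call with the SQL the auditor expects Trino to need. This is a rough sketch under stated assumptions: Probe, find_divergences, StandInDialect, and example_cast are hypothetical names, and the PLACEHOLDER_ strings stand in for real SQL, which has to come from the Trino documentation rather than from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    method: str        # name of the dialect method to call
    args: tuple        # positional arguments for the call
    expected_sql: str  # SQL the auditor expects for Trino, filled in by hand


class StandInDialect:
    """Stand-in for the driver under audit (assumption, not the real class)."""

    def normalize_timestamp(self, column, precision):
        return f"PLACEHOLDER_TS({column}, {precision})"

    def example_cast(self, column):  # hypothetical inherited method
        return f"PLACEHOLDER_PARENT_CAST({column})"


def find_divergences(dialect, probes):
    """Return (method, actual_sql, expected_sql) for every probe that mismatches."""
    findings = []
    for probe in probes:
        actual = getattr(dialect, probe.method)(*probe.args)
        if actual != probe.expected_sql:
            findings.append((probe.method, actual, probe.expected_sql))
    return findings


PROBES = [
    Probe("normalize_timestamp", ("ts", 6), "PLACEHOLDER_TS(ts, 6)"),
    Probe("example_cast", ("c",), "PLACEHOLDER_TRINO_CAST(c)"),
]

if __name__ == "__main__":
    for method, actual, expected in find_divergences(StandInDialect(), PROBES):
        print(f"{method}: emits {actual!r}, expected {expected!r}")
```

Each finding could become either a targeted override or a follow-up issue, depending on how much work the fix needs.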
Key Files
- data_diff/databases/trino.py
- data_diff/databases/presto.py
Acceptance Criteria
- Audit document or issue comments listing all divergence points
- Critical divergences fixed with targeted overrides
- Known limitations documented in code comments