fix(lineage): unify dbt model producer and consumer nodes (#32) #42
Open
In dbt mode, a bare SELECT previously emitted a per-statement Output
node while downstream consumers referencing the same model via
`{{ ref(...) }}` emitted separate Table nodes. With matching canonical
names but different node types, the flattener could not merge them,
leaving multi-hop chains (A -> B -> C) as disconnected fragments and
omitting leaf models from the mermaid table view.
Materialize the dbt sink as the canonical relation node and pass its
id to analyze_query as the target, so producer and consumer collapse
into one node and table-level DataFlow edges connect the chain - the
same shape CTAS already produces.
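The merge failure described above can be illustrated with a toy model. This is only a sketch under the assumption that the flattener deduplicates nodes by the pair (canonical name, node type); `Node`, `NodeType`, and `flatten` here are hypothetical stand-ins, not the project's actual types.

```rust
use std::collections::HashSet;

// Hypothetical, simplified node kinds for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum NodeType {
    Table,
    Output,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Node {
    canonical_name: String,
    node_type: NodeType,
}

fn node(name: &str, node_type: NodeType) -> Node {
    Node { canonical_name: name.to_string(), node_type }
}

/// Keeps the first node seen for each (canonical name, node type) pair.
/// Nodes with the same name but different types are NOT merged, which is
/// the assumed cause of the disconnected fragments.
fn flatten(nodes: &[Node]) -> Vec<Node> {
    let mut seen = HashSet::new();
    nodes
        .iter()
        .filter(|n| seen.insert((n.canonical_name.clone(), n.node_type)))
        .cloned()
        .collect()
}
```

With a producer emitted as `Output` and a consumer emitted as `Table`, `flatten` returns two nodes for the same model; once the producer is also a `Table` node, the pair collapses to one.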
Preserve dbt view materialization, keep producer occurrence metadata on merged model nodes, and treat dbt relation sinks as writes in script dependency views.
Scope `materialized` kwarg parsing to the body of the first `config(...)` call via a paren-depth scanner that respects string literals, eliminating false positives from comments or unrelated SQL. Extend `DbtMaterialization` to cover table/incremental/snapshot/ephemeral/materialized_view, with ephemeral and materialized_view correctly mapped to `NodeType::View`. Pre-declare view-materialized dbt models during DDL pre-collection so consumers resolved before the producer still merge onto the canonical view sink, and rework `getCreatedRelationNodeIds` to identify statement-local projection columns via edge-shape instead of `qualifiedName`, which can be inherited across producer/consumer merges.
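A paren-depth scanner of the kind described above might look roughly like the following. This is a minimal sketch, not the PR's implementation: the function names `first_config_body` and `materialized_kwarg` are hypothetical, backslash escapes inside string literals are not handled, and the naive `find("config(")` would also match an identifier that merely ends in `config`.

```rust
/// Returns the text between the parentheses of the first `config(` call,
/// tracking paren depth and skipping over quoted string literals.
fn first_config_body(src: &str) -> Option<&str> {
    let start = src.find("config(")? + "config(".len();
    let mut depth = 1usize;
    let mut quote: Option<char> = None; // Some(q) while inside a string opened by q
    for (i, c) in src[start..].char_indices() {
        match quote {
            Some(q) => {
                if c == q {
                    quote = None;
                }
            }
            None => match c {
                '\'' | '"' => quote = Some(c),
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        return Some(&src[start..start + i]);
                    }
                }
                _ => {}
            },
        }
    }
    None // unbalanced parentheses
}

/// Extracts a quoted literal value of `materialized=` from a config body.
/// Returns None when the value is not a plain string literal
/// (for example a dynamic Jinja expression).
fn materialized_kwarg(body: &str) -> Option<&str> {
    let rest = &body[body.find("materialized")? + "materialized".len()..];
    let rest = rest.trim_start().strip_prefix('=')?.trim_start();
    let q = rest.chars().next().filter(|c| *c == '\'' || *c == '"')?;
    let rest = &rest[1..];
    rest.find(q).map(|end| &rest[..end])
}
```

Because the search for `materialized` is restricted to the returned body, a `materialized='view'` that appears in a SQL comment outside the `config(...)` call is never considered.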
Bundle analyze_statement source params into a StatementSource struct so the signature drops under clippy's 7-arg limit, rename the sink node id field to match its broadened semantics (Output node or canonical relation sink), and avoid a String round-trip on the dbt sink id by keeping it as Option<Arc<str>>. Extend dbt materialization parsing to honor multiple config(...) calls (last materialized= wins, matching dbt's override behavior) and predeclare model producers so forward refs resolve without leaving placeholder nodes behind. Extract isCreatedProjectionColumn and isRelationNode helpers in lineageHelpers.ts so the projection detection rule reads as a named predicate rather than an inline boolean soup. Add tests covering unknown adapter materializations, dynamic Jinja materializations, later-config-wins override, and forward-ref resolution.
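The materialization mapping and the last-config-wins rule can be sketched as below. Only the `ephemeral` and `materialized_view` to `NodeType::View` mapping comes from the description above; the mapping of the remaining variants, the `parse` and `node_type` method names, the `effective_materialization` helper, and the choice to return `None` for unknown adapter materializations are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeType {
    Table,
    View,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DbtMaterialization {
    Table,
    View,
    Incremental,
    Snapshot,
    Ephemeral,
    MaterializedView,
}

impl DbtMaterialization {
    /// Parses a literal `materialized=` value; unknown values yield None (assumed).
    fn parse(s: &str) -> Option<Self> {
        match s {
            "table" => Some(Self::Table),
            "view" => Some(Self::View),
            "incremental" => Some(Self::Incremental),
            "snapshot" => Some(Self::Snapshot),
            "ephemeral" => Some(Self::Ephemeral),
            "materialized_view" => Some(Self::MaterializedView),
            _ => None,
        }
    }

    /// Ephemeral and materialized_view map to View, as stated in the PR;
    /// treating table/incremental/snapshot as Table is an assumption.
    fn node_type(self) -> NodeType {
        match self {
            Self::View | Self::Ephemeral | Self::MaterializedView => NodeType::View,
            Self::Table | Self::Incremental | Self::Snapshot => NodeType::Table,
        }
    }
}

/// Given the `materialized=` values from each config(...) call in source
/// order, the last one wins.
fn effective_materialization(values: &[&str]) -> Option<DbtMaterialization> {
    values.last().and_then(|v| DbtMaterialization::parse(v))
}
```

For a model with two config calls setting `view` and then `table`, this resolves to `Table`; how the real code treats an unrecognized last value may differ from the `None` returned here.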
Summary
- […] `analyze_query` as the target, so table-level DataFlow edges are emitted between models (matching the shape CTAS already produces).

Before: `supplies`, `stg_supplies`, `int_supplies` show up as disconnected boxes in mermaid and `fct_supplies` is missing entirely.

After: `supplies -> stg_supplies -> int_supplies -> fct_supplies` renders as one connected chain.

Test plan
- […] `dbt_chained_models_unify_producer_and_consumer_nodes` covering a 3-model A -> B -> C chain (asserts node unification, absence of stray Output nodes, cross-statement edges, and table-level DataFlow arrows).
- `cargo test --workspace` passes.
- `cargo clippy --workspace --all-targets -- -D warnings` clean.
- `cargo fmt --check` clean.
- […] `raw.supplies -> stg -> int -> fct` lineage.
- […] (`just build-wasm && just build-ts && just dev`) with a dbt-style 3-model repro, plus diamond and unreferenced-leaf cases.