Skip to content

Construct a Diagram from multiple schemas #1190

@ethho

Description

@ethho

Feature Request

Problem

Currently, Diagram supports construction from a single source: Table, Schema, or module with a schema attribute.
The use case is: when there are multiple connected (with foreign key references between their tables) schemas, and the user wishes to resolve (e.g. to view) these references. However, a user cannot construct a single Diagram from multiple schemas. The workaround

diagrams = [dj.Diagram(schema) for schema in schema_list]
full_diagram: dj.Diagram = sum(diagrams[1::], diagrams[0])

Is not efficient, since it scales linearly with respect the number of schemas (let this be $n$). This is because each Diagram constructor call loads Dependencies:

connection.dependencies.load()

which makes 2 queries to INFORMATION_SCHEMA tables to fetch primary and foreign keys:

# load primary key info
keys = self._conn.query(
"""
SELECT
concat('`', table_schema, '`.`', table_name, '`') as tab, column_name
FROM information_schema.key_column_usage
WHERE table_name not LIKE "~%%" AND table_schema in ('{schemas}') AND constraint_name="PRIMARY"
""".format(
schemas="','".join(self._conn.schemas)
)
)
pks = defaultdict(set)
for key in keys:
pks[key[0]].add(key[1])
# add nodes to the graph
for n, pk in pks.items():
self.add_node(n, primary_key=pk)
# load foreign keys
keys = (
{k.lower(): v for k, v in elem.items()}
for elem in self._conn.query(
"""
SELECT constraint_name,
concat('`', table_schema, '`.`', table_name, '`') as referencing_table,
concat('`', referenced_table_schema, '`.`', referenced_table_name, '`') as referenced_table,
column_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE referenced_table_name NOT LIKE "~%%" AND (referenced_table_schema in ('{schemas}') OR
referenced_table_schema is not NULL AND table_schema in ('{schemas}'))
""".format(
schemas="','".join(self._conn.schemas)
),
as_dict=True,
)
)

So $2n$ queries to INFORMATION_SCHEMA are made.

Requirements

Allow construction of a Diagram from multiple schema names. Construction from multiple schemas should be efficient, making $2 = O(1)$ queries (primary key and foreign key query) to INFORMATION_SCHEMA.

Justification

Adding support for this feature would enable users to efficiently view inter-schema references, even for large numbers of schemas.

Alternative Considerations

In my application, I workaround this by creating a mock class that we can pass to the Diagram constructor to get this behavior:

class AbstractDependencies(object):
    """
    Lightweight subclass of dj.connection.Dependencies that is meant
    to be passed to dj.Diagram constructor.
    """

    def __init__(self, schema_names: list[str], deps: dj.connection.Dependencies):
        super().__init__()

        if not schema_names:
            raise NotImplementedError("No schemas passed")
        # Set attributes for dj.Diagram.__init__
        self.schema_names = schema_names
        self.connection = deps._conn
        self.context = dict()

        # Edit Dependencies object in-place
        deps.clear()
        schema_dict: dict[str, dj.VirtualModule] = {
            schema: dj.VirtualModule(schema, schema, connection=self.connection)
            for schema in self.schema_names
        }
        deps._conn.schemas = schema_dict
        deps.load(force=True)
        assert deps._loaded, "Dependencies not loaded"
        self.nodes = deps.nodes
        self.database = self.schema_names[0]

    def to_diagram(self) -> dj.Diagram:
        diagram = dj.Diagram(self, self.context)
        for node in self.nodes:
            if any(
                node.startswith("`%s`" % database) for database in self.schema_names
            ):
                diagram.nodes_to_show.add(node)
        return diagram

def get_diagram_from_schemas(schemas: list[str]) -> dj.Diagram:
    deps: dj.connection.Dependencies = connection.dependencies
    abs_deps = AbstractDependencies(schemas, deps)
    return abs_deps.to_diagram()

This is $O(1)$ but hacky.

Environment

  • OS: Linux (relevant for all)
  • Python Version: 3.9 (relevant for all)
  • MySQL Version: 8.0 (relevant for all)
  • MySQL Deployment Strategy: remote (percona-qa.datajoint.io)
  • DataJoint Version: 0.14.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementIndicates new improvements

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions