-
Notifications
You must be signed in to change notification settings - Fork 93
Description
Feature Request
Problem
Currently, Diagram supports construction from a single source: Table, Schema, or module with a schema attribute.
The use case is: when there are multiple connected (with foreign key references between their tables) schemas, and the user wishes to resolve (e.g. to view) these references. However, a user cannot construct a single Diagram from multiple schemas. The workaround
diagrams = [dj.Diagram(schema) for schema in schema_list]
full_diagram: dj.Diagram = sum(diagrams[1::], diagrams[0])Is not efficient, since it scales linearly with respect the number of schemas (let this be Diagram constructor call loads Dependencies:
datajoint-python/datajoint/diagram.py
Line 103 in 77b75e9
| connection.dependencies.load() |
which makes 2 queries to INFORMATION_SCHEMA tables to fetch primary and foreign keys:
datajoint-python/datajoint/dependencies.py
Lines 99 to 135 in 77b75e9
| # load primary key info | |
| keys = self._conn.query( | |
| """ | |
| SELECT | |
| concat('`', table_schema, '`.`', table_name, '`') as tab, column_name | |
| FROM information_schema.key_column_usage | |
| WHERE table_name not LIKE "~%%" AND table_schema in ('{schemas}') AND constraint_name="PRIMARY" | |
| """.format( | |
| schemas="','".join(self._conn.schemas) | |
| ) | |
| ) | |
| pks = defaultdict(set) | |
| for key in keys: | |
| pks[key[0]].add(key[1]) | |
| # add nodes to the graph | |
| for n, pk in pks.items(): | |
| self.add_node(n, primary_key=pk) | |
| # load foreign keys | |
| keys = ( | |
| {k.lower(): v for k, v in elem.items()} | |
| for elem in self._conn.query( | |
| """ | |
| SELECT constraint_name, | |
| concat('`', table_schema, '`.`', table_name, '`') as referencing_table, | |
| concat('`', referenced_table_schema, '`.`', referenced_table_name, '`') as referenced_table, | |
| column_name, referenced_column_name | |
| FROM information_schema.key_column_usage | |
| WHERE referenced_table_name NOT LIKE "~%%" AND (referenced_table_schema in ('{schemas}') OR | |
| referenced_table_schema is not NULL AND table_schema in ('{schemas}')) | |
| """.format( | |
| schemas="','".join(self._conn.schemas) | |
| ), | |
| as_dict=True, | |
| ) | |
| ) |
So INFORMATION_SCHEMA are made.
Requirements
Allow construction of a Diagram from multiple schema names. Construction from multiple schemas should be efficient, making INFORMATION_SCHEMA.
Justification
Adding support for this feature would enable users to efficiently view inter-schema references, even for large numbers of schemas.
Alternative Considerations
In my application, I workaround this by creating a mock class that we can pass to the Diagram constructor to get this behavior:
class AbstractDependencies(object):
"""
Lightweight subclass of dj.connection.Dependencies that is meant
to be passed to dj.Diagram constructor.
"""
def __init__(self, schema_names: list[str], deps: dj.connection.Dependencies):
super().__init__()
if not schema_names:
raise NotImplementedError("No schemas passed")
# Set attributes for dj.Diagram.__init__
self.schema_names = schema_names
self.connection = deps._conn
self.context = dict()
# Edit Dependencies object in-place
deps.clear()
schema_dict: dict[str, dj.VirtualModule] = {
schema: dj.VirtualModule(schema, schema, connection=self.connection)
for schema in self.schema_names
}
deps._conn.schemas = schema_dict
deps.load(force=True)
assert deps._loaded, "Dependencies not loaded"
self.nodes = deps.nodes
self.database = self.schema_names[0]
def to_diagram(self) -> dj.Diagram:
diagram = dj.Diagram(self, self.context)
for node in self.nodes:
if any(
node.startswith("`%s`" % database) for database in self.schema_names
):
diagram.nodes_to_show.add(node)
return diagram
def get_diagram_from_schemas(schemas: list[str]) -> dj.Diagram:
deps: dj.connection.Dependencies = connection.dependencies
abs_deps = AbstractDependencies(schemas, deps)
return abs_deps.to_diagram()This is
Environment
- OS: Linux (relevant for all)
- Python Version: 3.9 (relevant for all)
- MySQL Version: 8.0 (relevant for all)
- MySQL Deployment Strategy: remote (percona-qa.datajoint.io)
- DataJoint Version: 0.14.3