
refactoring table writer for dynamic column handling #236

Merged
ParticularlyPythonicBS merged 2 commits into unstable from ref/table_writer
Dec 11, 2025

Conversation

@ParticularlyPythonicBS (Member) commented Dec 11, 2025

Dynamic Column Handling: Instead of hardcoding SQL strings, we use a helper method that accepts a list of dictionaries. It checks the database schema to see which columns actually exist and constructs the SQL dynamically.

The column names in the Python code now explicitly match the SQL schema.
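The approach described above can be sketched as follows. This is a minimal illustration using the standard `sqlite3` module, not the PR's actual code: function names mirror the helpers mentioned in the walkthrough (`_get_table_columns`, `_bulk_insert`), but the signatures and the `output_demo` table are assumptions.

```python
import sqlite3


def get_table_columns(con: sqlite3.Connection, table_name: str) -> set[str]:
    """Read column names from the live schema via PRAGMA table_info."""
    cursor = con.execute(f'PRAGMA table_info({table_name})')
    return {row[1] for row in cursor.fetchall()}  # row[1] is the column name


def bulk_insert(con: sqlite3.Connection, table_name: str, records: list[dict]) -> None:
    """Insert only the record keys that actually exist as table columns."""
    if not records:
        return
    valid = get_table_columns(con, table_name)
    cols = sorted(set(records[0]).intersection(valid))
    col_list = ', '.join(cols)
    placeholders = ', '.join('?' for _ in cols)
    # Table name comes from an internal constant; values are bound with ?
    sql = f'INSERT INTO {table_name} ({col_list}) VALUES ({placeholders})'
    rows = [tuple(rec.get(col) for col in cols) for rec in records]
    con.executemany(sql, rows)


con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE output_demo (scenario TEXT, tech TEXT, value REAL)')
# The extra_key is silently filtered out because it is not a schema column.
bulk_insert(con, 'output_demo', [
    {'scenario': 's1', 'tech': 'wind', 'value': 1.5, 'extra_key': 'ignored'},
])
print(con.execute('SELECT scenario, tech, value FROM output_demo').fetchall())
```

Because the SQL is derived from the schema at runtime, adding a column to the database no longer requires touching every hardcoded INSERT string.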

Summary by CodeRabbit

  • Bug Fixes

    • More reliable database writes with foreign-key validation, safer cleanup on close/exit, and clearer diagnostics.
  • New Features

    • Faster, schema-aware bulk exports for large result sets.
    • Optional storage-level outputs supported and consistent scenario naming in exported results.
  • Refactor

    • Centralized, simplified result-writing pipeline with dynamic schema handling.
  • Chores

    • Updated development tooling version pins.


@coderabbitai bot commented Dec 11, 2025

Warning

Rate limit exceeded

@ParticularlyPythonicBS has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 843747a and 46ecf8b.

📒 Files selected for processing (1)
  • temoa/_internal/table_writer.py (6 hunks)

Walkthrough

Bumps pre-commit hook revisions and refactors the table writer to use schema-driven bulk inserts, adds lifecycle/context-manager helpers and SQL script execution, introduces table-column caching and FK validation, renames several constants, and expands write_results signature with type hints and a save_storage_levels flag.

Changes

Cohort / File(s) Summary
Pre-commit configuration
\.pre-commit-config.yaml
Updated hook revisions: uv-pre-commit 0.9.3 → 0.9.17, asottile/pyupgrade v3.21.0 → v3.21.2, ruff-pre-commit v0.14.1 → v0.14.8. No hooks or args changed.
Table writer refactor
temoa table writer
temoa/_internal/table_writer.py
Replaced hard-coded per-row INSERTs with dynamic, schema-aware bulk inserts via new _get_table_columns and _bulk_insert (with column filtering & caching). Renamed constants to BASIC_OUTPUT_TABLES, OPTIONAL_OUTPUT_TABLES, FLOW_SUMMARY_FILE_LOC, MC_TWEAKS_FILE_LOC. Added context-manager (__enter__/__exit__), __del__, close(), _get_scenario_name, FK validation, execute_script, make_summary_flow_table, make_mc_tweaks_table, and updated write_results signature to accept typed parameters and save_storage_levels. Centralized commits and removed explicit VACUUM calls.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay attention to:
    • _get_table_columns — correctness, caching invalidation, and edge cases when schema changes.
    • _bulk_insert — SQL generation, parameter binding, and behavior when record keys don't match columns.
    • write_results signature changes — callers and backward compatibility (new save_storage_levels default).
    • Lifecycle methods (__enter__/__exit__, close, __del__) — resource cleanup and exception-safety.
    • execute_script, make_summary_flow_table, and make_mc_tweaks_table — file/resource handling and SQL execution context.
    • Renamed constants — ensure all references across the codebase updated.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactoring table writer for dynamic column handling' directly and concisely summarizes the main change: replacing hardcoded SQL with dynamic, schema-aware column handling in the table writer module.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 7

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dda04a and 2c285fa.

📒 Files selected for processing (2)
  • .pre-commit-config.yaml (2 hunks)
  • temoa/_internal/table_writer.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
temoa/_internal/table_writer.py

87-87: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


88-88: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


128-128: Possible SQL injection vector through string-based query construction

(S608)


184-184: Unpacked variable e_costs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


221-221: Possible SQL injection vector through string-based query construction

(S608)


236-236: Possible SQL injection vector through string-based query construction

(S608)


247-247: Avoid specifying long messages outside the exception class

(TRY003)


293-293: Avoid specifying long messages outside the exception class

(TRY003)


329-329: Avoid specifying long messages outside the exception class

(TRY003)


388-388: Avoid specifying long messages outside the exception class

(TRY003)


448-448: Avoid specifying long messages outside the exception class

(TRY003)


557-557: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: setup and test (windows-latest, 3.13)
  • GitHub Check: setup and test (macos-latest, 3.13)
  • GitHub Check: setup and test (macos-latest, 3.12)
  • GitHub Check: setup and test (windows-latest, 3.12)
🔇 Additional comments (10)
.pre-commit-config.yaml (1)

4-4: LGTM!

Routine version bumps for pre-commit tools (uv, pyupgrade, ruff). No configuration changes.

Also applies to: 43-43, 50-50

temoa/_internal/table_writer.py (9)

45-69: LGTM!

Good refactoring of module-level constants to SCREAMING_SNAKE_CASE following Python conventions. The separation of BASIC_OUTPUT_TABLES and OPTIONAL_OUTPUT_TABLES improves clarity.


83-89: Verify disabling foreign key constraints is intentional.

PRAGMA foreign_keys = OFF disables foreign key constraint enforcement. While this may improve write performance, it could allow orphaned records or referential integrity violations to go undetected. Ensure this is intentional and that data integrity is validated elsewhere if needed.


91-98: LGTM!

The column caching mechanism is efficient. The PRAGMA table_info query uses string interpolation, but since table names come from internal constants (not user input), the SQL injection risk flagged by static analysis is a false positive in this context.


155-160: Verify poll_emissions handles p_0=None correctly.

When scenario_mode != TemoaMode.MYOPIC, p_0 is set to None and passed to poll_emissions. Ensure the downstream function handles a None value for p_0 gracefully.


218-239: LGTM!

The SQL injection warnings (S608) are false positives here since table names come from hardcoded internal constants. The scenario values are properly parameterized with ? placeholders.
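The split the comment above describes can be demonstrated concretely. In this sketch (table and values are illustrative, not the PR's), the table name is interpolated from an internal constant, which is exactly the pattern S608 flags, while the runtime value is bound with a `?` placeholder:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE output_objective (scenario TEXT, value REAL)')
con.executemany(
    'INSERT INTO output_objective VALUES (?, ?)',
    [('base', 42.0), ('other', 7.0)],
)

TABLE = 'output_objective'  # internal constant, never user input
scenario = 'base'           # runtime value, bound via a ? placeholder
cur = con.execute(f'DELETE FROM {TABLE} WHERE scenario = ?', (scenario,))
print(cur.rowcount)  # → 1
```

Only the placeholder-bound value is attacker-controllable in principle, which is why the linter finding is a false positive when table names are module-level constants.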


245-270: LGTM!

Clean implementation using the new bulk insert pattern. The records are properly structured with all required fields.


291-321: LGTM!

The emissions writing logic correctly handles both flow-based emissions (with period) and embodied emissions (using vintage as period). The hasattr check appropriately distinguishes between the two emission types.


386-438: LGTM!

The flow table writing is well-structured with a clear mapping from FlowType to table names. Using defaultdict to accumulate records per table before bulk inserting is efficient.
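The accumulate-then-bulk-insert pattern praised above can be sketched briefly. The flow-type labels and table names here are placeholders; the real code maps `FlowType` enum members to output table names.

```python
from collections import defaultdict

# Hypothetical mapping from a flow-type label to its output table.
FLOW_TABLE = {'out': 'output_flow_out', 'in': 'output_flow_in'}

flows = [
    ('out', {'tech': 'wind', 'flow': 1.0}),
    ('in', {'tech': 'wind', 'flow': 0.9}),
    ('out', {'tech': 'solar', 'flow': 2.0}),
]

# Accumulate records per target table, then insert each batch once,
# instead of issuing one INSERT statement per row.
batches: dict[str, list[dict]] = defaultdict(list)
for flow_type, record in flows:
    batches[FLOW_TABLE[flow_type]].append(record)

for table, records in batches.items():
    print(table, len(records))
```

Each batch then becomes a single `executemany` call, reducing database round-trips from one per row to one per table.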


590-603: Consider defensive access for solver results.

If the solver encountered issues, results['Solution'].Constraint may not exist or have the expected structure. Consider adding a guard or try-except if this method could receive incomplete results.
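A guard of the kind suggested above might look like this. The attribute path `results['Solution'].Constraint` is taken from the comment; the `FakeSolution` stand-in and the fallback-to-empty behavior are illustrative assumptions, not the project's actual handling.

```python
def extract_duals(results: dict) -> dict:
    """Return constraint duals, or an empty dict when the solve was incomplete."""
    try:
        return dict(results['Solution'].Constraint)
    except (KeyError, AttributeError, TypeError):
        # Solver produced no Solution block, or its shape is unexpected.
        return {}


class FakeSolution:
    """Minimal stand-in for a solver results Solution entry."""
    Constraint = {'c1': 0.5}


print(extract_duals({'Solution': FakeSolution()}))  # → {'c1': 0.5}
print(extract_duals({}))                            # → {}
```

Whether to return an empty dict or raise is a policy choice; the point is that the failure mode becomes explicit rather than an unguarded attribute chain.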

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
temoa/_internal/table_writer.py (2)

159-159: Prefix unused variable with underscore.

The e_costs variable is assigned but never used. This is the same issue previously flagged at line 182 (now line 159).

Apply this diff:

-        e_costs, e_flows = poll_emissions(model=model, p_0=value(p_0))
+        _e_costs, e_flows = poll_emissions(model=model, p_0=value(p_0))

637-639: Consider implementing a context manager for robust resource management.

While __del__ provides a safety net, it's not guaranteed to run during interpreter shutdown or with circular references. The previous review comment about implementing __enter__/__exit__ or an explicit close() method remains valid.

Consider adding:

def close(self) -> None:
    """Explicitly close the database connection."""
    if hasattr(self, 'con') and self.con:
        self.con.close()
        self.con = None

def __enter__(self) -> TableWriter:
    return self

def __exit__(self, exc_type, exc_val, exc_tb) -> None:
    self.close()

This allows usage like:

with TableWriter(config) as writer:
    writer.write_results(model)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c285fa and 0399429.

📒 Files selected for processing (1)
  • temoa/_internal/table_writer.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
temoa/_internal/table_writer.py

127-127: Possible SQL injection vector through string-based query construction

(S608)


182-182: Unpacked variable e_costs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


219-219: Possible SQL injection vector through string-based query construction

(S608)


234-234: Possible SQL injection vector through string-based query construction

(S608)


245-245: Avoid specifying long messages outside the exception class

(TRY003)


291-291: Avoid specifying long messages outside the exception class

(TRY003)


327-327: Avoid specifying long messages outside the exception class

(TRY003)


386-386: Avoid specifying long messages outside the exception class

(TRY003)


446-446: Avoid specifying long messages outside the exception class

(TRY003)


555-555: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: setup and test (macos-latest, 3.13)
  • GitHub Check: setup and test (windows-latest, 3.12)
  • GitHub Check: setup and test (macos-latest, 3.12)
🔇 Additional comments (13)
temoa/_internal/table_writer.py (13)

45-69: LGTM! Constants renamed to follow PEP 8 conventions.

The rename from lowercase to uppercase (e.g., basic_output_tables → BASIC_OUTPUT_TABLES) correctly follows Python naming conventions for module-level constants.


79-81: LGTM! Cache added for performance optimization.

The table columns cache avoids repeated PRAGMA calls, which is a sensible performance improvement.


90-97: LGTM! Column introspection method is well-designed.

The method safely caches column information from the database schema. The static analysis warning about SQL injection (line 93) is a false positive since table_name is always sourced from internal constants, not user input.


99-134: LGTM! Dynamic bulk insert implementation is solid.

The method addresses the previous review concern about KeyError by using rec.get(col) at line 132. The design correctly:

  • Validates columns against schema
  • Handles missing keys gracefully (defaulting to None)
  • Uses consistent column ordering

The SQL injection warning (line 127) is a false positive since table_name is always from internal constants.


211-214: LGTM! Centralized scenario naming logic.

The helper method consolidates scenario name construction, improving maintainability and consistency across all write methods.


243-268: Verify handling of missing technology sectors.

The refactoring to use _bulk_insert is correct. However, line 257 uses tech_sectors.get(sli.t) which returns None for missing technologies, while other methods like _write_cost_rows (line 568) use .get() with similar behavior. This is now consistent across the codebase, but missing technologies will silently insert None into the sector column.

Consider whether missing technologies should fail fast or insert None. If they should fail, add validation:

sector = self.tech_sectors.get(sli.t)
if sector is None:
    logger.warning('Unknown technology %s, using NULL sector', sli.t)

289-319: LGTM! Emissions writing correctly refactored.

The method properly checks dependencies before proceeding and handles both flow-based and embodied emissions with appropriate period assignment logic. The refactoring to use _bulk_insert is correct.


321-382: LGTM! Capacity tables refactored consistently.

The refactoring correctly uses _bulk_insert for all three capacity types (built, net, retired). The consistent use of tech_sectors.get(t) addresses the previous review comment about inconsistent access patterns.


384-436: LGTM! Flow tables refactored with efficient batching.

The use of defaultdict to accumulate records by table type before bulk inserting is an efficient approach that reduces database round-trips. The conditional column naming logic for curtailment vs flow is correct.


484-511: LGTM! Flow balance check improved with defensive access.

The method now uses .get() with default values (lines 495-498), which addresses the previous review comment about potential KeyError. The improved diagnostic logging at lines 506 and 509 provides better visibility into imbalances.


587-600: LGTM! Dual variables writing refactored cleanly.

The list comprehension approach (lines 591-598) is more Pythonic and pairs well with the _bulk_insert refactoring.


602-619: LGTM! Tweaks writing refactored consistently.

The refactoring follows the same pattern as other write methods, using record accumulation and _bulk_insert.


631-635: LGTM! Helper methods for SQL script execution.

The new helper methods (make_summary_flow_table and make_mc_tweaks_table) provide clean abstractions for table creation from SQL scripts.

@coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
temoa/_internal/table_writer.py (1)

252-252: Prefix unused variable with underscore.

The e_costs variable is assigned but never used. This was marked as addressed in a past review, but the static analysis still flags it (RUF059).

-            e_costs, e_flows = poll_emissions(model=model)
+            _e_costs, e_flows = poll_emissions(model=model)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0399429 and 2ff868d.

📒 Files selected for processing (1)
  • temoa/_internal/table_writer.py (8 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-03T13:34:35.907Z
Learnt from: ParticularlyPythonicBS
Repo: TemoaProject/temoa PR: 184
File: temoa/data_processing/database_util.py:109-118
Timestamp: 2025-11-03T13:34:35.907Z
Learning: In the Temoa project, SQL injection vulnerabilities in database utility classes (such as temoa/data_processing/database_util.py) are considered low concern due to the data access patterns and should not be flagged during code reviews. Technical debt for SQL injection hardening should be tracked separately as issues rather than blocking PRs.

Applied to files:

  • temoa/_internal/table_writer.py
🪛 Ruff (0.14.8)
temoa/_internal/table_writer.py

107-107: Do not catch blind exception: Exception

(BLE001)


145-145: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


187-187: Possible SQL injection vector through string-based query construction

(S608)


205-205: Avoid specifying long messages outside the exception class

(TRY003)


246-246: Avoid specifying long messages outside the exception class

(TRY003)


252-252: Unpacked variable e_costs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


261-261: Avoid specifying long messages outside the exception class

(TRY003)


297-297: Possible SQL injection vector through string-based query construction

(S608)


312-312: Possible SQL injection vector through string-based query construction

(S608)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


369-369: Avoid specifying long messages outside the exception class

(TRY003)


405-405: Avoid specifying long messages outside the exception class

(TRY003)


464-464: Avoid specifying long messages outside the exception class

(TRY003)


524-524: Avoid specifying long messages outside the exception class

(TRY003)


636-636: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: setup and test (windows-latest, 3.12)
  • GitHub Check: setup and test (macos-latest, 3.12)
  • GitHub Check: setup and test (windows-latest, 3.13)
🔇 Additional comments (1)
temoa/_internal/table_writer.py (1)

159-195: LGTM! Schema-driven bulk insert is well-designed.

The dynamic column matching via schema reflection is a solid improvement over hardcoded SQL:

  • Automatically adapts to schema changes
  • Uses rec.get(col) to handle missing keys gracefully (addressed from past review)
  • Validates column existence before insertion
  • Sorts columns for consistency

The f-string SQL construction is flagged by static analysis, but per project learnings, SQL injection hardening is tracked separately and shouldn't block this refactor.

Comment on lines +150 to +158
def _get_table_columns(self, table_name: str) -> set[str]:
"""Returns a set of column names for the given table."""
if table_name not in self._table_columns_cache:
cursor = self.con.execute(f'PRAGMA table_info({table_name})')
# row[1] is the column name in sqlite PRAGMA table_info
columns = {row[1] for row in cursor.fetchall()}
self._table_columns_cache[table_name] = columns
return self._table_columns_cache[table_name]


🧹 Nitpick | 🔵 Trivial

Consider validating table names for defense-in-depth.

While SQL injection is low concern per project learnings, validating table_name against a whitelist of known output tables would provide defense-in-depth and satisfy static analysis.

     def _get_table_columns(self, table_name: str) -> set[str]:
         """Returns a set of column names for the given table."""
+        # Validate table name to prevent SQL injection
+        valid_tables = set(BASIC_OUTPUT_TABLES + OPTIONAL_OUTPUT_TABLES)
+        if table_name not in valid_tables:
+            raise ValueError(f'Invalid table name: {table_name}')
+        
         if table_name not in self._table_columns_cache:
             cursor = self.con.execute(f'PRAGMA table_info({table_name})')
             # row[1] is the column name in sqlite PRAGMA table_info
             columns = {row[1] for row in cursor.fetchall()}
             self._table_columns_cache[table_name] = columns
         return self._table_columns_cache[table_name]

Based on learnings, SQL injection hardening should be tracked separately as technical debt rather than blocking this PR.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In temoa/_internal/table_writer.py around lines 150 to 158, validate the
incoming table_name against a whitelist of allowed output table names before
using it in the PRAGMA query to provide defense-in-depth; add or use an existing
set/list of known table names (e.g. self._allowed_tables or a module-level
constant), raise a clear ValueError if table_name is not in that whitelist, and
only then perform the PRAGMA and cache logic so unrecognized names never reach
the SQL call.

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
temoa/_internal/table_writer.py (3)

76-83: Connection lifecycle and context management are solid; narrow close() exception handling.

The connection property and context-manager (__enter__/__exit__) pattern look good and make DB usage safer. However, close() still catches a blind Exception, which can hide unexpected errors.

Consider narrowing the exception types to SQLite/OS-level errors:

     def close(self) -> None:
         """Explicitly close the database connection."""
         if self.con:
             try:
                 self.con.close()
-            except Exception as e:
-                logger.warning('Error closing database connection: %s', e)
+            except (sqlite3.Error, OSError) as e:
+                logger.warning('Error closing database connection: %s', e)
             finally:
                 self.con = None

__del__ delegating to close() then remains safe and idempotent.

Also applies to: 94-125, 718-719


126-158: Foreign-key validation behavior looks good; consider logger.exception for failures.

_validate_foreign_keys() correctly re-enables FKs, runs PRAGMA foreign_key_check, and logs violations without crashing writes, matching the docstring.

For the error path, switching to logger.exception would capture a traceback and align with the TRY400 hint and earlier review:

-        except sqlite3.Error as e:
-            logger.error('Error during foreign key validation: %s', e)
+        except sqlite3.Error:
+            logger.exception('Error during foreign key validation')
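Behaviorally, the diff above works because `logger.exception(...)` is `logger.error(..., exc_info=True)`: called inside an `except` block, it attaches the active traceback to the log record. A quick check with a capturing handler:

```python
import logging

records: list[logging.LogRecord] = []


class ListHandler(logging.Handler):
    """Capture emitted records so we can inspect them."""
    def emit(self, record: logging.LogRecord) -> None:
        records.append(record)


logger = logging.getLogger('fk_demo')
logger.addHandler(ListHandler())
logger.setLevel(logging.ERROR)

try:
    raise ValueError('FK check failed')
except ValueError:
    # TRY400: inside an except handler, prefer logger.exception over
    # logger.error; it records the traceback alongside the message.
    logger.exception('Error during foreign key validation')

print(records[0].exc_info is not None)  # → True: traceback captured
```

With plain `logger.error` the record's `exc_info` would be `None` and the traceback lost.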

159-167: Schema-aware _bulk_insert is useful; clarify/guard assumptions about record shapes.

The combination of _get_table_columns + _bulk_insert is a nice improvement over hard-coded SQL. However, _bulk_insert infers target_columns from only the first record and then indexes every subsequent record with those columns:

data_keys = set(records[0].keys())
target_columns = list(data_keys.intersection(valid_columns))
...
rows_to_insert.append(tuple(rec[col] for col in target_columns))

If later records ever:

  • lack one of these keys, you'll get a KeyError; or
  • contain extra keys not present in the first record, those columns will never be written.

Given current call sites appear to build homogeneous dicts, this is likely fine today, but it's brittle for future extensions. Two options:

  • Add a brief docstring note that all records must share the same keys; and/or
  • Add a validation pass that checks each rec has at least target_columns and raises a clear ValueError if not, instead of a KeyError.

Also applies to: 168-203
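The validation pass suggested above might be sketched as follows; the function name and error format are illustrative, not the PR's code. It turns a `KeyError` deep inside the insert loop into an immediate, descriptive `ValueError`:

```python
def validate_homogeneous(records: list[dict]) -> list[str]:
    """Require every record to carry the same keys as the first record."""
    if not records:
        return []
    expected = set(records[0])
    for i, rec in enumerate(records[1:], start=1):
        if set(rec) != expected:
            missing = expected - set(rec)
            extra = set(rec) - expected
            raise ValueError(
                f'record {i} differs from record 0: '
                f'missing={sorted(missing)}, extra={sorted(extra)}'
            )
    return sorted(expected)


print(validate_homogeneous([{'a': 1, 'b': 2}, {'b': 4, 'a': 3}]))  # → ['a', 'b']

try:
    validate_homogeneous([{'a': 1}, {'b': 2}])
    error = None
except ValueError as e:
    error = e
print(error is not None)  # → True: heterogeneous records rejected early
```

Running this once per `_bulk_insert` call is cheap relative to the database work and makes the homogeneity assumption explicit.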

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ff868d and 843747a.

📒 Files selected for processing (1)
  • temoa/_internal/table_writer.py (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: ParticularlyPythonicBS
Repo: TemoaProject/temoa PR: 184
File: temoa/data_processing/database_util.py:109-118
Timestamp: 2025-11-03T13:34:35.907Z
Learning: In the Temoa project, SQL injection vulnerabilities in the database utility classes (such as temoa/data_processing/database_util.py) are considered low concern due to the data access patterns and should not be flagged during reviews. Technical debt for SQL injection hardening should be tracked separately as issues rather than blocking PRs.
Learnt from: ParticularlyPythonicBS
Repo: TemoaProject/temoa PR: 184
File: temoa/data_processing/database_util.py:109-118
Timestamp: 2025-11-03T13:34:35.907Z
Learning: In the Temoa project, SQL injection vulnerabilities in database utility classes (such as temoa/data_processing/database_util.py) are considered low concern due to the data access patterns and should not be flagged during code reviews. Technical debt for SQL injection hardening should be tracked separately as issues rather than blocking PRs.
📚 Learning: 2025-11-03T13:34:35.907Z
Learnt from: ParticularlyPythonicBS
Repo: TemoaProject/temoa PR: 184
File: temoa/data_processing/database_util.py:109-118
Timestamp: 2025-11-03T13:34:35.907Z
Learning: In the Temoa project, SQL injection vulnerabilities in database utility classes (such as temoa/data_processing/database_util.py) are considered low concern due to the data access patterns and should not be flagged during code reviews. Technical debt for SQL injection hardening should be tracked separately as issues rather than blocking PRs.

Applied to files:

  • temoa/_internal/table_writer.py
📚 Learning: 2025-11-03T13:34:35.907Z
Learnt from: ParticularlyPythonicBS
Repo: TemoaProject/temoa PR: 184
File: temoa/data_processing/database_util.py:109-118
Timestamp: 2025-11-03T13:34:35.907Z
Learning: In the Temoa project, SQL injection vulnerabilities in the database utility classes (such as temoa/data_processing/database_util.py) are considered low concern due to the data access patterns and should not be flagged during reviews. Technical debt for SQL injection hardening should be tracked separately as issues rather than blocking PRs.

Applied to files:

  • temoa/_internal/table_writer.py
🧬 Code graph analysis (1)
temoa/_internal/table_writer.py (4)
temoa/types/model_types.py (5)
  • FI (242-252)
  • FlowType (275-282)
  • EI (232-239)
  • TemoaModel (148-224)
  • CapData (266-271)
temoa/_internal/table_data_puller.py (2)
  • poll_emissions (561-749)
  • poll_capacity_results (49-94)
temoa/_internal/data_brick.py (8)
  • DataBrick (23-83)
  • emission_cost_data (80-83)
  • emission_flows (54-55)
  • capacity_data (58-59)
  • flow_data (62-63)
  • cost_data (70-71)
  • exchange_cost_data (74-77)
  • obj_data (66-67)
temoa/_internal/exchange_tech_cost_ledger.py (1)
  • CostType (24-32)
🪛 Ruff (0.14.8)
temoa/_internal/table_writer.py

102-102: Avoid specifying long messages outside the exception class

(TRY003)


121-121: Do not catch blind exception: Exception

(BLE001)


157-157: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


196-196: Possible SQL injection vector through string-based query construction

(S608)


255-255: Unpacked variable e_costs is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


297-297: Possible SQL injection vector through string-based query construction

(S608)


300-300: Possible SQL injection vector through string-based query construction

(S608)


312-312: Possible SQL injection vector through string-based query construction

(S608)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


369-369: Avoid specifying long messages outside the exception class

(TRY003)


405-405: Avoid specifying long messages outside the exception class

(TRY003)


464-464: Avoid specifying long messages outside the exception class

(TRY003)


524-524: Avoid specifying long messages outside the exception class

(TRY003)


636-636: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: setup and test (windows-latest, 3.12)
  • GitHub Check: setup and test (macos-latest, 3.13)
  • GitHub Check: setup and test (macos-latest, 3.12)
  • GitHub Check: setup and test (windows-latest, 3.13)
🔇 Additional comments (13)
temoa/_internal/table_writer.py (13)

205-248: Top-level write_results flow and FK validation/commit pattern look reasonable.

The restructured write_results now centralizes scenario clearing, tech-sector initialization, emissions/costs/flows writing, and FK validation in a single try/finally block. Using _validate_foreign_keys() followed by a final commit() ensures FK checks run once after all per-table writes (even though many helpers also commit), which is acceptable for this refactor.

No changes requested here.


262-283: write_mc_results integration with DataBrick looks consistent.

write_mc_results correctly mirrors the main path: it wires brick.emission_flows into emission_register, merges emission_cost_data into _insert_cost_results, and reuses the same capacity, summary-flow, and objective writers, with FK validation + commit in finally.

No issues spotted in this block.


289-316: Scenario naming and clearing logic are coherent.

_get_scenario_name()'s scenario-<iteration> convention, combined with:

  • clear_scenario() deleting rows with scenario == config.scenario in basic/optional tables, and
  • clear_iterative_runs() deleting rows with scenario LIKE '<scenario>-%',

ensures both base and iterative scenarios are cleared before a fresh run when append=False.

This matches the expanded iteration-aware API and looks correct.


321-347: write_storage_level implementation aligns with the new bulk-insert pattern.

write_storage_level:

  • Asserts tech_sectors is initialized (a clear programming error if not),
  • Uses _get_scenario_name(iteration) for MM/MC compatibility,
  • Builds homogeneous record dicts including sector via self.tech_sectors.get(sli.t), and
  • Delegates to _bulk_insert('output_storage_level', records) with a local commit.

This is consistent with the rest of the writer refactor; no changes needed.
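The dynamic-column helper this PR centers on can be sketched roughly as below. This is an assumption about its shape, inferred from the PR description ("checks the database schema to see which columns actually exist and constructs the SQL dynamically"); the function name `bulk_insert` is illustrative.

```python
import sqlite3

def bulk_insert(con: sqlite3.Connection, table: str, records: list[dict]) -> None:
    # Keep only record keys that exist as columns in the target table,
    # build the INSERT statement dynamically, then use executemany.
    if not records:
        return
    # PRAGMA table_info returns one row per column; row[1] is the name
    schema_cols = {row[1] for row in con.execute(f'PRAGMA table_info({table})')}
    cols = [k for k in records[0] if k in schema_cols]
    sql = (f'INSERT INTO {table} ({", ".join(cols)}) '
           f'VALUES ({", ".join("?" for _ in cols)})')
    con.executemany(sql, [tuple(r[c] for c in cols) for r in records])
```

Record keys with no matching column are silently dropped, which is what lets callers pass homogeneous dicts without hardcoding per-table SQL.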


399-460: Capacity writing logic is consistent with the new dynamic schema approach.

write_capacity_tables and _insert_capacity_results now:

  • Use poll_capacity_results once,
  • Derive a single scenario string,
  • Build three homogeneous record lists (built, net, retired) with consistent keys including sector from tech_sectors, and
  • Use _bulk_insert for each target table with a single commit at the end.

This is clear, avoids per-row INSERTs, and uses tech_sectors.get(t) consistently to avoid KeyErrors on missing mappings.

No changes requested.


520-560: Summary-flow aggregation and writing look correct.

_insert_summary_flow_results:

  • Requires initialized tech_sectors,
  • Aggregates only FlowType.OUT across seasons/times into a keyed defaultdict,
  • Filters out negligible flows via epsilon,
  • Writes a clean, homogeneous record list into output_flow_out_summary via _bulk_insert, then commits.

This matches the intended semantics of a summary-outflow table and seems correct.
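The keyed-defaultdict aggregation with an epsilon filter can be sketched as follows; the key shape and cutoff value are illustrative assumptions.

```python
from collections import defaultdict

EPSILON = 1e-5  # assumed cutoff for negligible flows

def summarize_out_flows(flows: dict, epsilon: float = EPSILON) -> dict:
    # Sum OUT flows across season and time-of-day, then drop totals
    # below epsilon. The (region, period, season, tod, tech) key shape
    # is illustrative, not the project's actual index.
    totals = defaultdict(float)
    for (region, period, season, tod, tech), value in flows.items():
        totals[(region, period, tech)] += value
    return {k: v for k, v in totals.items() if abs(v) > epsilon}
```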


562-589: Flow balance check uses safer defaults and clearer logging.

The updated check_flow_balance:

  • Uses .get(..., 0) for IN/OUT/LOST/FLEX to avoid KeyError,
  • Excludes storage technologies and known special commodities before checking,
  • Applies a relative tolerance on finite input and an absolute tolerance otherwise, and
  • Logs clear warnings with the offending FI index and delta.

This is a good, non-intrusive consistency check on the in-memory flow register.

No changes requested.
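The two-regime tolerance rule reads as follows in a standalone sketch; the threshold values are assumptions, not the ones used in the codebase.

```python
def flow_balanced(fin: float, fout: float, lost: float,
                  rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> bool:
    # Relative tolerance when there is nonzero input, absolute otherwise,
    # mirroring the check described above. Thresholds are assumed.
    delta = fin - (fout + lost)
    if fin != 0:
        return abs(delta) <= rel_tol * abs(fin)
    return abs(delta) <= abs_tol
```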


594-608: Cost writing path correctly merges emission and exchange costs; cloning regular entries avoids side effects.

write_costs and _insert_cost_results:

  • Derive the discounting base year appropriately per scenario_mode,
  • Use poll_cost_results once,
  • If emission_entries is provided, shallow-copy regular_entries to avoid mutating the caller’s dict and overlay emission costs via .update(), then
  • Write both regular and exchange entries through _write_cost_rows.

This structure is clear and avoids the earlier side effect on regular_entries passed in.

Also applies to: 609-628


630-667: _write_cost_rows is consistent and robust against missing cost components.

The function:

  • Sorts keys for deterministic row order,
  • Uses tech_sectors.get(t) to avoid hard failures on missing sector mappings,
  • Reads each CostType via .get(..., 0) so missing cost components become zeros rather than raising, and
  • Uses _bulk_insert('output_cost', records) with a single commit.

This looks correct, and the handling of optional cost components is sensible.
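The `.get(..., 0)` defaulting pattern is worth spelling out; the enum below is an illustrative subset, not Temoa's actual `CostType`.

```python
from enum import Enum

class CostType(Enum):
    # Illustrative subset; the real enum lives in the temoa codebase
    INVEST = 'invest'
    FIXED = 'fixed'
    VARIABLE = 'variable'

def cost_components(costs: dict) -> tuple:
    # Missing components default to 0 instead of raising KeyError,
    # mirroring the .get(..., 0) pattern noted above.
    return tuple(costs.get(ct, 0) for ct in CostType)
```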


668-681: Dual-variable writing matches the bulk-insert style.

write_dual_variables:

  • Derives a scenario-specific name via _get_scenario_name,
  • Iterates results['Solution'].Constraint.items() and extracts data['Dual'],
  • Builds uniform record dicts and inserts via _bulk_insert('output_dual_variable', ...) followed by commit.

Assuming the SolverResults structure is as expected, this is straightforward and consistent.

No changes requested.


683-700: write_tweaks integration with MC change records looks good.

For Monte Carlo runs, write_tweaks:

  • Uses the iteration-aware scenario name,
  • Serializes param_index via str(...).replace("'", ''), and
  • Bulk-inserts into output_mc_delta with a single commit.

This seems consistent with the new dynamic insertion pattern and the MC SQL script/table naming.

No changes requested.
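The quote-stripping serialization mentioned above behaves like this:

```python
def serialize_param_index(param_index) -> str:
    # A tuple index such as ('US', 'coal', 2030) renders without the
    # quote characters, per the str(...).replace("'", '') pattern.
    return str(param_index).replace("'", '')
```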


702-717: execute_script and helper wrappers are clean and flexible.

execute_script handles both resources.abc.Traversable and filesystem paths, reads the SQL, and uses executescript + commit. The dedicated helpers:

def make_summary_flow_table(self) -> None:
    self.execute_script(FLOW_SUMMARY_FILE_LOC)

def make_mc_tweaks_table(self) -> None:
    self.execute_script(MC_TWEAKS_FILE_LOC)

nicely centralize resource locations and table creation.

No issues here.
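The dual-input handling (package resource vs. filesystem path) can be sketched as below; this is an assumed simplification of the actual helper.

```python
import sqlite3
from pathlib import Path

def execute_script(con: sqlite3.Connection, script) -> None:
    # Accept anything exposing read_text() (e.g. an importlib.resources
    # Traversable) or a plain path, then run executescript + commit.
    if hasattr(script, 'read_text'):
        sql = script.read_text()
    else:
        sql = Path(script).read_text()
    con.executescript(sql)
    con.commit()
```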


462-515: Document the intentional exclusion of FlowType.LOST from table output.

FlowType.LOST flows are calculated and tracked in flow_register (as shown in table_data_puller.py) and used in the balance validation in check_flow_balance, but they are not mapped in write_flow_tables and thus are never written to output tables. This appears intentional—the database schema has no column or table for LOST flows anywhere—but a brief comment explaining that LOST flows are diagnostic-only and not persisted to the DB would clarify the design for future maintainers.

Consider adding a comment such as:

# LOST flows are diagnostic-only for balance validation and are not persisted to output tables

@ParticularlyPythonicBS ParticularlyPythonicBS merged commit ad22530 into unstable Dec 11, 2025
9 checks passed
@ParticularlyPythonicBS ParticularlyPythonicBS deleted the ref/table_writer branch December 11, 2025 20:35