feat: Introduce logging context to Tangle #66

morgan-wowk · 2026-01-12T21:20:03Z

Issue
#67

Changes:

Adds trace context helpers
Add trace middleware to apply trace context to API requests
- This helps logging formatters pull trace ids out of context to set them in logs
- Sets the trace id on the response for client consumption
Wraps API requests with trace context using middleware
Passes trace id to pipeline executions using extras parameter
Wraps pipeline execution with trace context (using trace id from extras parameter, or creating a new one if None)

Test pipeline notes:

Requires ytest, starlette, httpx (e.g. pip3 install ...)

cloud_pipelines_backend/instrumentation/trace_middleware.py

cloud_pipelines_backend/orchestrator_sql.py

Ark-kun · 2026-01-13T00:44:51Z

Thank you for this PR.

I'm reviewing the changes and I think we might want to design this slightly differently.
I think that trace IDs are mostly relevant to API Server and API requests.
Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.

On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

morgan-wowk · 2026-01-13T02:43:31Z

Thank you for this PR.

I'm reviewing the changes and I think we might want to design this slightly differently. I think that trace IDs are mostly relevant to API Server and API requests. Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.

On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

This sounds good to me.

I will make some adjustments. Thanks!

**Changes:** * Adds logging context helpers * Add request middleware to generate unique request id and set it in the logging context around API requests * Sets the x-tangle-request-id on the response for client consumption

morgan-wowk · 2026-01-19T23:12:40Z

Thank you for this PR.

I'm reviewing the changes and I think we might want to design this slightly differently. I think that trace IDs are mostly relevant to API Server and API requests. Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.

On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

The work has been re-designed. Here is a summary:

Logging Context Improvements

Summary

Enhanced the Tangle orchestrator logging to automatically include execution context (execution_id, container_execution_id) in all log messages, making it easier to trace and filter logs for specific executions.

Changes Made

1. Added Logging Context to Orchestrator (`orchestrator_sql.py`)

Wrapped execution processing with logging_context.logging_context() to automatically attach execution IDs to all logs within that scope.

For Queued Executions:

# Set execution context for logging
with logging_context.logging_context(
    execution_id=queued_execution.id
):
    _logger.info("Before processing queued execution")
    # ... process execution ...
    _logger.info("After processing queued execution")

For Running Container Executions:

with logging_context.logging_context(
    execution_id=execution_id,
    container_execution_id=running_container_execution.id
):
    _logger.info("Before processing running container execution")
    # ... process container execution ...
    _logger.info("After processing running container execution")

2. Added Context-Aware Logging Configuration (`start_local.py`)

Created a custom formatter and filter that dynamically includes context fields in logs only when they're set.

class LoggingContextFilter(logging.Filter):
    """Adds contextual metadata to log records."""
    
    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in get_all_context_metadata().items():
            if value is not None:
                setattr(record, key, value)
        return True


class ContextAwareFormatter(logging.Formatter):
    """Formatter that dynamically includes context fields only when they're set."""
    
    def format(self, record: logging.LogRecord) -> str:
        base_format = "%(asctime)s [%(levelname)s] %(name)s"
        
        # Collect context fields that are present
        context_parts = []
        context_metadata = get_all_context_metadata()
        for key, value in context_metadata.items():
            if value is not None and hasattr(record, key):
                context_parts.append(f"{key}={value}")
        
        # Add context to format if any exists
        if context_parts:
            base_format += " [" + " ".join(context_parts) + "]"
        
        base_format += ": %(message)s"
        
        formatter = logging.Formatter(base_format)
        return formatter.format(record)

3. Cleaned Up Redundant IDs in Log Messages

Removed execution IDs from log messages where they're already present in the logging context, eliminating duplication.

Examples of cleaned up logs:

# Before: ID appears twice (in context and message)
_logger.info(f"Before processing {queued_execution.id=}")

# After: ID only in context
_logger.info("Before processing queued execution")

# Before
_logger.info(f"Container execution {container_execution.id} is now in state {new_status}")

# After
_logger.info(f"Container execution is now in state {new_status}")

# Before
_logger.info(f"Terminating container execution {container_execution.id}.")

# After
_logger.info("Terminating container execution.")

Log Output Examples

Without Context (non-orchestrator logs)

2026-01-19 14:40:52,611 [INFO] start_local: Starting the orchestrator
2026-01-19 14:40:52,620 [INFO] uvicorn.error: Application startup complete.

With Execution Context Only

2026-01-19 14:30:13,650 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: Before processing queued execution
2026-01-19 14:30:13,674 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: Execution will reuse the old_execution.id='019b3428a0f1e29c82d4'
2026-01-19 14:30:13,681 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: After processing queued execution

With Both Execution and Container Execution Context

2026-01-19 14:30:14,687 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: Before processing running container execution
2026-01-19 14:30:14,701 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: Container execution is now in state RUNNING (was PENDING)
2026-01-19 14:30:14,713 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: After processing running container execution

Benefits

Easy Log Filtering: Filter logs by execution_id or container_execution_id in log aggregation tools
Better Traceability: Follow an execution through its entire lifecycle
No Duplication: IDs appear once in context, not repeated in every message
Dynamic Context: Formatter automatically includes any context fields without code changes
Cleaner Messages: Log messages are more readable without ID clutter

Extensibility

The context system is fully extensible. You can add any arbitrary metadata:

with logging_context.logging_context(
    execution_id=execution.id,
    container_execution_id=container.id,
    pipeline_run_id=pipeline.id,  # Would work if added
    user_id=user.id,               # Would work if added
    custom_field="value"           # Any custom field works
):
    _logger.info("Processing with custom context")

All fields automatically appear in logs:

[INFO] module [execution_id=123 container_execution_id=456 pipeline_run_id=789 user_id=admin custom_field=value]: Processing with custom context

Files Modified

tangle/cloud_pipelines_backend/orchestrator_sql.py - Added logging context to execution processing
tangle/start_local.py - Added LoggingContextFilter and ContextAwareFormatter

Backward Compatibility

✅ All changes are backward compatible
✅ Logs without context still work (no context fields shown)
✅ Existing log parsing tools continue to work
✅ All tests pass

Ark-kun · 2026-01-20T11:17:13Z

cloud_pipelines_backend/orchestrator_sql.py

+            with logging_context.logging_context(
+                execution_id=queued_execution.id
+            ):
+                _logger.info("Before processing queued execution")


Will the execution ID information still be present in non-JSON logs?

It will be, yes. This is achieved through the following change:

class ContextAwareFormatter(logging.Formatter): """Formatter that dynamically includes context fields only when they're set.""" def format(self, record: logging.LogRecord) -> str: """Format log record with dynamic context fields.""" # Base format base_format = "%(asctime)s [%(levelname)s] %(name)s" # Collect context fields that are present context_parts = [] context_metadata = get_all_context_metadata() for key, value in context_metadata.items(): if value is not None and hasattr(record, key): context_parts.append(f"{key}={value}") # Add context to format if any exists if context_parts: base_format += " [" + " ".join(context_parts) + "]" base_format += ": %(message)s" # Create formatter with the dynamic format formatter = logging.Formatter(base_format) return formatter.format(record)

Thank you. I wonder how _logger.exception messages look like.

morgan-wowk · 2026-01-20T18:38:45Z

start_local.py

 import logging.config
+from cloud_pipelines_backend.instrumentation.logging_context import get_all_context_metadata
+
+class LoggingContextFilter(logging.Filter):


Move these two classes to a dedicated, well-named file/module rather than being coupled with an entrypoint script.

Ark-kun · 2026-01-21T23:07:44Z

cloud_pipelines_backend/orchestrator_sql.py

+            execution_id = execution_nodes[0].id if execution_nodes else None
+
+            with logging_context.logging_context(
+                execution_id=execution_id,


Let's call it: execution_node_id

Ark-kun · 2026-01-21T23:10:53Z

cloud_pipelines_backend/orchestrator_sql.py

            # There must be at least one SUCCEEDED/RUNNING/PENDING since non_purged_candidates is non-empty.
            old_execution = non_purged_candidates[-1]
            _logger.info(
-                f"Execution {execution.id=} will reuse the {old_execution.id=} with "


Let's either restore the = or rephrase: Reusing cached execution node {old_execution.id} with

Ark-kun · 2026-01-21T23:15:29Z

start_local.py


 # region: Logging configuration
 import logging.config
+from cloud_pipelines_backend.instrumentation.logging_context import get_all_context_metadata


style: Let's only import modules, not classes/function. https://google.github.io/styleguide/pyguide.html#22-imports

import foo from foo import bar

If the module name is ambiguous/conflicting, tehn it can be renamed:

from launchers import interfaces as launcher_interfaces

Ark-kun · 2026-01-21T23:22:54Z

cloud_pipelines_backend/instrumentation/logging_context.py

Ideally, module names should be plural (e.g. "contexts") or gerund ("logging").
Singular nouns clash with local variables. For example, Kubernetes library has from kubernetes import client and it's a pain.

In this case it's harder to find a good name.
Maybe contextual_logging.py?

Ark-kun · 2026-01-21T23:23:35Z

cloud_pipelines_backend/instrumentation/request_middleware.py

Maybe, api_tracing.py?

Ark-kun · 2026-01-21T23:24:32Z

cloud_pipelines_backend/orchestrator_sql.py

Let's make sure to use Black formatting on all modified files.

Ark-kun · 2026-01-21T23:37:47Z

cloud_pipelines_backend/instrumentation/logging_context.py

+        key: The metadata key (e.g., 'execution_id', 'request_id', 'user_id')
+        value: The value to set
+    """
+    metadata = _context_metadata.get().copy()


What is the main purpose of .copy() here?

Ark-kun · 2026-01-21T23:39:14Z

cloud_pipelines_backend/instrumentation/logging_context.py

+)
+
+
+def set_context_metadata(key: str, value: Any) -> None:


Is there a way to delete/unset a key?
Similar to a dict.

Ark-kun

Thank you. This looks pretty good.
I've left some comments and approved it.

maxy-shpfy requested a review from Ark-kun January 12, 2026 22:55

morgan-wowk force-pushed the trace-id-support branch from 45fca00 to c08c8f2 Compare January 12, 2026 23:11

morgan-wowk commented Jan 12, 2026

View reviewed changes

cloud_pipelines_backend/instrumentation/trace_middleware.py Outdated Show resolved Hide resolved

Ark-kun reviewed Jan 13, 2026

View reviewed changes

cloud_pipelines_backend/orchestrator_sql.py Outdated Show resolved Hide resolved

Ark-kun self-assigned this Jan 13, 2026

morgan-wowk force-pushed the trace-id-support branch from c08c8f2 to e824f64 Compare January 19, 2026 20:33

morgan-wowk changed the title ~~feat: Introduce trace ids to Tangle~~ feat: Introduce logging context to Tangle Jan 19, 2026

morgan-wowk force-pushed the trace-id-support branch 7 times, most recently from 6245345 to 2bfcc23 Compare January 19, 2026 22:53

feat: Introduce trace ids to Tangle

4e7b1a3

**Changes:** * Adds logging context helpers * Add request middleware to generate unique request id and set it in the logging context around API requests * Sets the x-tangle-request-id on the response for client consumption

morgan-wowk force-pushed the trace-id-support branch from 2bfcc23 to 4e7b1a3 Compare January 19, 2026 23:01

Ark-kun reviewed Jan 20, 2026

View reviewed changes

morgan-wowk commented Jan 20, 2026

View reviewed changes

Ark-kun reviewed Jan 21, 2026

View reviewed changes

cloud_pipelines_backend/instrumentation/request_middleware.py

Copy link

Contributor

Ark-kun Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, api_tracing.py?

Ark-kun reviewed Jan 21, 2026

View reviewed changes

cloud_pipelines_backend/orchestrator_sql.py

Copy link

Contributor

Ark-kun Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to use Black formatting on all modified files.

Ark-kun reviewed Jan 21, 2026

View reviewed changes

Ark-kun approved these changes Jan 22, 2026

View reviewed changes

feat: Introduce logging context to Tangle #66

Are you sure you want to change the base?

feat: Introduce logging context to Tangle #66

Uh oh!

Conversation

morgan-wowk commented Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Ark-kun commented Jan 13, 2026

Uh oh!

morgan-wowk commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

morgan-wowk commented Jan 19, 2026

Logging Context Improvements

Summary

Changes Made

1. Added Logging Context to Orchestrator (orchestrator_sql.py)

For Queued Executions:

For Running Container Executions:

2. Added Context-Aware Logging Configuration (start_local.py)

3. Cleaned Up Redundant IDs in Log Messages

Log Output Examples

Without Context (non-orchestrator logs)

With Execution Context Only

With Both Execution and Container Execution Context

Benefits

Extensibility

Files Modified

Backward Compatibility

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ark-kun Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ark-kun left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

morgan-wowk commented Jan 12, 2026 •

edited

Loading

morgan-wowk commented Jan 13, 2026 •

edited

Loading

1. Added Logging Context to Orchestrator (`orchestrator_sql.py`)

2. Added Context-Aware Logging Configuration (`start_local.py`)

Ark-kun Jan 21, 2026 •

edited

Loading