mkarots/hookedllm

HookedLLM

Async-first, scoped hook system for LLM observability with SOLID/DI architecture

Python 3.10+ · License: MIT · Documentation

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

✨ Key Features

  • 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
  • πŸ”§ SOLID/DI Compliant: Full dependency injection support for testing and customization
  • πŸ“¦ Minimal Surface: Single import, simple API: import hookedllm
  • ⚑ Async-First: Built for modern async LLM SDKs
  • 🎨 Type-Safe: Full type hints and IDE autocomplete support
  • πŸ›‘οΈ Resilient: Hook failures never break your LLM calls
  • πŸ”€ Conditional Execution: Run hooks only when rules match (model, tags, metadata)
  • βš™οΈ Config or Code: Define hooks programmatically or via YAML

πŸš€ Quick Start

Installation

# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With Anthropic/Claude support
pip install hookedllm[anthropic]

# With both OpenAI and Anthropic support
pip install hookedllm[openai,anthropic]

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install hookedllm[all]

Basic Usage

With OpenAI:

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

With Anthropic/Claude:

import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
)

πŸ“– Examples

Explore the examples/ directory for complete, runnable demonstrations:

Getting Started

  • simple_demo.py - Your first hookedllm program

    • Complete working example with real LLM calls
    • Automatic metrics tracking with MetricsHook
    • Response evaluation with EvaluationHook
    • Perfect starting point for new users
  • basic_usage.py - Core concepts walkthrough

    • Simple hook registration
    • Scoped vs global hooks
    • Conditional rules with when
    • Multiple scope usage

Advanced Features

  • global_hooks_demo.py - Global hooks in action

    • 5 different LLM calls with global before/after hooks
    • Shows all data provided by the framework
    • Demonstrates hook execution flow
    • Metrics aggregation across calls
  • scopes_demo.py - Scope isolation deep dive

    • Prevents hook interference across contexts
    • Development vs production vs evaluation scopes
    • Multi-scope client usage
    • Real-world use case examples
  • evaluation_and_metrics.py - Built-in helpers

    • Using MetricsHook for automatic tracking
    • Using EvaluationHook for quality scoring
    • Conditional evaluation (only for specific models)
    • Multiple scope combinations

Integrations

  • integrations/langfuse_integration.py - Langfuse observability

    • Automatic trace and generation tracking
    • Token usage and cost monitoring
    • Error tracking with full context
    • Metadata enrichment
  • integrations/opentelemetry_integration.py - OpenTelemetry tracing

    • Distributed tracing for LLM calls
    • Semantic conventions for LLM observability
    • Span creation with attributes and events
    • Integration with existing OTel infrastructure

Running the Examples

# Install with OpenAI support
pip install -e .[openai]

# Or install with Anthropic support
pip install -e .[anthropic]

# Or install with both
pip install -e .[openai,anthropic]

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py

Each example includes:

  • βœ… Complete, runnable code
  • πŸ“ Detailed inline comments
  • πŸš€ Setup instructions
  • πŸ’‘ Real-world use cases
  • 🎯 Best practices

πŸ“š Core Concepts

Scopes

Scopes isolate hooks to specific parts of your application:

# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!

Hook Types

Four hook types cover the entire call lifecycle:

# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)

Conditional Rules

Execute hooks only when conditions match:

# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)

Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

Multiple Scopes

Clients can use multiple scopes:

hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate

πŸ§ͺ Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

import hookedllm
from unittest.mock import Mock

async def test_hook_execution():  # run with pytest-asyncio or asyncio.run
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)

    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]

    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )

    # Register a hook and wrap a fake client
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")

    # Make a call so the hook pipeline actually runs
    await client.chat.completions.create(model="fake", messages=[])

    # Assert the injected executor was used
    assert mock_executor.execute_after.called

πŸ—οΈ Architecture

HookedLLM follows SOLID principles with full dependency injection:

  • Single Responsibility: Separate storage, execution, and registry
  • Dependency Inversion: Depends on Protocol abstractions
  • Liskov Substitution: Any implementation of protocols works
  • Interface Segregation: Focused, minimal interfaces
  • Open/Closed: Extend via hooks and rules without modifying core

See ARCHITECTURE.md for detailed design documentation.
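Because the core depends on Protocol abstractions, any structurally compatible object satisfies a dependency, with no inheritance required. A sketch of this pattern using Python's `typing.Protocol` (the protocol shape shown here is hypothetical, not hookedllm's exact interface):

```python
import asyncio
from typing import Protocol, runtime_checkable

@runtime_checkable
class HookExecutor(Protocol):
    """Hypothetical shape of the executor abstraction."""
    async def execute_after(self, hooks, call_input, call_output, context): ...

class CountingExecutor:
    """Custom implementation: structural typing, no base class needed."""
    def __init__(self):
        self.calls = 0

    async def execute_after(self, hooks, call_input, call_output, context):
        self.calls += 1
        for hook in hooks:
            await hook(call_input, call_output, context)

# Any object matching the protocol can be injected in place of the default.
assert isinstance(CountingExecutor(), HookExecutor)
```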

πŸ“– Advanced Usage

Custom Error Handling

import logging

logger = logging.getLogger("hookedllm")

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())

Evaluation Hook Example

from openai import AsyncOpenAI

async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)

Metrics Collection

metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)

Tags and Metadata

Pass tags and metadata to enable conditional hooks:

OpenAI (uses extra_body):

response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

Anthropic (uses metadata):

response = await client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

🀝 Contributing

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.

πŸ“„ License

MIT License - see LICENSE file for details.

πŸ”’ Security

Please see SECURITY.md for security policy and reporting vulnerabilities.

πŸ™ Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.

About

Before & After Hook Chains for LLMs
