mkarots/hookedllm

HookedLLM

Async-first, scoped hook system for LLM observability with SOLID/DI architecture

Python 3.10+ · License: MIT · Documentation

HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.

✨ Key Features

  • 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
  • πŸ”§ SOLID/DI Compliant: Full dependency injection support for testing and customization
  • πŸ“¦ Minimal Surface: Single import, simple API: import hookedllm
  • ⚑ Async-First: Built for modern async LLM SDKs
  • 🎨 Type-Safe: Full type hints and IDE autocomplete support
  • πŸ›‘οΈ Resilient: Hook failures never break your LLM calls
  • πŸ”€ Conditional Execution: Run hooks only when rules match (model, tags, metadata)
  • βš™οΈ Config or Code: Define hooks programmatically or via YAML

πŸš€ Quick Start

Installation

# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With Anthropic/Claude support
pip install hookedllm[anthropic]

# With both OpenAI and Anthropic support
pip install hookedllm[openai,anthropic]

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install hookedllm[all]

Basic Usage

With OpenAI:

import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

With Anthropic/Claude:

import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
)

πŸ“– Examples

Explore the examples/ directory for complete, runnable demonstrations:

Getting Started

  • simple_demo.py - Your first hookedllm program

    • Complete working example with real LLM calls
    • Automatic metrics tracking with MetricsHook
    • Response evaluation with EvaluationHook
    • Perfect starting point for new users
  • basic_usage.py - Core concepts walkthrough

    • Simple hook registration
    • Scoped vs global hooks
    • Conditional rules with when
    • Multiple scope usage

Advanced Features

  • global_hooks_demo.py - Global hooks in action

    • 5 different LLM calls with global before/after hooks
    • Shows all data provided by the framework
    • Demonstrates hook execution flow
    • Metrics aggregation across calls
  • scopes_demo.py - Scope isolation deep dive

    • Prevents hook interference across contexts
    • Development vs production vs evaluation scopes
    • Multi-scope client usage
    • Real-world use case examples
  • evaluation_and_metrics.py - Built-in helpers

    • Using MetricsHook for automatic tracking
    • Using EvaluationHook for quality scoring
    • Conditional evaluation (only for specific models)
    • Multiple scope combinations

Integrations

  • integrations/langfuse_integration.py - Langfuse observability

    • Automatic trace and generation tracking
    • Token usage and cost monitoring
    • Error tracking with full context
    • Metadata enrichment
  • integrations/opentelemetry_integration.py - OpenTelemetry tracing

    • Distributed tracing for LLM calls
    • Semantic conventions for LLM observability
    • Span creation with attributes and events
    • Integration with existing OTel infrastructure

Running the Examples

# Install with OpenAI support
pip install -e .[openai]

# Or install with Anthropic support
pip install -e .[anthropic]

# Or install with both
pip install -e .[openai,anthropic]

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py

Each example includes:

  • βœ… Complete, runnable code
  • πŸ“ Detailed inline comments
  • πŸš€ Setup instructions
  • πŸ’‘ Real-world use cases
  • 🎯 Best practices

πŸ“š Core Concepts

Scopes

Scopes isolate hooks to specific parts of your application:

# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!

Hook Types

Four hook types cover the entire call lifecycle:

# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)

Conditional Rules

Execute hooks only when conditions match:

# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)

Global + Scoped Hooks

Combine global hooks (run everywhere) with scoped hooks:

# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

Multiple Scopes

Clients can use multiple scopes:

hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)

# Runs: log_call + track_metrics + evaluate

πŸ§ͺ Testing with Dependency Injection

HookedLLM is fully testable through dependency injection:

import hookedllm
from unittest.mock import Mock

async def test_hook_execution():  # run with pytest-asyncio or asyncio.run
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)

    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]

    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )

    # Register a hook and wrap a fake client
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")

    # Make a call so the hook pipeline actually runs
    await client.chat.completions.create(model="fake", messages=[])

    # Assert the injected executor was used
    assert mock_executor.execute_after.called

πŸ—οΈ Architecture

HookedLLM follows SOLID principles with full dependency injection:

  • Single Responsibility: Separate storage, execution, and registry
  • Dependency Inversion: Depends on Protocol abstractions
  • Liskov Substitution: Any implementation of protocols works
  • Interface Segregation: Focused, minimal interfaces
  • Open/Closed: Extend via hooks and rules without modifying core

See ARCHITECTURE.md for detailed design documentation.
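Because the core depends on Protocol abstractions, any structurally compatible object satisfies a dependency, with no inheritance required. A sketch of this pattern using Python's `typing.Protocol` (the protocol shape shown here is hypothetical, not hookedllm's exact interface):

```python
import asyncio
from typing import Protocol, runtime_checkable

@runtime_checkable
class HookExecutor(Protocol):
    """Hypothetical shape of the executor abstraction."""
    async def execute_after(self, hooks, call_input, call_output, context): ...

class CountingExecutor:
    """Custom implementation: structural typing, no base class needed."""
    def __init__(self):
        self.calls = 0

    async def execute_after(self, hooks, call_input, call_output, context):
        self.calls += 1
        for hook in hooks:
            await hook(call_input, call_output, context)

# Any object matching the protocol can be injected in place of the default.
assert isinstance(CountingExecutor(), HookExecutor)
```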

πŸ“– Advanced Usage

Custom Error Handling

import logging

logger = logging.getLogger("hookedllm")

def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())

Evaluation Hook Example

from openai import AsyncOpenAI

async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """
    
    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )
    
    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)

Metrics Collection

metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    
    if result.error:
        metrics["errors"] += 1
    
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)

Tags and Metadata

Pass tags and metadata to enable conditional hooks:

OpenAI (uses extra_body):

response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

Anthropic (uses metadata):

response = await client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)

🀝 Contributing

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.

πŸ“„ License

MIT License - see LICENSE file for details.

πŸ”’ Security

Please see SECURITY.md for security policy and reporting vulnerabilities.

πŸ™ Acknowledgments

Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.

About

Before & After Hook Chains for LLMs
