Async-first, scoped hook system for LLM observability with SOLID/DI architecture
HookedLLM provides transparent observability for LLM calls through a powerful hook system. Add evaluation, logging, metrics, and custom behaviors to your LLM applications without modifying core application logic.
- 🎯 Scoped Isolation: Named scopes prevent hook interference across application contexts
- 🔧 SOLID/DI Compliant: Full dependency injection support for testing and customization
- 📦 Minimal Surface: Single import, simple API: `import hookedllm`
- ⚡ Async-First: Built for modern async LLM SDKs
- 🎨 Type-Safe: Full type hints and IDE autocomplete support
- 🛡️ Resilient: Hook failures never break your LLM calls
- 🔀 Conditional Execution: Run hooks only when rules match (model, tags, metadata)
- ⚙️ Config or Code: Define hooks programmatically or via YAML
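The last bullet mentions defining hooks via YAML. The configuration schema is not shown in this README, so the sketch below is purely illustrative of what a config-driven setup could look like; the keys and structure are hypothetical, not the actual schema:

```yaml
# hooks.yaml - hypothetical shape, for illustration only
scopes:
  evaluation:
    after:
      - hook: myapp.hooks.log_usage   # dotted path to a hook function (assumed convention)
        when:
          model: gpt-4                # only run for this model
  production:
    error:
      - hook: myapp.hooks.alert_on_error
```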
```bash
# Core package (zero dependencies)
pip install hookedllm

# With OpenAI support
pip install hookedllm[openai]

# With Anthropic/Claude support
pip install hookedllm[anthropic]

# With both OpenAI and Anthropic support
pip install hookedllm[openai,anthropic]

# With all optional dependencies (OpenAI, Anthropic, config support)
pip install hookedllm[all]
```

With OpenAI:
```python
import hookedllm
from openai import AsyncOpenAI

# Define a simple hook
async def log_usage(call_input, call_output, context):
    print(f"Model: {call_input.model}")
    print(f"Tokens: {call_output.usage.get('total_tokens', 0)}")

# Register hook to a scope
hookedllm.scope("evaluation").after(log_usage)

# Wrap your client with the scope
client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

With Anthropic/Claude:
```python
import hookedllm
from anthropic import AsyncAnthropic

# Same hook works for both providers!
async def log_usage(call_input, call_output, context):
    print(f"Provider: {context.provider}, Model: {call_input.model}")
    if call_output.usage:
        total = call_output.usage.get("total_tokens", 0)
        print(f"Tokens: {total}")

# Register hook
hookedllm.scope("evaluation").after(log_usage)

# Wrap Anthropic client - automatic provider detection!
client = hookedllm.wrap(AsyncAnthropic(), scope="evaluation")

# Use normally - hooks execute automatically!
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={"hookedllm_tags": ["example"]}  # Note: Anthropic uses metadata, not extra_body
)
```

Explore the examples/ directory for complete, runnable demonstrations:
- `simple_demo.py` - Your first hookedllm program
  - Complete working example with real LLM calls
  - Automatic metrics tracking with `MetricsHook`
  - Response evaluation with `EvaluationHook`
  - Perfect starting point for new users
- `basic_usage.py` - Core concepts walkthrough
  - Simple hook registration
  - Scoped vs global hooks
  - Conditional rules with `when`
  - Multiple scope usage
- `global_hooks_demo.py` - Global hooks in action
  - 5 different LLM calls with global before/after hooks
  - Shows all data provided by the framework
  - Demonstrates hook execution flow
  - Metrics aggregation across calls
- `scopes_demo.py` - Scope isolation deep dive
  - Prevents hook interference across contexts
  - Development vs production vs evaluation scopes
  - Multi-scope client usage
  - Real-world use case examples
- `evaluation_and_metrics.py` - Built-in helpers
  - Using `MetricsHook` for automatic tracking
  - Using `EvaluationHook` for quality scoring
  - Conditional evaluation (only for specific models)
  - Multiple scope combinations
- `integrations/langfuse_integration.py` - Langfuse observability
  - Automatic trace and generation tracking
  - Token usage and cost monitoring
  - Error tracking with full context
  - Metadata enrichment
- `integrations/opentelemetry_integration.py` - OpenTelemetry tracing
  - Distributed tracing for LLM calls
  - Semantic conventions for LLM observability
  - Span creation with attributes and events
  - Integration with existing OTel infrastructure
```bash
# Install with OpenAI support
pip install -e .[openai]

# Or install with Anthropic support
pip install -e .[anthropic]

# Or install with both
pip install -e .[openai,anthropic]

# Set your API keys
export OPENAI_API_KEY=your-key-here
export ANTHROPIC_API_KEY=your-key-here

# Run any example
python examples/simple_demo.py
python examples/scopes_demo.py
python examples/anthropic_simple_example.py  # Anthropic example
python examples/integrations/langfuse_integration.py
```

Each example includes:
- ✅ Complete, runnable code
- 📝 Detailed inline comments
- 📋 Setup instructions
- 💡 Real-world use cases
- 🎯 Best practices
Scopes isolate hooks to specific parts of your application:
```python
# Evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
hookedllm.scope("evaluation").after(calculate_metrics)

# Production scope
hookedllm.scope("production").after(production_logger)
hookedllm.scope("production").error(alert_on_error)

# Clients opt into scopes
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")

# Each client only runs its scope's hooks - no interference!
```

Four hook types cover the entire call lifecycle:
```python
# Before: runs before LLM call
async def before_hook(call_input, context):
    context.metadata["user_id"] = "abc123"

# After: runs after successful call
async def after_hook(call_input, call_output, context):
    print(f"Response: {call_output.text}")

# Error: runs on failure
async def error_hook(call_input, error, context):
    print(f"Error: {error}")

# Finally: always runs with complete result
async def finally_hook(result):
    print(f"Took {result.elapsed_ms}ms")

hookedllm.before(before_hook)
hookedllm.after(after_hook)
hookedllm.error(error_hook)
hookedllm.finally_(finally_hook)
```

Execute hooks only when conditions match:
```python
# Only for GPT-4
hookedllm.scope("evaluation").after(
    expensive_eval,
    when=hookedllm.when.model("gpt-4")
)

# Only in production
hookedllm.after(
    prod_logger,
    when=hookedllm.when.tag("production")
)

# Complex rules with composition
hookedllm.after(
    my_hook,
    when=(
        hookedllm.when.model("gpt-4") &
        hookedllm.when.tag("production") &
        ~hookedllm.when.tag("test")
    )
)

# Custom predicates
hookedllm.after(
    premium_hook,
    when=lambda call_input, ctx: ctx.metadata.get("tier") == "premium"
)
```

Combine global hooks (run everywhere) with scoped hooks:
```python
# Global hook - runs for ALL clients
hookedllm.finally_(track_all_metrics)

# Scoped hooks - only for specific clients
hookedllm.scope("evaluation").after(evaluate)
hookedllm.scope("production").error(alert)

# Evaluation client gets: track_all_metrics + evaluate
eval_client = hookedllm.wrap(AsyncOpenAI(), scope="evaluation")

# Production client gets: track_all_metrics + alert
prod_client = hookedllm.wrap(AsyncOpenAI(), scope="production")
```

Clients can use multiple scopes:
```python
hookedllm.scope("logging").finally_(log_call)
hookedllm.scope("metrics").finally_(track_metrics)
hookedllm.scope("evaluation").after(evaluate)

# Client with all three scopes
client = hookedllm.wrap(
    AsyncOpenAI(),
    scope=["logging", "metrics", "evaluation"]
)
# Runs: log_call + track_metrics + evaluate
```

HookedLLM is fully testable through dependency injection:
```python
import hookedllm
from unittest.mock import Mock

def test_hook_execution():
    # Create mock dependencies
    mock_registry = Mock(spec=hookedllm.ScopeRegistry)
    mock_executor = Mock(spec=hookedllm.HookExecutor)

    # Configure mocks
    mock_scope = Mock()
    mock_registry.get_scopes_for_client.return_value = [mock_scope]

    # Create context with mocks
    ctx = hookedllm.create_context(
        registry=mock_registry,
        executor=mock_executor
    )

    # Test
    ctx.scope("test").after(my_hook)
    client = ctx.wrap(FakeClient(), scope="test")

    # Assert
    assert mock_executor.execute_after.called
```

HookedLLM follows SOLID principles with full dependency injection:
- Single Responsibility: Separate storage, execution, and registry
- Dependency Inversion: Depends on Protocol abstractions
- Liskov Substitution: Any implementation of protocols works
- Interface Segregation: Focused, minimal interfaces
- Open/Closed: Extend via hooks and rules without modifying core
See ARCHITECTURE.md for detailed design documentation.
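To illustrate the Dependency Inversion point above, here is a minimal, self-contained sketch of a Protocol-based executor seam. The names (`HookExecutorProtocol`, `RecordingExecutor`, `execute_after`) are illustrative, not hookedllm's actual interfaces:

```python
import asyncio
from typing import Any, Protocol


class HookExecutorProtocol(Protocol):
    """Any object with this shape can be injected wherever an executor is expected."""

    async def execute_after(self, call_input: Any, call_output: Any,
                            context: Any) -> None: ...


class RecordingExecutor:
    """A test double that records invocations instead of running real hooks."""

    def __init__(self) -> None:
        self.calls: list = []

    async def execute_after(self, call_input: Any, call_output: Any,
                            context: Any) -> None:
        self.calls.append((call_input, call_output, context))


# Callers depend only on the Protocol, so any structurally-matching
# implementation is a drop-in substitute; no subclassing required.
rec = RecordingExecutor()
executor: HookExecutorProtocol = rec
asyncio.run(executor.execute_after("input", "output", {"scope": "test"}))
print(len(rec.calls))  # one call was recorded
```

Because `Protocol` uses structural typing, test doubles and alternative implementations satisfy the interface without inheriting from library classes.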
Customize how hook errors are handled by injecting your own executor:

```python
def my_error_handler(error, context):
    # Custom handling for hook errors
    logger.error(f"Hook failed in {context}: {error}")

executor = hookedllm.DefaultHookExecutor(
    error_handler=my_error_handler,
    logger=my_logger
)

ctx = hookedllm.create_context(executor=executor)
client = ctx.wrap(AsyncOpenAI())
```

Evaluate response quality from an after hook, using a separate evaluator client:

```python
async def evaluate_response(call_input, call_output, context):
    """Evaluate LLM responses for quality."""
    # Build evaluation prompt
    eval_prompt = f"""
    Evaluate this response for clarity and accuracy:
    Query: {call_input.messages[-1].content}
    Response: {call_output.text}
    Return JSON: {{"clarity": 0-1, "accuracy": 0-1}}
    """

    # Use separate evaluator client (no hooks to avoid recursion)
    evaluator = AsyncOpenAI()
    eval_result = await evaluator.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": eval_prompt}]
    )

    # Store evaluation in metadata
    context.metadata["evaluation"] = eval_result.choices[0].message.content

# Register to evaluation scope
hookedllm.scope("evaluation").after(evaluate_response)
```

Track aggregate metrics with a finally hook:

```python
metrics = {"calls": 0, "tokens": 0, "errors": 0}

async def track_metrics(result):
    """Track aggregated metrics."""
    metrics["calls"] += 1
    if result.error:
        metrics["errors"] += 1
    if result.output and result.output.usage:
        metrics["tokens"] += result.output.usage.get("total_tokens", 0)

hookedllm.finally_(track_metrics)
```

Pass tags and metadata to enable conditional hooks:
OpenAI (uses `extra_body`):

```python
response = await client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_body={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)
```

Anthropic (uses `metadata`):
```python
response = await client.messages.create(
    model="claude-3-haiku-20240307",
    messages=[...],
    metadata={
        "hookedllm_tags": ["production", "critical"],
        "hookedllm_metadata": {
            "user_id": "abc123",
            "user_tier": "premium"
        }
    }
)
```

Contributions welcome! Please see our Contributing Guidelines and Code of Conduct.
MIT License - see LICENSE file for details.
Please see SECURITY.md for security policy and reporting vulnerabilities.
Built with inspiration from middleware patterns, aspect-oriented programming, and functional composition principles.