AWS CDK infrastructure for D&D Buddy - a campaign management tool with AI assistant capabilities.
D&D Buddy uses a serverless architecture built on AWS CDK with four main stacks:
Handles user authentication and authorization.
Resources:
- Cognito User Pool: User authentication with email/username sign-in
- User Pool Client: OAuth2 client for frontend authentication
- Security: Password policy enforcement, email verification, account recovery
Outputs: User Pool ID, Client ID, ARN
Manages all data persistence for the application.
Resources:
-
Campaign Files Bucket (
dnd-buddy-files-{account}): S3 storage for user campaign files- Structure:
{userId}/{campaign}/{category}/{filename} - Categories:
npcs,lore,monsters,sessions,organizations,species,players - Encryption: S3-managed (SSE-S3)
- Structure:
-
Vector Store (
dnd-vec-{account}): S3 Vectors for semantic search- Index:
campaign-vectors-index - Dimensions: 1024 (Cohere embed-english-v3)
- Distance metric: Cosine similarity
- Enables RAG (Retrieval Augmented Generation) for campaign context
- Index:
-
Chat History Table (
dnd-buddy-chat-history): DynamoDB for conversation persistence- Partition key:
SessionId - TTL: 7 days (automatic cleanup)
- Billing: On-demand
- Partition key:
-
WebSocket Connection Table (
dnd-buddy-websocket-connections): DynamoDB for active connections- Partition key:
connectionId - GSI:
UserIdIndexfor querying by user - TTL: Automatic cleanup of stale connections
- Partition key:
Outputs: Bucket names, table names, vector index name
Provides REST API endpoints for file indexing and session management.
Resources:
-
REST API Gateway: CORS-enabled API with Cognito authorization
-
Indexing Lambda (
dnd-buddy-indexing): Updates vector embeddings when files change- Runtime: Python 3.11
- Timeout: 5 minutes
- Memory: 512 MB
- Endpoint:
POST /index
-
Sessions Lambda (
dnd-buddy-sessions): Retrieves conversation history- Runtime: Python 3.11
- Timeout: 30 seconds
- Memory: 256 MB
- Endpoints:
GET /sessions,GET /sessions/{sessionId}
Outputs: API URL, Lambda ARNs
Enables real-time bidirectional communication for AI chat.
Resources:
-
WebSocket API Gateway: Real-time communication with JWT authorization
- Routes:
$connect,$disconnect,$default - Stage:
prod(auto-deploy enabled)
- Routes:
-
Connection Lambda (
dnd-buddy-websocket-connection): Handles WebSocket lifecycle- Runtime: Node.js 24.x
- Handles: Authorization, connect, disconnect events
- Validates JWT tokens from Cognito
-
Agent Lambda (
dnd-buddy-agent): AI assistant with streaming responses- Runtime: Python 3.14 (ARM64)
- Timeout: 5 minutes
- Memory: 1024 MB
- Framework: LangGraph + LangChain
- Models: Configurable (default: Claude Haiku 4.5 + Nova Micro)
Outputs: WebSocket URL, API ID, Lambda ARNs
The D&D Buddy agent is built with LangGraph + LangChain and runs as a Lambda function with a dual-model architecture for cost optimization.
The agent uses two models for different purposes:
-
Planning Model (default: Nova Micro)
- Purpose: Tool selection and execution planning
- Temperature: 0.0 (deterministic)
- Max tokens: 400
- No streaming
- Cost-optimized for iterative reasoning
-
Creative Model (default: Claude Haiku 4.5)
- Purpose: Final response generation
- Temperature: 0.6 (creative)
- Max tokens: 1500
- Streaming enabled
- Used only once per request
Why this matters: The agent may call tools multiple times (up to 3 iterations) to gather information. Using a cheap model for planning and a better model for the final response significantly reduces costs while maintaining quality.
User Message → WebSocket → Agent Lambda
↓
Load conversation history (last 2 messages)
Load campaign context (background + recent sessions)
↓
┌─────────────────────────────────────┐
│ LangGraph Agent Loop (Planning) │
│ Model: Nova Micro (cheap) │
│ Max iterations: 3 │
│ │
│ 1. Analyze user request │
│ 2. Select tools to call │
│ 3. Execute tools with context │
│ 4. Repeat if more info needed │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Final Response Generation │
│ Model: Claude Haiku (creative) │
│ Streaming: Yes │
│ │
│ Synthesize tool results │
│ Generate helpful response │
│ Stream tokens to WebSocket │
└─────────────────────────────────────┘
↓
Save to DynamoDB (with TTL)
↓
Return response + tools summary
The agent has access to 4 tools for campaign assistance:
-
search_campaign - Semantic search across all campaign files
- Uses S3 Vectors with Cohere embeddings
- Searches: NPCs, monsters, sessions, lore, organizations, species, players
- Parameters:
query(string),top_k(default: 5) - Auto-injected:
user_id,campaign - Returns: Relevant snippets with filenames
-
get_file_content - Retrieves complete file content
- Used when search results reference the same file multiple times
- Parameters:
file_path(relative to campaign root) - Auto-injected:
user_id,campaign - Returns: Full markdown content
-
roll_dice - D&D dice notation parser
- Supports standard notation:
1d20,2d6+3,4d8-2 - Interprets natural language: "roll perception", "attack roll"
- Parameters:
dice_notation(string) - Returns: Roll result with breakdown
- Supports standard notation:
-
get_conversation_history - Retrieves earlier messages
- Used when user references past conversation
- Parameters:
message_count(default: 10) - Auto-injected:
session_id - Cached per Lambda execution
- Returns: List of previous messages
Context Injection: Tools that need user context (search_campaign, get_file_content) automatically receive user_id and campaign. Session-aware tools (get_conversation_history) receive session_id. This keeps tool definitions clean while ensuring security.
Startup Context:
- Last 2 messages from conversation history (last exchange)
- Campaign background context (via semantic search for "world setting themes")
- Recent sessions context (via semantic search for "recent session latest")
On-Demand Context:
- Agent calls
get_conversation_historywhen user references earlier discussion - Full history loaded once and cached per Lambda execution
- Reduces DynamoDB reads and latency
System Prompt:
- Built dynamically with campaign-specific context
- Includes campaign background, recent sessions, tool descriptions
- Emphasizes campaign-specific responses over generic D&D advice
- Enforces concise formatting (100-300 words, structured output)
- Session ownership validation: Session ID must start with
{userId}- - JWT authorization: WebSocket authorizer validates Cognito tokens
- User-scoped paths: All S3 operations scoped to
{userId}/{campaign}/ - No cross-user access: Tools automatically inject authenticated user context
cdk/
├── bin/
│ └── cdk.ts # CDK app entry point
│
├── lib/
│ ├── dnd-buddy-stack.ts # Main stack orchestration
│ └── stacks/
│ ├── auth-stack.ts # Cognito user pool & client
│ ├── storage-stack.ts # S3 buckets, DynamoDB tables, S3 Vectors
│ ├── api-stack.ts # REST API + indexing/sessions Lambdas
│ └── websocket-stack.ts # WebSocket API + agent Lambda
│
├── lambdas/
│ ├── dnd-buddy-agent/ # AI agent (Python 3.14)
│ │ ├── agent.py # LangGraph agent with dual-model architecture
│ │ ├── main.py # Lambda handler with WebSocket streaming
│ │ ├── requirements.txt # Python dependencies
│ │ └── tools/
│ │ ├── __init__.py # Tool exports
│ │ ├── search_campaign.py # S3 Vectors semantic search
│ │ ├── get_file.py # S3 file content retrieval
│ │ ├── roll_dice.py # D&D dice notation parser
│ │ └── get_history.py # DynamoDB conversation history
│ │
│ ├── indexing/ # Vector indexing Lambda (Python 3.11)
│ │ ├── handler.py # Processes files and creates embeddings
│ │ └── requirements.txt
│ │
│ ├── sessions/ # Session management Lambda (Python 3.11)
│ │ ├── handler.py # Retrieves chat history
│ │ └── requirements.txt
│ │
│ └── websocket-connection/ # WebSocket lifecycle Lambda (Node.js 24.x)
│ └── index.js # Handles connect/disconnect + JWT auth
│
├── test/
│ └── cdk.test.ts # CDK stack tests
│
├── cdk.json # CDK configuration
├── package.json # Node.js dependencies
└── tsconfig.json # TypeScript configuration
- AWS CLI: Configured with credentials (
aws configure) - Node.js: 18+ with npm
- Python: 3.11+ (for Lambda bundling)
- AWS Account: With Bedrock model access enabled
- Required models: Claude Haiku 4.5, Nova Micro, Cohere Embed English v3
- Enable in AWS Console → Bedrock → Model access
# Install dependencies
npm install
# Build TypeScript
npm run build
# Deploy all stacks (first time)
npx cdk bootstrap # Only needed once per account/region
npx cdk deploy --all
# Deploy takes ~5-10 minutes
# Outputs will include WebSocket URL, API URL, and resource IDs# Deploy in order (respects dependencies)
npx cdk deploy DndBuddyAuthStack
npx cdk deploy DndBuddyStorageStack
npx cdk deploy DndBuddyApiStack
npx cdk deploy DndBuddyWebSocketStack
# Or deploy just what changed
npx cdk deploy DndBuddyWebSocketStackModel Selection - Edit cdk/lib/dnd-buddy-stack.ts:
const bedrockModelId = 'anthropic.claude-haiku-4-5-20251001-v1:0'; // Creative model
const bedrockModelIdTool = 'amazon.nova-micro-v1:0'; // Planning model
const embeddingModelId = 'cohere.embed-english-v3'; // EmbeddingsAgent Parameters - Set in WebSocketStack environment variables:
BEDROCK_MODEL_ID: Creative model for final responses (prefixed witheu.)BEDROCK_MODEL_ID_TOOL: Planning model for tool execution (prefixed witheu.)EMBEDDING_MODEL_ID: Model for vector embeddingsMAX_AGENT_ITERATIONS: Tool execution loop limit (default: 3)MAX_HISTORY_TOKENS: Conversation history token limit (default: 10000)
All environment variables are automatically set by CDK during deployment.
After deployment, CDK outputs important values:
AuthStack:
UserPoolId: For frontend Amplify configurationUserPoolClientId: For frontend Amplify configurationUserPoolArn: For IAM policies
StorageStack:
CampaignFilesBucketName: S3 bucket for user filesVectorBucketName: S3 Vectors bucketVectorIndexName: Index name for semantic searchChatHistoryTableName: DynamoDB table nameWebSocketConnectionTableName: DynamoDB table name
ApiStack:
ApiUrl: REST API endpointIndexingLambdaArn: ARN for indexing functionSessionsLambdaArn: ARN for sessions function
WebSocketStack:
WebSocketUrl: WebSocket endpoint for frontendWebSocketApiId: API Gateway IDAgentLambdaArn: ARN for agent functionConnectionLambdaArn: ARN for connection handler
Save these outputs for frontend configuration.
# Run CDK tests
npm test
# Watch mode for TypeScript changes
npm run watch
# Synthesize CloudFormation templates (no deployment)
npx cdk synth
# Show differences between deployed and local
npx cdk diff
# List all stacks
npx cdk listThe agent can be tested locally with proper AWS credentials:
cd lambdas/dnd-buddy-agent
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variables
export VECTOR_BUCKET_NAME="dnd-vec-{your-account-id}"
export VECTOR_INDEX_NAME="campaign-vectors-index"
export CAMPAIGN_FILES_BUCKET="dnd-buddy-files-{your-account-id}"
export CHAT_HISTORY_TABLE_NAME="dnd-buddy-chat-history"
export BEDROCK_MODEL_ID="eu.anthropic.claude-haiku-4-5-20251001-v1:0"
export BEDROCK_MODEL_ID_TOOL="eu.amazon.nova-micro-v1:0"
export EMBEDDING_MODEL_ID="cohere.embed-english-v3"
# Test agent (requires deployed infrastructure)
python -c "from agent import main; print(main({'userId': 'test', 'campaign': 'test', 'prompt': 'Hello'}))"Note: Local testing requires deployed AWS resources (S3 Vectors, DynamoDB, etc.) and proper IAM permissions.
For rapid iteration on Lambda code without full CDK deployment:
# Update agent Lambda code only
cd lambdas/dnd-buddy-agent
zip -r function.zip .
aws lambda update-function-code \
--function-name dnd-buddy-agent \
--zip-file fileb://function.zip
# Update indexing Lambda
cd ../indexing
zip -r function.zip .
aws lambda update-function-code \
--function-name dnd-buddy-indexing \
--zip-file fileb://function.zipCloudWatch Logs:
# Tail agent logs
aws logs tail /aws/lambda/dnd-buddy-agent --follow
# Tail WebSocket connection logs
aws logs tail /aws/lambda/dnd-buddy-websocket-connection --follow
# Filter for errors
aws logs tail /aws/lambda/dnd-buddy-agent --follow --filter-pattern "ERROR"Lambda Insights:
- Enable in AWS Console → Lambda → Configuration → Monitoring
- Provides performance metrics and traces
X-Ray Tracing:
- Enable in Lambda configuration for distributed tracing
- Useful for debugging tool execution flow
Dual-Model Architecture:
- Planning model (Nova Micro): ~$0.000035 per 1K input tokens
- Creative model (Claude Haiku): ~$0.00025 per 1K input tokens
- Agent may call planning model 3x, creative model 1x per request
- Estimated cost: $0.001-0.003 per conversation turn
Serverless Pricing:
- Lambda: Pay per invocation + compute time (ARM64 is 20% cheaper)
- API Gateway: $1 per million WebSocket messages
- DynamoDB: On-demand pricing (no idle costs)
- S3: $0.023 per GB/month for storage
- S3 Vectors: Query pricing based on vector dimensions
Cost Reduction Strategies:
- TTL on DynamoDB tables (7 days) for automatic cleanup
- Minimal conversation history loading (last 2 messages)
- ARM64 Lambda architecture (20% cost reduction)
- On-demand DynamoDB (no provisioned capacity waste)
- S3 lifecycle policies (optional, for old campaign files)
Estimated Monthly Cost:
- Light usage (100 conversations): ~$5-10
- Medium usage (1000 conversations): ~$30-50
- Heavy usage (10000 conversations): ~$200-300
Authentication & Authorization:
- Cognito JWT tokens required for all API calls
- WebSocket authorizer validates tokens before connection
- Session validation:
sessionIdmust start with{userId}- - No anonymous access to any resources
Data Isolation:
- S3 paths scoped to
{userId}/{campaign}/ - DynamoDB queries filtered by authenticated user
- Vector search automatically scoped to user's campaign
- No cross-user data leakage possible
Network Security:
- S3 buckets: Block all public access
- API Gateway: CORS configured for specific origins
- Lambda: VPC deployment optional (not required for this architecture)
- Encryption: S3-managed encryption (SSE-S3) for all buckets
IAM Least Privilege:
- Each Lambda has minimal required permissions
- No wildcard resource access
- Separate roles for each function
- No hardcoded credentials (uses IAM roles)
Compliance:
- Data retention: 7-day TTL on chat history
- User data deletion: Manual S3/DynamoDB cleanup required
- Audit logging: CloudWatch Logs with 7-day retention
- Secrets management: Use AWS Secrets Manager for API keys (if needed)
CloudWatch Metrics:
- Lambda invocations, duration, errors, throttles
- API Gateway request count, latency, 4xx/5xx errors
- DynamoDB read/write capacity, throttles
- S3 Vectors query latency
CloudWatch Logs:
- All Lambda functions: 7-day retention
- Structured logging with correlation IDs
- Agent logs include: tool calls, iterations, token counts
Alarms (Recommended):
# High error rate
aws cloudwatch put-metric-alarm \
--alarm-name dnd-buddy-agent-errors \
--metric-name Errors \
--namespace AWS/Lambda \
--statistic Sum \
--period 300 \
--threshold 10 \
--comparison-operator GreaterThanThreshold
# High latency
aws cloudwatch put-metric-alarm \
--alarm-name dnd-buddy-agent-latency \
--metric-name Duration \
--namespace AWS/Lambda \
--statistic Average \
--period 300 \
--threshold 30000 \
--comparison-operator GreaterThanThresholdAgent not responding:
- Check CloudWatch logs:
/aws/lambda/dnd-buddy-agent - Verify Bedrock model access enabled in AWS Console
- Check Lambda timeout (should be 5 minutes)
- Verify WebSocket connection is established
Vector search returning no results:
- Ensure files are indexed:
POST /indexwith file metadata - Check vector bucket and index exist in S3 Vectors
- Verify embedding model ID matches indexing and search
- Check CloudWatch logs for indexing errors
WebSocket connection fails:
- Verify JWT token is valid and not expired
- Check token is passed as query parameter:
?token=... - Review connection Lambda logs for authorization errors
- Ensure Cognito User Pool ID is correct
High costs:
- Review CloudWatch metrics for Lambda invocations
- Check if agent is hitting max iterations frequently
- Consider reducing
MAX_AGENT_ITERATIONSto 2 - Monitor Bedrock token usage in CloudWatch
DynamoDB throttling:
- On-demand mode should auto-scale
- Check for hot partition keys (unlikely with sessionId)
- Review access patterns in CloudWatch Contributor Insights
To delete all resources:
# Delete all stacks (in reverse order)
npx cdk destroy --all
# Confirm each stack deletion
# Note: S3 buckets with RETAIN policy must be manually deletedManual cleanup required:
- S3 buckets (if RETAIN policy is set)
- CloudWatch log groups (if retention is set to NEVER_EXPIRE)
- Cognito User Pool (if RETAIN policy is set)
When making changes:
- Follow the project philosophy: minimal and simple
- Update this README if architecture changes
- Test locally before deploying
- Use
cdk diffto review changes - Deploy to dev environment first (if available)