feat: alerts with webhook delivery + outcome integration #20
Open
govindkavaturi-art wants to merge 1 commit into main
Ports the alerts feature to OSS. Deliberately excludes SendGrid/email
— self-hosters configure alert_webhook_url and forward to their own
Slack/Discord/ntfy/SMTP relay. Hosted cueapi.ai keeps managed email.
Model + migrations:
- app/models/alert.py: id/user_id/cue_id/execution_id/alert_type/
severity/message/alert_metadata (column 'metadata')/acknowledged/
created_at. CHECK on alert_type IN ('outcome_timeout',
'verification_failed', 'consecutive_failures'). CHECK on severity.
Indexes: user_id, (user_id, created_at), execution_id.
- alembic 018: alerts table.
- alembic 019: users.alert_webhook_url (String 2048) +
alert_webhook_secret (String 64), both nullable.
- 018.down_revision = '016' intentionally — PR #18 introduces 017 but
isn't merged yet. When PR #18 merges first, rebase this PR to chain
017 -> 018. Documented in the migration docstring.
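As a rough illustration of the two CHECK constraints listed above, the sketch below uses Python's built-in sqlite3 as a stand-in for the real database. The table is heavily simplified, the severity values ('info', 'warning', 'critical') are taken from the PR summary, and `try_insert` is a hypothetical helper, not part of the PR.

```python
import sqlite3

# Simplified stand-in for the alerts table; the real schema lives in alembic 018.
SCHEMA = """
CREATE TABLE alerts (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    alert_type TEXT NOT NULL
        CHECK (alert_type IN ('outcome_timeout', 'verification_failed', 'consecutive_failures')),
    severity TEXT NOT NULL
        CHECK (severity IN ('info', 'warning', 'critical')),
    message TEXT NOT NULL
)
"""


def try_insert(conn, alert_id, alert_type, severity):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO alerts (id, user_id, alert_type, severity, message) "
            "VALUES (?, ?, ?, ?, ?)",
            (alert_id, "user-1", alert_type, severity, "example"),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

accepted = try_insert(conn, "a1", "consecutive_failures", "warning")
rejected_type = try_insert(conn, "a2", "disk_full", "warning")
rejected_severity = try_insert(conn, "a3", "outcome_timeout", "fatal")
```

Enforcing the allowed values in the database means a new alert type needs a migration, but no code path can persist an unknown type or severity.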
Services:
- app/services/alert_service.py: create_alert with 5-min dedup on
(user_id, alert_type, execution_id|cue_id). count_consecutive_failures
walks execution history backwards, stops at first non-failed.
Threshold = 3. Webhook delivery is fire-and-forget via
asyncio.create_task.
- app/services/alert_webhook.py: deliver_alert with HMAC-SHA256 over
'{timestamp}.{sorted_payload_json}', 10s timeout, SSRF re-resolve at
delivery, never raises. No-URL short-circuits silently. URL-without-
secret logs a warning and skips.
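The signing scheme described for deliver_alert can be sketched with the standard library alone. This is an assumption-laden illustration: the exact JSON serialization (compact separators here) and the payload field names are guesses, `sign_alert` and `build_headers` are hypothetical names, and the header names come from the PR summary.

```python
import hashlib
import hmac
import json


def sign_alert(secret: str, timestamp: int, payload: dict) -> str:
    """Return the hex HMAC-SHA256 over '{timestamp}.{sorted_payload_json}'."""
    # Assumption: keys sorted, compact separators. The real serializer may differ.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    message = f"{timestamp}.{body}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def build_headers(secret: str, timestamp: int, alert_id: str, payload: dict) -> dict:
    """Assemble the headers an alert webhook POST would carry."""
    return {
        "X-CueAPI-Signature": f"v1={sign_alert(secret, timestamp, payload)}",
        "X-CueAPI-Timestamp": str(timestamp),
        "X-CueAPI-Alert-Id": alert_id,
        # Hypothetical payload field name, used only for this sketch.
        "X-CueAPI-Alert-Type": payload.get("alert_type", ""),
    }


payload_a = {"alert_type": "consecutive_failures", "message": "3 failures in a row"}
payload_b = {"message": "3 failures in a row", "alert_type": "consecutive_failures"}
sig_a = sign_alert("s3cret", 1700000000, payload_a)
sig_b = sign_alert("s3cret", 1700000000, payload_b)
headers = build_headers("s3cret", 1700000000, "alert-123", payload_a)
```

Sorting the keys before signing makes the signature independent of dict ordering, so sender and receiver only need to agree on the serialization rules.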
Router + auth:
- app/routers/alerts.py: GET /v1/alerts with alert_type/since/limit/
offset filters, 400 on invalid type, auth-scoped.
- app/routers/auth_routes.py: PATCH /me accepts alert_webhook_url
(empty string clears; SSRF-validated). GET /alert-webhook-secret
lazy-generates on first call. POST /alert-webhook-secret/regenerate
requires X-Confirm-Destructive.
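The PR does not show how the SSRF validation works, so the following is only a minimal sketch of one common approach: resolve the host and reject any address that is not publicly routable. `is_safe_webhook_url` is a hypothetical name, and the real checks in cueapi-core may be stricter or differ entirely.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_webhook_url(url: str) -> bool:
    """Return True only if every resolved address for the URL's host is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        infos = socket.getaddrinfo(parsed.hostname, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror):
        # Invalid port or unresolvable host: treat as unsafe.
        return False
    for info in infos:
        # Drop any IPv6 zone suffix (e.g. '%eth0') before parsing.
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if not ip.is_global or ip.is_multicast:
            return False
    return bool(infos)


loopback_ok = is_safe_webhook_url("http://127.0.0.1/hook")
private_ok = is_safe_webhook_url("https://10.0.0.5/hook")
metadata_ok = is_safe_webhook_url("http://169.254.169.254/latest/meta-data")
scheme_ok = is_safe_webhook_url("ftp://8.8.8.8/hook")
```

Running the same check again immediately before each delivery, as the commit message describes, narrows the window in which a hostname that was public at configuration time can be re-pointed at an internal address.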
Integration into outcome_service.record_outcome (post-commit):
- verification_failed alert fires when execution.outcome_state ==
'verification_failed'. Dormant on current main (the rule engine that
sets this state lives in PR #18); activates automatically once #18
merges. No rebase of integration code required — only the migration
chain needs updating.
- consecutive_failures alert fires when the streak reaches 3 on a
failed outcome. Independent of PR #18 — works on current main.
- outcome_timeout alert firing deferred — requires a deadline-checking
poller that cueapi-core doesn't have yet. CHECK constraint and
router already accept the type so the wiring is drop-in when that
poller lands.
- Alert firing is wrapped in try/except — must never break outcome
reporting.
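The trigger logic above can be sketched as follows. Execution history is modelled here as a newest-first list of success flags, a simplification of the real execution rows, and `alerts_for_outcome` is a hypothetical helper that only reports which alert types would fire.

```python
CONSECUTIVE_FAILURE_THRESHOLD = 3  # threshold stated in the PR


def count_consecutive_failures(history_newest_first):
    """Walk history backwards from the newest entry; stop at the first non-failed one."""
    streak = 0
    for succeeded in history_newest_first:
        if succeeded:
            break
        streak += 1
    return streak


def alerts_for_outcome(outcome_state, success, history_newest_first):
    """Return the alert types that would fire for this outcome; never raises."""
    fired = []
    try:
        if outcome_state == "verification_failed":
            fired.append("verification_failed")
        if not success:
            streak = count_consecutive_failures(history_newest_first)
            if streak >= CONSECUTIVE_FAILURE_THRESHOLD:
                fired.append("consecutive_failures")
    except Exception:
        # Alert firing must never break outcome reporting, so errors are swallowed.
        pass
    return fired


streak = count_consecutive_failures([False, False, True, False])
third_failure = alerts_for_outcome(None, False, [False, False, False])
isolated_failure = alerts_for_outcome(None, False, [False, True, True])
```

Because the walk stops at the first success, an old failure separated from the current streak by a successful run does not count toward the threshold.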
Tests (36 new, all passing):
- test_alert_model.py (6): CRUD, CHECK rejection for invalid
type/severity, parametrized valid types, index existence.
- test_alert_service.py (7): create persists, dedup within window,
dedup doesn't cross alert types, consecutive_failures counter +
streak-breaking + threshold constant.
- test_alert_webhook_delivery.py (7): no-URL short-circuit, URL-
without-secret skip, SSRF block, HMAC signature recomputation,
timeout/non-2xx/RuntimeError all swallowed.
- test_alerts_api.py (8): empty list, own alerts, type filter, invalid
type rejected, pagination, cross-user scoping, auth required.
- test_alert_webhook_config.py (6): set valid URL, empty string clears,
SSRF rejection at config, lazy secret generation, confirmation
required, rotation.
- test_outcome_triggers_alert.py (3): verification_failed end-to-end
(seeds outcome_state to exercise the integration path), consecutive
failures end-to-end, isolated failure does NOT fire.
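The dedup behaviour these tests exercise (same key suppressed within the window, different alert types kept separate) can be illustrated with an in-memory sketch. The real create_alert presumably checks the alerts table instead, and `AlertDeduper` is a hypothetical class used only for this example.

```python
import time

DEDUP_WINDOW_SECONDS = 300  # the 5-minute window from the PR


class AlertDeduper:
    """In-memory stand-in for the dedup check on (user_id, alert_type, execution_id|cue_id)."""

    def __init__(self):
        self._last_created = {}

    def should_create(self, user_id, alert_type, execution_id=None, cue_id=None, now=None):
        now = time.time() if now is None else now
        # Prefer execution_id; fall back to cue_id when no execution is attached.
        key = (user_id, alert_type, execution_id or cue_id)
        last = self._last_created.get(key)
        if last is not None and now - last < DEDUP_WINDOW_SECONDS:
            return False
        self._last_created[key] = now
        return True


deduper = AlertDeduper()
first = deduper.should_create("u1", "consecutive_failures", execution_id="e1", now=1000.0)
repeat = deduper.should_create("u1", "consecutive_failures", execution_id="e1", now=1100.0)
other_type = deduper.should_create("u1", "verification_failed", execution_id="e1", now=1100.0)
after_window = deduper.should_create("u1", "consecutive_failures", execution_id="e1", now=1300.0)
```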
Full-suite delta: +36 passing, 0 new failures. Pre-existing SDK-
integration failures (cueapi Python package not installed locally)
unchanged.
Docs:
- README 'Alerts' section with alert types, querying, webhook setup.
- examples/alert_webhook_receiver.py: 30-line Flask receiver with
signature verification.
- CHANGELOG [Unreleased] entry.
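The shipped example receiver uses Flask. The framework-independent core of signature verification might look roughly like the sketch below. It assumes the same serialization guess as any sender-side sketch (sorted keys, compact separators), and the replay tolerance `MAX_SKEW_SECONDS` is an added assumption that the PR does not specify.

```python
import hashlib
import hmac
import json
import time

MAX_SKEW_SECONDS = 300  # assumption: replay tolerance, not specified in the PR


def verify_alert(secret, headers, payload, now=None):
    """Return True if the signature header matches the payload and the timestamp is fresh."""
    try:
        timestamp = int(headers["X-CueAPI-Timestamp"])
        version, _, received = headers["X-CueAPI-Signature"].partition("=")
    except (KeyError, ValueError):
        return False
    if version != "v1":
        return False
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    # Assumption: sorted keys with compact separators, matching the sender.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    expected = hmac.new(
        secret.encode("utf-8"), f"{timestamp}.{body}".encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


payload = {"alert_type": "consecutive_failures", "message": "3 failures in a row"}
ts = 1700000000
body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
good_sig = hmac.new(b"s3cret", f"{ts}.{body}".encode("utf-8"), hashlib.sha256).hexdigest()
good_headers = {"X-CueAPI-Signature": f"v1={good_sig}", "X-CueAPI-Timestamp": str(ts)}

valid = verify_alert("s3cret", good_headers, payload, now=ts + 5)
tampered = verify_alert("s3cret", good_headers, {**payload, "message": "changed"}, now=ts + 5)
stale = verify_alert("s3cret", good_headers, payload, now=ts + 3600)
```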
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
argus-qa-ai (Collaborator) approved these changes on Apr 17, 2026 and left a comment:
All CI checks passing. Approved by Argus.
Summary
Ports the alerts feature to OSS and ships a new webhook-based delivery path. Deliberately excludes SendGrid/email: OSS self-hosters configure their own alert_webhook_url and forward to Slack / Discord / ntfy / SMTP relay / whatever they run. Hosted cueapi.ai retains managed email delivery via SendGrid (see HOSTED_ONLY.md).

What's new
Alert model + migrations
- app/models/alert.py: id, user_id, cue_id (nullable), execution_id (nullable), alert_type, severity, message, alert_metadata (DB column metadata), acknowledged, created_at
- CHECK: alert_type IN ('outcome_timeout', 'verification_failed', 'consecutive_failures')
- CHECK: severity IN ('info', 'warning', 'critical')
- Indexes: user_id, (user_id, created_at), execution_id
- Migrations: alembic 018 creates the alerts table; alembic 019 adds alert_webhook_url (String 2048) + alert_webhook_secret (String 64) to users.

Services
- alert_service.create_alert: persists the row, then schedules deliver_alert as fire-and-forget. Dedup window of 5 minutes on (user_id, alert_type, execution_id|cue_id) so flapping executions don't flood user inboxes.
- alert_service.count_consecutive_failures: walks execution history backwards, stops at the first non-failed row. Threshold 3.
- alert_webhook.deliver_alert: HMAC-SHA256 signing over {timestamp}.{sorted_payload_json} (same scheme as the existing webhook signer), 10s timeout, SSRF re-resolve at delivery time (DNS rebind protection), never raises. Best-effort: a user's slow endpoint must not block outcome reporting.

Headers
Alert webhook POSTs carry:
- X-CueAPI-Signature: v1=<hex>
- X-CueAPI-Timestamp: <unix>
- X-CueAPI-Alert-Id: <uuid>
- X-CueAPI-Alert-Type: <type>
- User-Agent: CueAPI/1.0

Endpoints
- GET /v1/alerts: list with alert_type/since/limit/offset filters, 400 invalid_filter for unknown types, auth-scoped
- PATCH /v1/auth/me: accepts alert_webhook_url; empty string clears; SSRF-validated at set time (400 invalid_alert_webhook_url)
- GET /v1/auth/alert-webhook-secret: lazily generates a 64-char hex secret on first call; returns the same value on subsequent calls
- POST /v1/auth/alert-webhook-secret/regenerate: rotates; requires X-Confirm-Destructive: true

Outcome integration (hooks wired into record_outcome, post-commit)
- verification_failed: fires when execution.outcome_state == 'verification_failed'. This state is set by the rule engine from PR #18 (feat: verification modes + evidence fields + transport combo rejection (hosted parity)); on current main, the hook is dormant (no caller sets that state during record_outcome). Once PR #18 merges, the hook activates automatically with no further code change; only the migration chain needs rebasing.
- consecutive_failures: on any success=false, calls count_consecutive_failures. Fires if streak ≥ 3. Works on current main independently of PR #18.
- outcome_timeout: deferred. Requires a deadline-checking poller that cueapi-core doesn't have yet. The CHECK constraint and router accept the type already, so wiring is drop-in when that poller lands.

All three branches are wrapped in try/except: alert firing can never break outcome reporting.

Merge-order dependencies
- 018.down_revision = "016". When PR #18 merges first (introducing 017), rebase this PR and change 018's down_revision to "017". Documented in the migration docstring.
- The verification_failed code path is harmless on current main (guard returns false). Once PR #18 merges, it activates. No code change needed.
- parity-manifest.json will gain three new entries (app/models/alert.py, app/services/alert_service.py, app/routers/alerts.py) once PR #19 (chore: document open-core model + parity manifest + CI warning) merges, which is a trivial follow-up.

Tests: 36 new, all passing
- test_alert_model.py
- test_alert_service.py
- test_alert_webhook_delivery.py
- test_alerts_api.py
- test_alert_webhook_config.py
- test_outcome_triggers_alert.py

Full-suite delta: +36 passing, 0 new failures. Pre-existing 7 SDK-integration failures (ModuleNotFoundError: cueapi) unchanged; they are environment-dependent and CI handles them.

Migration chain applied cleanly on a blank DB: 016 → 018 → 019 (alembic upgrade head).

Notable decisions
- ... alert_type is added.
- alert_metadata Python attr, metadata DB column. metadata is reserved on SQLAlchemy's Base class. Mapping via Column("metadata", JSONB) matches private's layout so a future ORM sync is frictionless.
- ... (DEDUP_WINDOW_SECONDS, CONSECUTIVE_FAILURE_THRESHOLD): easy to flag for env-var-ification later.
- asyncio.create_task (not a queue). Keeps OSS dependency-free: no arq/celery/dramatiq for alert delivery. Failures log + return. For hosted scale, the SendGrid path bypasses this.
- ... GET populates; rotation invalidates immediately.
- ... GET /alert-webhook-secret. Rather than POST an unsigned payload, we log and skip, so receivers can trust that any POST they get is signed.
- outcome_timeout deferred. No deadline poller in OSS core (the docker-compose.yml poller handles next_run, not outcome_deadline_at). Surfaced in the CHECK constraint and router already so the wiring is trivial when a deadline-checker lands.

Documentation
- examples/alert_webhook_receiver.py: 30-line Flask receiver demonstrating signature verification
- CHANGELOG [Unreleased] entry
- HOSTED_ONLY.md

Test plan
- pytest tests/: no new failures
- ... \d alerts
- ... app.utils.signing.sign_payload (receivers use the same verification as regular webhook callers)

🤖 Generated with Claude Code