
feat: verification modes + evidence fields + transport combo rejection (hosted parity)#18

Merged
govindkavaturi-art merged 2 commits into main from
feat/verification-modes-and-evidence-parity
Apr 17, 2026

@govindkavaturi-art
Member

Summary

Ports the outcome-verification feature from the hosted monorepo into cueapi-core, and fixes the partially-ported /verify endpoint from PR #15 so it honors the {valid, reason} contract documented by the hosted API.

What changed

Schema (app/schemas/)

  • VerificationMode enum: none, require_external_id, require_result_url, require_artifacts, manual
  • VerificationPolicy sub-object with a single mode field today (leaves room for future fields without breaking the shape)
  • verification: Optional[VerificationPolicy] on CueCreate and CueUpdate
  • OutcomeRequest extended with optional external_id, result_url, result_ref, result_type, summary, artifacts. Legacy shape ({success, result, error, metadata}) is unchanged
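
As a rough illustration of the schema shape described above, the sketch below models the enum and the policy sub-object with stdlib dataclasses. The PR's real schemas live in app/schemas/ and are Pydantic models, so this is only an approximation; the `name` field on `CueCreate` is a hypothetical placeholder, and the string-coercion behavior in `__post_init__` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VerificationMode(str, Enum):
    # The five modes listed in the PR description.
    NONE = "none"
    REQUIRE_EXTERNAL_ID = "require_external_id"
    REQUIRE_RESULT_URL = "require_result_url"
    REQUIRE_ARTIFACTS = "require_artifacts"
    MANUAL = "manual"


@dataclass
class VerificationPolicy:
    # A single field today; keeping it in a sub-object leaves room for
    # future fields without changing the request shape.
    mode: VerificationMode

    def __post_init__(self) -> None:
        # Assumption: raw strings are coerced to the enum, and unknown
        # values raise ValueError.
        self.mode = VerificationMode(self.mode)


@dataclass
class CueCreate:
    name: str  # hypothetical field, for illustration only
    verification: Optional[VerificationPolicy] = None
```

A cue created without a `verification` object keeps `verification=None`, which corresponds to the NULL column value described in the backward-compatibility notes.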

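Model (app/models/cue.py)

  • verification_mode column on cues (String(50), nullable, CHECK-constrained). NULL is treated as none. The evidence_* columns already existed from PR #15 and are reused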

Migration

  • 017_add_verification_mode.py — adds the column + CHECK constraint. Applies cleanly on a blank DB (verified locally). Downgrade drops the constraint first, then the column
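
The effect of the column plus CHECK constraint can be tried out in isolation. The sketch below is not the Alembic migration from this PR: it uses the stdlib sqlite3 module instead of Postgres, and the helper name `add_verification_mode` is hypothetical. It only demonstrates that a nullable, CHECK-constrained column accepts NULL and the five modes while rejecting anything else.

```python
import sqlite3

MODES = (
    "none",
    "require_external_id",
    "require_result_url",
    "require_artifacts",
    "manual",
)


def add_verification_mode(conn: sqlite3.Connection) -> None:
    """Add a nullable verification_mode column restricted to MODES."""
    allowed = ", ".join(f"'{m}'" for m in MODES)
    conn.execute(
        "ALTER TABLE cues ADD COLUMN verification_mode VARCHAR(50) "
        f"CHECK (verification_mode IS NULL OR verification_mode IN ({allowed}))"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cues (id INTEGER PRIMARY KEY)")
    add_verification_mode(conn)
    conn.execute("INSERT INTO cues (verification_mode) VALUES (NULL)")
    conn.execute("INSERT INTO cues (verification_mode) VALUES ('manual')")
    try:
        conn.execute("INSERT INTO cues (verification_mode) VALUES ('bogus')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```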

Services

  • outcome_service.record_outcome computes outcome_state from (success, verification_mode, evidence):
    success  mode                 evidence  outcome_state
    -------  -------------------  --------  --------------------
    false    any                  -         reported_failure
    true     none/NULL            -         reported_success
    true     manual               -         verification_pending
    true     require_external_id  present   verified_success
    true     require_external_id  missing   verification_failed
    true     require_result_url   present   verified_success
    true     require_result_url   missing   verification_failed
    true     require_artifacts    present   verified_success
    true     require_artifacts    missing   verification_failed
  • cue_service._check_transport_verification_combo rejects worker transport paired with evidence-requiring modes at both create and update (see "Restriction" below)
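
The outcome-state rules above can be read as a small pure function. The following is a minimal sketch of that mapping, not the code in outcome_service.record_outcome: the function name `compute_outcome_state` and the `REQUIRED_EVIDENCE` lookup are hypothetical, and it assumes that "present" means a non-empty value for the matching evidence field.

```python
from typing import Any, Mapping, Optional

# Which evidence field each evidence-requiring mode looks for (assumed
# to match the mode names one-to-one).
REQUIRED_EVIDENCE = {
    "require_external_id": "external_id",
    "require_result_url": "result_url",
    "require_artifacts": "artifacts",
}


def compute_outcome_state(
    success: bool, mode: Optional[str], evidence: Mapping[str, Any]
) -> str:
    # Failure bypasses verification entirely.
    if not success:
        return "reported_failure"
    # NULL is treated the same as "none".
    if mode is None or mode == "none":
        return "reported_success"
    # Manual mode parks the execution until someone calls /verify.
    if mode == "manual":
        return "verification_pending"
    field = REQUIRED_EVIDENCE[mode]
    # Assumption: an empty string or empty list counts as missing.
    return "verified_success" if evidence.get(field) else "verification_failed"
```

For example, `compute_outcome_state(True, "require_external_id", {})` yields `verification_failed`, which is the path the alerts follow-up relies on.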

Router

  • POST /v1/executions/{id}/verify now accepts {valid: bool, reason: str?} via a typed VerifyRequest body.
    • valid=true (default) → verified_success (legacy behavior preserved — empty body still works)
    • valid=false → verification_failed, reason recorded on evidence_summary (truncated to 500 chars, prepended to any existing summary)
    • Accepted starting states expanded to include reported_failure. This state was rejected before, but there was no semantic reason to reject it
  • OutcomeResponse now surfaces outcome_state
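
To make the /verify contract concrete, here is a hedged sketch of the state transition as a plain function. Several details are assumptions rather than facts from the PR: the exact set of accepted starting states (only reported_failure is confirmed as newly accepted), the newline separator used when prepending the reason, and that truncation applies to the reason alone. The name `apply_verify` is hypothetical.

```python
from typing import Optional, Tuple

MAX_REASON_LEN = 500

# Assumption: the PR confirms reported_failure is accepted; the other two
# entries are a plausible guess at the remaining accepted states.
VERIFIABLE_STATES = {"verification_pending", "reported_success", "reported_failure"}


def apply_verify(
    current_state: str,
    existing_summary: Optional[str],
    valid: bool = True,
    reason: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (new outcome_state, new evidence_summary)."""
    if current_state not in VERIFIABLE_STATES:
        raise ValueError(f"cannot verify from state {current_state!r}")
    if valid:
        # Empty body defaults to valid=True, preserving legacy behavior.
        return "verified_success", existing_summary
    summary = existing_summary
    if reason:
        truncated = reason[:MAX_REASON_LEN]
        # Assumption: reason and existing summary are joined by a newline.
        summary = f"{truncated}\n{existing_summary}" if existing_summary else truncated
    return "verification_failed", summary
```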

Intentional behavior change

POST /v1/executions/{id}/verify with an explicit {valid: false} body now transitions to verification_failed instead of verified_success. Before this PR, the endpoint ignored the request body and always transitioned to verified_success — a silent-failure bug that made the valid=false branch impossible to exercise. Callers relying on the always-success behavior were getting broken semantics anyway. Empty-body requests remain verified_success (the previous default).

Restriction

Worker-transport cues cannot combine with require_external_id / require_result_url / require_artifacts. Attempting to do so at create or PATCH time returns:

{
  "error": {
    "code": "unsupported_verification_for_transport",
    "transport": "worker",
    "verification_mode": "require_external_id",
    "supported_worker_modes": ["none", "manual"]
  }
}

This is because cueapi-worker < 0.3.0 has no mechanism to attach evidence to the outcome POST. The restriction will be lifted in a follow-up PR once cueapi-worker 0.3.0 (evidence reporting via CUEAPI_OUTCOME_FILE) is published to PyPI.
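
The restriction can be pictured as a guard that runs on both create and update. The sketch below is an approximation of what cue_service._check_transport_verification_combo might look like; the exception class and its `payload` attribute are hypothetical, while the error body mirrors the JSON shown above.

```python
from typing import Optional

EVIDENCE_MODES = {"require_external_id", "require_result_url", "require_artifacts"}
SUPPORTED_WORKER_MODES = ["none", "manual"]


class UnsupportedVerificationForTransport(Exception):
    """Hypothetical error type carrying the 400 response body."""

    def __init__(self, transport: str, mode: str) -> None:
        super().__init__(f"{mode} is not supported on {transport} transport")
        self.payload = {
            "error": {
                "code": "unsupported_verification_for_transport",
                "transport": transport,
                "verification_mode": mode,
                "supported_worker_modes": SUPPORTED_WORKER_MODES,
            }
        }


def check_transport_verification_combo(transport: str, mode: Optional[str]) -> None:
    # Webhook cues may use any mode; worker cues are limited to none/manual.
    if transport == "worker" and mode in EVIDENCE_MODES:
        raise UnsupportedVerificationForTransport(transport, mode)
```

On PATCH, a guard like this would presumably be called with the effective transport and mode after merging the update into the stored cue, so that switching either field alone is still checked.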

Tests

35 new tests across four files:

  • tests/test_verification_modes.py — 10 tests, 5 modes × (satisfied, unsatisfied / applicable variants)
  • tests/test_transport_verification_combo.py — 13 tests: 3 evidence modes rejected × (create, PATCH) + 2 worker-compatible modes accepted + 5 webhook-always-allowed modes + 3 PATCH transitions
  • tests/test_outcome_evidence.py — 4 tests: inline evidence persists, legacy shape still works, Pydantic length caps enforced, PATCH evidence still works
  • tests/test_verify_endpoints.py — 8 tests covering both branches of valid, empty body default, reason-preserves-existing-summary, invalid-state rejections, and /verification-pending

Test-suite delta

  • +35 new passing tests, 0 new failures
  • Amended one existing test: test_execution_parity.py::TestVerify::test_verify_wrong_state now uses a pre-outcome state (since reported_failure is now a valid starting state)
  • Pre-existing failures in test_sdk_integration.py (7) — ModuleNotFoundError: No module named 'cueapi'. Confirmed on clean origin/main (stashed this PR's changes and re-ran). These are environment-dependent tests that expect the Python SDK to be installed; CI handles that

Backward compatibility

  • POST /outcome without evidence fields → identical behavior to before
  • POST /verify with empty body → identical behavior to before (verified_success)
  • Cues without a verification field → verification_mode = NULL → outcome-state engine treats as none → same reported_success / reported_failure semantics as before
  • PATCH /v1/executions/{id}/evidence → untouched, still accepts the two-step flow

References

  • Private monorepo sources: app/schemas/cue.py (VerificationMode/VerificationConfig), app/schemas/outcome.py (evidence fields), app/services/outcome_service.py (rule engine), app/services/cue_service.py (_check_transport_verification_combo), app/routers/executions.py (/verify body contract)
  • Audit context: this PR addresses the ABSENT/PARTIAL items 1, 2, 3, 4, 5, 6, 8 from the cueapi-core drift re-audit. Items 7, 9–14 are out of scope (alerts + sync-discipline land in follow-up PRs)

Test plan

  • 35 new tests pass locally (pytest tests/test_verification_modes.py tests/test_transport_verification_combo.py tests/test_outcome_evidence.py tests/test_verify_endpoints.py)
  • Full pytest tests/ — no new failures (SDK-integration failures pre-exist)
  • Migration 017 applies cleanly on a blank DB (alembic upgrade head from empty schema)
  • Column + CHECK constraint verified in Postgres (\d cues)

🤖 Generated with Claude Code

Ports the outcome-verification feature from the hosted monorepo into
cueapi-core and fixes the partial /verify endpoint that PR #15 left
behind.

Schema:
- VerificationMode enum (none, require_external_id, require_result_url,
  require_artifacts, manual) + VerificationPolicy on CueCreate/CueUpdate.
- OutcomeRequest accepts evidence fields inline (external_id, result_url,
  result_ref, result_type, summary, artifacts). Legacy shape still works.

Model:
- Migration 017: verification_mode column on cues (String(50), nullable,
  CHECK-constrained enum). NULL == 'none'. evidence_* columns already
  existed from PR #15 and are reused.

Services:
- outcome_service computes outcome_state from (success, mode, evidence).
  Missing required evidence -> verification_failed. Manual mode parks in
  verification_pending. Failure bypasses verification entirely.
- cue_service _check_transport_verification_combo rejects worker+evidence
  at create and update. Lifted in a follow-up PR once cueapi-worker 0.3.0
  lands on PyPI.

Router:
- POST /v1/executions/{id}/verify now accepts {valid: bool, reason: str?}.
  valid=true preserves legacy behavior; valid=false -> verification_failed
  with reason recorded on evidence_summary. Accepted starting states
  expanded to include reported_failure. Empty body defaults to valid=true
  for full backward compat.

Tests:
- 35 new tests across 4 files (verification_modes, transport_verification_combo,
  outcome_evidence, verify_endpoints).
- Amended test_execution_parity.py::test_verify_wrong_state to use a
  pre-outcome state (reported_failure is now a valid starting state).
- Full-suite delta: +35 passing, 0 new failures. Pre-existing SDK-integration
  failures (cueapi Python package not installed locally) unchanged.

Alert firing for verification_failed deferred to PR 2 (alerts feature).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
govindkavaturi-art enabled auto-merge (squash) April 17, 2026 01:32
govindkavaturi-art pushed a commit that referenced this pull request Apr 17, 2026
Ports the alerts feature to OSS. Deliberately excludes SendGrid/email
— self-hosters configure alert_webhook_url and forward to their own
Slack/Discord/ntfy/SMTP relay. Hosted cueapi.ai keeps managed email.

Model + migrations:
- app/models/alert.py: id/user_id/cue_id/execution_id/alert_type/
  severity/message/alert_metadata (column 'metadata')/acknowledged/
  created_at. CHECK on alert_type IN ('outcome_timeout',
  'verification_failed', 'consecutive_failures'). CHECK on severity.
  Indexes: user_id, (user_id, created_at), execution_id.
- alembic 018: alerts table.
- alembic 019: users.alert_webhook_url (String 2048) +
  alert_webhook_secret (String 64), both nullable.
- 018.down_revision = '016' intentionally — PR #18 introduces 017 but
  isn't merged yet. When PR #18 merges first, rebase this PR to chain
  017 -> 018. Documented in the migration docstring.

Services:
- app/services/alert_service.py: create_alert with 5-min dedup on
  (user_id, alert_type, execution_id|cue_id). count_consecutive_failures
  walks execution history backwards, stops at first non-failed.
  Threshold = 3. Webhook delivery is fire-and-forget via
  asyncio.create_task.
- app/services/alert_webhook.py: deliver_alert with HMAC-SHA256 over
  '{timestamp}.{sorted_payload_json}', 10s timeout, SSRF re-resolve at
  delivery, never raises. No-URL short-circuits silently. URL-without-
  secret logs a warning and skips.
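
A receiver needs to recompute the signature to trust an alert. The sketch below signs '{timestamp}.{sorted_payload_json}' with HMAC-SHA256 as the commit message describes. The JSON separators, the hex encoding of the digest, and the function names `sign_alert` / `verify_alert` are assumptions, so check the shipped examples/alert_webhook_receiver.py for the exact wire format.

```python
import hashlib
import hmac
import json


def sign_alert(secret: str, payload: dict, timestamp: int) -> str:
    # Sorting keys makes the serialized payload independent of dict order.
    body = json.dumps(payload, sort_keys=True)
    message = f"{timestamp}.{body}".encode("utf-8")
    # Assumption: the signature is transmitted as a hex digest.
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_alert(secret: str, payload: dict, timestamp: int, signature: str) -> bool:
    expected = sign_alert(secret, payload, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed message lets a receiver also reject stale deliveries, although the acceptable clock skew is a policy choice the source does not specify.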

Router + auth:
- app/routers/alerts.py: GET /v1/alerts with alert_type/since/limit/
  offset filters, 400 on invalid type, auth-scoped.
- app/routers/auth_routes.py: PATCH /me accepts alert_webhook_url
  (empty string clears; SSRF-validated). GET /alert-webhook-secret
  lazy-generates on first call. POST /alert-webhook-secret/regenerate
  requires X-Confirm-Destructive.

Integration into outcome_service.record_outcome (post-commit):
- verification_failed alert fires when execution.outcome_state ==
  'verification_failed'. Dormant on current main (the rule engine that
  sets this state lives in PR #18); activates automatically once #18
  merges. No rebase of integration code required — only the migration
  chain needs updating.
- consecutive_failures alert fires when the streak reaches 3 on a
  failed outcome. Independent of PR #18 — works on current main.
- outcome_timeout alert firing deferred — requires a deadline-checking
  poller that cueapi-core doesn't have yet. CHECK constraint and
  router already accept the type so the wiring is drop-in when that
  poller lands.
- Alert firing is wrapped in try/except — must never break outcome
  reporting.
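
The streak logic and the "never break outcome reporting" rule can be sketched together. This is an illustration under assumptions, not the code in alert_service: history is modeled as a list of booleans (oldest first, True meaning failed), the alert fires when the streak is at least the threshold (with the 5-minute dedup presumably suppressing repeats), and `fire_alert_safely` is a hypothetical wrapper.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)

CONSECUTIVE_FAILURE_THRESHOLD = 3


def count_consecutive_failures(history: Sequence[bool]) -> int:
    """Walk history backwards and stop at the first non-failed execution."""
    streak = 0
    for failed in reversed(history):
        if not failed:
            break
        streak += 1
    return streak


def should_fire_consecutive_failures(history: Sequence[bool]) -> bool:
    # Assumption: ">=" rather than "==" so a missed alert can still fire later.
    return count_consecutive_failures(history) >= CONSECUTIVE_FAILURE_THRESHOLD


def fire_alert_safely(fire: Callable[[], None]) -> bool:
    """Run an alert hook; swallow and log any error so the caller continues."""
    try:
        fire()
        return True
    except Exception:
        logger.exception("alert firing failed; outcome reporting continues")
        return False
```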

Tests (36 new, all passing):
- test_alert_model.py (6): CRUD, CHECK rejection for invalid
  type/severity, parametrized valid types, index existence.
- test_alert_service.py (7): create persists, dedup within window,
  dedup doesn't cross alert types, consecutive_failures counter +
  streak-breaking + threshold constant.
- test_alert_webhook_delivery.py (7): no-URL short-circuit, URL-
  without-secret skip, SSRF block, HMAC signature recomputation,
  timeout/non-2xx/RuntimeError all swallowed.
- test_alerts_api.py (8): empty list, own alerts, type filter, invalid
  type rejected, pagination, cross-user scoping, auth required.
- test_alert_webhook_config.py (6): set valid URL, empty string clears,
  SSRF rejection at config, lazy secret generation, confirmation
  required, rotation.
- test_outcome_triggers_alert.py (3): verification_failed end-to-end
  (seeds outcome_state to exercise the integration path), consecutive
  failures end-to-end, isolated failure does NOT fire.

Full-suite delta: +36 passing, 0 new failures. Pre-existing SDK-
integration failures (cueapi Python package not installed locally)
unchanged.

Docs:
- README 'Alerts' section with alert types, querying, webhook setup.
- examples/alert_webhook_receiver.py: 30-line Flask receiver with
  signature verification.
- CHANGELOG [Unreleased] entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

argus-qa-ai left a comment

All CI checks passing. Approved by Argus.

govindkavaturi-art merged commit 498c301 into main Apr 17, 2026
3 checks passed
govindkavaturi-art pushed a commit that referenced this pull request Apr 17, 2026
PR #18 (verification modes) landed first, so record_outcome now
unconditionally overwrites execution.outcome_state from the rule
engine: (success, verification_mode, evidence). Pre-seeding
outcome_state='verification_failed' before calling the endpoint
was a pre-#18 strategy — the seed gets overwritten to
'reported_success' when the test sends success=True on a cue
with no verification_mode.

Fix: configure the cue with verification_mode='require_external_id'
and report success=True without external_id. The rule engine
naturally lands in verification_failed, which triggers the alert
hook. This is the real production path users hit.

No behavior change in the alert hook. Test fixture _cue() now
accepts verification_mode kwarg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
govindkavaturi-art added a commit that referenced this pull request Apr 17, 2026
* feat: alerts with webhook delivery + outcome integration

* test(alerts): trigger verification_failed via rule engine, not pre-seed

---------

Co-authored-by: Gk <gk@Gks-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
govindkavaturi-art pushed a commit that referenced this pull request Apr 17, 2026
cueapi-worker 0.3.0 (released 2026-04-17 to PyPI) closes the worker-
side evidence gap via CUEAPI_OUTCOME_FILE. The daemon reads the
handler's per-run temp file after exit and merges the evidence
fields into its outcome POST. All five verification modes now work
on both transports.
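
The handoff through CUEAPI_OUTCOME_FILE might look roughly like the sketch below. The file format is an assumption (a single JSON object), as are the function names; the only details taken from the commit message are that the handler writes to a per-run file named by the environment variable and that the daemon merges the evidence fields into its outcome POST after the handler exits.

```python
import json
import os

EVIDENCE_FIELDS = (
    "external_id", "result_url", "result_ref",
    "result_type", "summary", "artifacts",
)


def write_evidence(evidence: dict) -> None:
    """Handler side: record evidence if the daemon provided a file path."""
    path = os.environ.get("CUEAPI_OUTCOME_FILE")
    if not path:
        return  # older daemon or direct invocation: nothing to do
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(evidence, fh)


def merge_evidence(outcome: dict, path: str) -> dict:
    """Daemon side: fold known evidence fields into the outcome body."""
    merged = dict(outcome)
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return merged  # missing or malformed file: send the outcome as-is
    if isinstance(data, dict):
        merged.update({k: data[k] for k in EVIDENCE_FIELDS if k in data})
    return merged
```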

Changes:
- app/services/cue_service.py: remove _check_transport_verification_combo
  and the two calls in create_cue + update_cue. Replace with
  info-level logging when a worker cue is configured with an
  evidence-requiring mode (breadcrumb for operators still running
  older cueapi-worker).
- tests/test_transport_verification_combo.py: flip expected 400 → 201
  on create, 400 → 200 on PATCH. Header comment documents the
  history. Two test classes renamed from WorkerEvidenceRejected* to
  WorkerEvidenceAccepted* / PatchTransitions::test_patch_worker_to_evidence_mode_accepted.
- README.md: update transport-compatibility footnote to reflect the
  new accept-everything reality, with an upgrade hint for users on
  cueapi-worker < 0.3.0.
- CHANGELOG: replace the "Restricted" entry (worker+evidence
  rejection) with a "Removed" entry describing the lift.
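
The replacement for the rejection is described as an info-level breadcrumb. A minimal sketch of that idea follows; the logger name, the message wording, and the helper `note_worker_evidence_mode` are all hypothetical, and the boolean return value exists only to make the behavior easy to test.

```python
import logging
from typing import Optional

logger = logging.getLogger("cue_service")  # hypothetical logger name

EVIDENCE_MODES = {"require_external_id", "require_result_url", "require_artifacts"}


def note_worker_evidence_mode(transport: str, mode: Optional[str]) -> bool:
    """Log, rather than reject, a worker cue with an evidence-requiring mode."""
    if transport == "worker" and mode in EVIDENCE_MODES:
        logger.info(
            "worker cue uses verification_mode=%s; requires cueapi-worker >= 0.3.0",
            mode,
        )
        return True
    return False
```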

Tests: 13/13 pass locally on the updated combo suite.

Preconditions met:
- cueapi-worker 0.3.0 published to PyPI (2026-04-17 22:04:39 UTC)
- cueapi-core #18 merged to main (verification_mode column
  + rule engine in place to read verification_mode and produce
  outcome_state transitions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>