feat(cloudflare-runtime): full package (W1+W2+W3 consolidated) #56
khaliqgant merged 6 commits into main
Adds @agent-assistant/cloudflare-runtime, the CF Workers runtime adapter for personas built on @agent-assistant/continuation. This single PR consolidates what the cf-runtime workflow bundle planned to land as three PRs (W1 ingress, W2 continuation adapters, W3 executor + DO), because the implementer agents built the full surface in one pass and all 35 tests pass cleanly.

Public surface:
- `wrapCloudflareWorker`: webhook ingress + dedup + enqueue
- `handleCfQueue` + `TurnExecutorDO`: queue consumer + per-conversation DO
- `createFakeExecutionContext`: collects `waitUntil` promises and awaits them before ack (the direct fix for the production Slack-silence bug)
- `createCfContinuationStore`: DO storage primary + KV trigger index
- `createCfContinuationScheduler`: DO alarm + queue delayed delivery
- `createCfDeliveryAdapter`: Slack / GitHub / a2a-callback post-back
- `createCfSpecialistClient`: async sage-to-specialist bridge
- `verifySlackSignature` / `verifyGitHubSignature` helpers

Test evidence:
- `npx tsc --noEmit` clean
- `npx vitest run`: 35/35 across 10 files

See AgentWorkforce/cloud workflows/cf-runtime/SPEC.md for the contract.
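The Slack-silence fix hinges on one ordering rule: a queue message is acked only after the turn and everything it handed to `waitUntil` have settled. Below is a minimal sketch of that rule, not the package's implementation. `WaitUntilContext` and `QueueMessageLike` are simplified stand-ins for the Workers `ExecutionContext` and Cloudflare Queues `Message` types, and `createCollectingContext` / `consumeBatch` are hypothetical names rather than the real `createFakeExecutionContext` / `handleCfQueue` signatures.

```typescript
// Simplified stand-ins (assumptions) for the Workers types this sketch needs.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface QueueMessageLike<T> {
  readonly body: T;
  ack(): void;
  retry(): void;
}

// Hypothetical collecting context: records every waitUntil promise instead of
// handing it to the runtime, so the consumer can await them before acking.
function createCollectingContext(): WaitUntilContext & { drain(): Promise<void> } {
  const pending: Promise<unknown>[] = [];
  return {
    waitUntil(promise) {
      pending.push(promise);
    },
    async drain() {
      // Loop because settled work may itself have registered more work.
      while (pending.length > 0) {
        const batch = pending.splice(0, pending.length);
        await Promise.all(batch); // rejects if any background task failed
      }
    },
  };
}

type TurnHandler<T> = (body: T, ctx: WaitUntilContext) => Promise<void>;

// Hypothetical consumer loop: ack only after the turn AND its waitUntil work
// have settled; on any failure, ask the queue to redeliver the message.
async function consumeBatch<T>(
  messages: readonly QueueMessageLike<T>[],
  runTurn: TurnHandler<T>,
): Promise<void> {
  for (const message of messages) {
    const ctx = createCollectingContext();
    try {
      await runTurn(message.body, ctx);
      await ctx.drain();
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

Retrying a failed message leans on the queue's redelivery; how the package actually handles repeated failures (retry limits, dead-lettering) is outside this sketch.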
…ce hatches

Adds a `CfLogger` interface, a `consoleJsonLogger` default, and `nullLogger` / `createCapturingLogger` helpers, then wires the logger through `wrapCloudflareWorker` and `handleCfQueue`. Every cf-runtime entry point now emits structured JSON events that render cleanly in `wrangler tail --format json`.

Hatches for debugging:
- Inject a custom `CfLogger` per persona (ship to Workers Analytics Engine, Datadog, etc.) by passing `logger: ...` into `wrapCloudflareWorker` / `handleCfQueue`.
- `child()` bindings auto-correlate by `turnId` / `conversationId` / `component`.
- `resolveTurnId` on the executor adds a stable `turnId` to every log line, for greppable per-turn lifecycle tracing across ingress, executor, and delivery.
- `nullLogger` silences output in tests; `createCapturingLogger` collects records for assertions.

Direct response to the sage debuggability gap: previously, the only signal for a stuck turn was the raw waitUntil-cancelled warning. Now every step (webhook received, parse result, dedup decision, enqueue, dispatch start, dispatch complete, dispatch failed) has a structured event.

Tests: 42/42 across 11 files (was 35/10). The new `observability.test.ts` covers logger primitives, ingress wiring, executor wiring, and the dispatch failure path.
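To make the `child()` correlation idea concrete, here is a minimal sketch of a structured logger with bound fields and pluggable sinks. It is not the package's `CfLogger`: `SketchLogger`, `createLogger`, and the sink helpers are hypothetical, and the record layout (`ts`, `level`, `event`, then bindings, then per-call fields) is an assumption.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Hypothetical logger shape: one leveled method plus child() for bindings.
interface SketchLogger {
  log(level: LogLevel, event: string, fields?: Fields): void;
  child(bindings: Fields): SketchLogger;
}

type Sink = (record: Fields) => void;

function createLogger(sink: Sink, bindings: Fields = {}): SketchLogger {
  return {
    log(level, event, fields = {}) {
      // Bound fields (turnId, conversationId, component...) are merged into
      // every record; per-call fields come last so they can override them.
      sink({ ts: new Date().toISOString(), level, event, ...bindings, ...fields });
    },
    child(extra) {
      return createLogger(sink, { ...bindings, ...extra });
    },
  };
}

// Writes one JSON string per console.log call.
const consoleJsonSink: Sink = (record) => console.log(JSON.stringify(record));

// Drops everything (the nullLogger idea).
const nullSink: Sink = () => {};

// Keeps records in memory for test assertions (the createCapturingLogger idea).
function createCapturingSink(): { sink: Sink; records: Fields[] } {
  const records: Fields[] = [];
  return { sink: (record) => { records.push(record); }, records };
}
```

Binding correlation fields once via `child()` is what lets every line of a turn share the same `turnId` without each call site repeating it.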
Devin + Codex review (cd1985c, 8b8584b) flagged 6 issues. Each is fixed with a focused regression test that locks in the corrected behavior.

1. Missing spec for the new package (CONTRIBUTING.md violation)
   - Add `docs/specs/v1-cloudflare-runtime-spec.md`, mirroring `v1-webhook-runtime-spec.md`.
   - README.md no longer references a nonexistent SPEC.md; it points at the new spec instead.
2. `user_reply` trigger index key mismatch (Devin red)
   - `continuationTriggerIndexKey` emitted a key from `waitFor.correlationKey`; `resumeTriggerIndexKey` synthesized a key from `message.id`. They never matched.
   - Both functions now return `undefined` for `user_reply` (the upstream trigger shape carries no symmetric correlation field).
   - `findByTrigger` short-circuits to `null` when the key is `undefined`.
   - Spec section 5 documents the limitation: callers needing `user_reply` correlation must use `listBySession` or extend the trigger upstream.
3. Delivery adapter reported `delivered: true` when the handler was missing (P1, Devin red)
   - Each branch now returns `{ delivered: false, failureReason: "no_<kind>_handler" }` when the handler is undefined, instead of silently no-oping.
4. Delivery adapter returned `undefined` for unknown target kinds (Devin yellow)
   - The default branch returns `{ delivered: false, failureReason: "unsupported_target_kind:X" }`.
5. Nango fell through to Slack-shaped dedup extraction (Devin yellow)
   - `getProviderDedupKey` for nango now uses `result.dedupKey` directly.
   - The persona's `parse()` supplies the dedup key from a nango-appropriate field (e.g. `delivery_id`); the runtime no longer silently skips dedup.
6. Stale trigger index on update (Codex P2)
   - `put()` reads the prior record, computes its trigger key, and deletes the old KV index entry when `waitFor` changes, preventing `findByTrigger` from resolving an obsolete record.
   - Same symmetry fix on `scheduled_wake`: a key is emitted only when `wakeUpId` is set on both sides.

Tests: 53/53 across 11 files (was 42).
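Items 3 and 4 amount to making every path of the delivery adapter return an explicit result. A minimal sketch under assumed shapes: the target kinds, the `Handlers` map, and the `deliver` function are hypothetical and are not `createCfDeliveryAdapter`'s real signature; only the `delivered` / `failureReason` fields and the reason-string patterns follow the commit message.

```typescript
// Hypothetical target shapes; the real adapter's target types may differ.
type DeliveryTarget =
  | { kind: "slack"; channel: string }
  | { kind: "github"; repo: string; issue: number }
  | { kind: "a2a-callback"; url: string };

// Failures carry a machine-readable reason instead of claiming delivery.
type DeliveryResult =
  | { delivered: true }
  | { delivered: false; failureReason: string };

type Handler<K extends DeliveryTarget["kind"]> = (
  target: Extract<DeliveryTarget, { kind: K }>,
  text: string,
) => Promise<void>;

interface Handlers {
  slack?: Handler<"slack">;
  github?: Handler<"github">;
  "a2a-callback"?: Handler<"a2a-callback">;
}

async function deliver(
  target: DeliveryTarget,
  text: string,
  handlers: Handlers,
): Promise<DeliveryResult> {
  switch (target.kind) {
    case "slack": {
      const handler = handlers.slack;
      if (!handler) return { delivered: false, failureReason: "no_slack_handler" };
      await handler(target, text);
      return { delivered: true };
    }
    case "github": {
      const handler = handlers.github;
      if (!handler) return { delivered: false, failureReason: "no_github_handler" };
      await handler(target, text);
      return { delivered: true };
    }
    case "a2a-callback": {
      const handler = handlers["a2a-callback"];
      if (!handler) return { delivered: false, failureReason: "no_a2a-callback_handler" };
      await handler(target, text);
      return { delivered: true };
    }
    default: {
      // Runtime guard for kinds outside the static union (e.g. untyped JSON input).
      const kind = (target as { kind: string }).kind;
      return { delivered: false, failureReason: `unsupported_target_kind:${kind}` };
    }
  }
}
```

Keeping the `default` branch even though the union is closed at compile time matters because targets typically arrive as deserialized data, where TypeScript's exhaustiveness offers no runtime protection.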
Branch updated from 8b8584b to bc52d50.
`cf-continuation-store.ts`, lines 38-40:

```ts
if (record.sessionId) {
  await this.storage.put(sessionKey(record.sessionId, record.id), record.id);
}
```
🟡 `CfContinuationStore.put()` does not clean up stale session index entries when `sessionId` changes

The `put` method correctly detects and cleans up stale trigger index entries when `waitFor` changes (lines 42-49), but does not apply the same cleanup to session index entries. When a record's `sessionId` is updated or removed, the old `session:{oldSessionId}:{recordId}` key remains in DO storage. As a result, `listBySession(oldSessionId)` still returns the record even though it no longer belongs to that session.
Comparison with trigger index cleanup and delete()
The trigger index cleanup pattern at cf-continuation-store.ts:42-49 shows the correct approach: read the prior record, compute prior and new keys, delete the prior key if it changed, then write the new key. The delete() method at cf-continuation-store.ts:64-66 also correctly removes the session index entry. But put() only writes the new session index entry (lines 38-40) without deleting the old one when it differs.
The spec states (Section 5): "When put updates an existing record whose trigger key has changed, the store deletes the prior trigger index entry before writing the new one, so stale keys never resolve to the wrong record." The session index should have the same guarantee since listBySession is the recommended lookup path for user_reply triggers per the spec.
Suggested change:

```ts
if (record.sessionId) {
  await this.storage.put(sessionKey(record.sessionId, record.id), record.id);
}
// Clean up stale session index entry when sessionId changes, mirroring
// the trigger index cleanup below. Assumes `prior` (the previously stored
// record that put() reads for the trigger cleanup) is in scope here.
const priorSessionId = prior?.sessionId;
if (priorSessionId && priorSessionId !== record.sessionId) {
  await this.storage.delete(sessionKey(priorSessionId, record.id));
}
```
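The invariant this suggestion restores can be checked end to end with a small in-memory model. This is a sketch under stated assumptions: one `Map` stands in for both DO storage and the KV trigger index, the `WaitFor` shapes and key formats are hypothetical, and `InMemoryContinuationStore` is not the package's `CfContinuationStore`. It applies the same prior-key cleanup to both indexes and, following the review fixes, emits no trigger key for `user_reply`.

```typescript
// Hypothetical trigger shapes; the real continuation types may differ.
type WaitFor =
  | { type: "user_reply"; correlationKey?: string }
  | { type: "scheduled_wake"; wakeUpId?: string };

interface ContinuationRecord {
  id: string;
  sessionId?: string;
  waitFor: WaitFor;
}

// user_reply yields no key (no symmetric correlation field on the resume side);
// scheduled_wake yields one only when wakeUpId is set.
function triggerKey(waitFor: WaitFor): string | undefined {
  if (waitFor.type === "scheduled_wake" && waitFor.wakeUpId) return `trigger:wake:${waitFor.wakeUpId}`;
  return undefined;
}

const sessionKey = (sessionId: string, id: string) => `session:${sessionId}:${id}`;

class InMemoryContinuationStore {
  // One Map stands in for both DO storage and the KV trigger index.
  private readonly kv = new Map<string, unknown>();

  put(record: ContinuationRecord): void {
    const prior = this.kv.get(`record:${record.id}`) as ContinuationRecord | undefined;
    this.kv.set(`record:${record.id}`, record);

    // Session index: drop the old entry if sessionId changed or was removed.
    if (prior?.sessionId && prior.sessionId !== record.sessionId) {
      this.kv.delete(sessionKey(prior.sessionId, record.id));
    }
    if (record.sessionId) this.kv.set(sessionKey(record.sessionId, record.id), record.id);

    // Trigger index: the same prior-key cleanup pattern.
    const priorTrigger = prior ? triggerKey(prior.waitFor) : undefined;
    const nextTrigger = triggerKey(record.waitFor);
    if (priorTrigger && priorTrigger !== nextTrigger) this.kv.delete(priorTrigger);
    if (nextTrigger) this.kv.set(nextTrigger, record.id);
  }

  listBySession(sessionId: string): string[] {
    const prefix = `session:${sessionId}:`;
    return Array.from(this.kv.keys())
      .filter((key) => key.startsWith(prefix))
      .map((key) => this.kv.get(key) as string);
  }

  findByTrigger(trigger: WaitFor): ContinuationRecord | null {
    const key = triggerKey(trigger);
    if (!key) return null; // short-circuit when no key exists (e.g. user_reply)
    const id = this.kv.get(key) as string | undefined;
    if (!id) return null;
    return (this.kv.get(`record:${id}`) as ContinuationRecord | undefined) ?? null;
  }
}
```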
Merge conflicts: `package-lock.json`
Summary
Adds `@agent-assistant/cloudflare-runtime`, the CF Workers runtime adapter for personas built on `@agent-assistant/continuation`.

This single PR consolidates W1+W2+W3 of the cf-runtime workflow bundle (originally planned as three sequential PRs). The implementer agents built the full surface in one pass and all 35 tests pass cleanly, so splitting after the fact would only add review overhead.
Public surface
- `wrapCloudflareWorker`: webhook ingress + dedup (via `@agent-assistant/surfaces` `SlackEventDedupGate`) + enqueue
- `handleCfQueue` + `TurnExecutorDO`: queue consumer + per-conversation Durable Object
- `createFakeExecutionContext`: collects `waitUntil` promises and awaits them before ack; this is the direct fix for the production Slack-silence bug
- `createCfContinuationStore`: DO storage primary + KV trigger index
- `createCfContinuationScheduler`: DO alarm + queue delayed delivery
- `createCfDeliveryAdapter`: Slack / GitHub / a2a-callback post-back
- `createCfSpecialistClient`: async sage-to-specialist bridge
- `verifySlackSignature` / `verifyGitHubSignature` helpers

Production fix this enables
Sage currently dispatches Slack turn work via `ctx.waitUntil(...)`, which CF cancels ~30s after the HTTP 200. Long turns (Notion lookups, multi-tool specialist routing) are silently dropped: Slack metrics show 100% success while users get no reply.

Once cloud's sage-worker migrates onto this package (cloud W6, blocked on this PR and the sage 1.5.0 publish), turns run inside a queue consumer with the full 15-min wall budget, and the fake-ctx pattern catches anything internal that still uses `waitUntil`.

SPEC invariants enforced
- `SlackEventDedupGate`

Test evidence
- `npx tsc --noEmit` clean
- `npx vitest run`: 35 tests across 10 files, all passing

Order
Independent of W4 (`@agent-assistant/webhook-runtime` patch, PR #55). Both should land before publishing the runtime to npm. The sage repo PR (https://github.com/AgentWorkforce/sage/pull/122) lands independently and publishes 1.5.0 separately.

Reference
Architecture and SPEC: `workflows/cf-runtime/SPEC.md` and `workflows/cf-runtime/ARCHITECTURE.md` in AgentWorkforce/cloud.