Summary
When multiple SSE streams exist for the same session (e.g. from POST response reconnections), `LocalSessionWorker::resume()` unconditionally replaces `self.common.tx`, killing the other stream's receiver. Both EventSource connections then reconnect every `sse_retry` seconds, leapfrogging each other in an infinite loop that floods the server with GET requests.
Affected versions: rmcp 0.14.0, 0.15.0
Severity: Critical — causes infinite reconnect loops with clients like Cursor, and breaks server-to-client notifications over Streamable HTTP
Root Cause
The MCP Streamable HTTP transport sends POST SSE responses with a priming event containing `retry: 3000`. When the POST stream ends (after delivering the response), the client's EventSource implementation automatically reconnects via GET. This creates multiple competing EventSource connections:
- The initial standalone GET stream (primary notification channel)
- Reconnecting GETs from completed POST responses (initialize, tools/list, etc.)
Each reconnecting GET calls `resume()`, which unconditionally replaces `self.common.tx`:
```rust
// Before fix — local.rs resume()
None => {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    self.common.tx = tx; // ← Unconditionally replaces sender, kills the other stream
    // ...
}
```
Dropping the old sender closes the old receiver, terminating the *other* EventSource's stream. That stream then reconnects, replacing the sender again. The two streams leapfrog every `sse_retry` (3 s) indefinitely.
The Leapfrog Loop
1. Client POSTs `initialize` → SSE response with priming event (`retry: 3000`) → stream ends
2. Client opens a standalone GET → becomes the primary common channel (tx1/rx1)
3. The POST EventSource reconnects via GET (3 s later) → replaces `common.tx` → kills rx1
4. The GET from step 2 reconnects → replaces `common.tx` → kills the stream from step 3
5. Repeat every 3 seconds indefinitely
Server logs confirm the pattern — alternating GET requests every 3 seconds with different `Last-Event-ID` values:

```
13:33:51.670 GET Last-Event-ID: 0/2  ← from completed POST response
13:33:54.668 GET Last-Event-ID: 0    ← from killed standalone stream
13:33:57.679 GET Last-Event-ID: 0/2  ← leapfrog
13:34:00.674 GET Last-Event-ID: 0    ← leapfrog
...
```
Additional Issue: Cache Replay Loop
Even without the leapfrog, `resume()` called `sync()` on the common channel to replay cached events. Replaying server-initiated `list_changed` notifications caused clients to re-process old signals, triggering unnecessary re-fetches every reconnection cycle.
What Happens in Practice
Cursor (infinite loop)
- Connects via POST initialize + GET standalone
- POST SSE stream ends → EventSource reconnects via GET
- Two competing streams leapfrog every 3 seconds
- Server flooded with GET requests indefinitely
- Notifications intermittently lost as channels are swapped
VS Code (silent notification loss)
- Reconnects SSE every ~5 minutes with same session ID
- Each reconnection replaces the channel sender
- Previous stream's receiver is orphaned
- `notify_tool_list_changed().await` returns `Ok(())` — silent failure
Fix: Shadow Channels
PR: #660
Instead of unconditionally replacing the common channel, check if the primary is still active:
- Primary dead (`tx.is_closed()`) → replace it; the new stream becomes primary.
- Primary alive → create a shadow stream — an idle SSE connection kept alive by SSE keep-alive pings that does NOT receive notifications and does NOT replace the primary channel.
```rust
fn resume_or_shadow_common(&mut self) -> Result<StreamableHttpMessageReceiver, SessionError> {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    if self.common.tx.is_closed() {
        // Primary is dead — replace it
        self.common.tx = tx;
    } else {
        // Primary is alive — create shadow (idle, keep-alive only)
        self.shadow_txs.push(tx);
    }
    Ok(StreamableHttpMessageReceiver { http_request_id: None, inner: rx })
}
```
Why Not 409 Conflict?
The initial approach (matching the TypeScript SDK) was to return 409 Conflict on duplicate standalone streams. However:
- The MCP spec states: "The client MAY remain connected to multiple SSE streams simultaneously" — 409 is not spec-compliant
- 409 causes Cursor to fail entirely on reconnection (500 errors from unhandled Conflict)
- The reconnecting EventSources are legitimate HTTP requests — they need a valid stream back
Shadow channels are the correct approach: keep all connections alive without interference.
Why No Cache Replay on Common Channel?
Common channel notifications (`tools/list_changed`, `resources/list_changed`) are idempotent signals. Replaying cached ones causes clients to re-process old events, triggering unnecessary re-fetches or infinite notification loops. Missing one is harmless — the next real event arrives naturally. Request-wise channels still use `sync()` for proper response replay.
Changes (5 commits)
| Commit | Description |
| --- | --- |
| 8bd424e | Initial 409 Conflict approach (returned error on duplicate standalone stream) |
| 0d03eb5 | Handle resume with completed request-wise channels (fall through to common) |
| a7bb822 | Remove 409 Conflict — allow channel replacement per MCP spec |
| 7cf5406 | Skip cache replay (`sync`) when replacing active streams |
| a7df58c | Shadow channels — the final fix that prevents the leapfrog loop |
Files Changed
`crates/rmcp/src/transport/streamable_http_server/session/local.rs`
- Added `shadow_txs: Vec<Sender<ServerSseMessage>>` to `LocalSessionWorker`
- New method `resume_or_shadow_common()` with primary-alive check
- Updated `resume()` to use shadow logic for both the direct common and request-wise fallback paths
- Removed `sync()` calls on common channel resume
- Updated `close_sse_stream()` to clear shadow senders
- Updated `create_local_session()` to initialize `shadow_txs`
Test Results
- `cargo check --workspace` passes
Environment
- rmcp 0.15.0 (also affects 0.14.0)
- `StreamableHttpService` with `stateful_mode: true`
- `LocalSessionManager` (default session manager)
- Clients tested: Cursor 2.4.37, VS Code MCP Extension