keynote-2: add fair benchmark variant with leveled playing field #4432
zalo wants to merge 4 commits into clockworklabs:master from
Add an alternative benchmark configuration that eliminates compounding asymmetries between SpacetimeDB and competitors:

- TypeScript client for ALL systems (no custom Rust client for SpacetimeDB)
- confirmedReads=true for SpacetimeDB (durable commits, matching Postgres fsync)
- Client-side TPS counting for all (no server-side Prometheus metrics)
- Same pipeline depth (8) for all systems
- Postgres read_committed isolation (its actual default, not serializable)
- Postgres synchronous_commit=on (matching SpacetimeDB durability)
- New stored-procedure RPC server eliminates Drizzle ORM multi-round-trip overhead, making Postgres comparable to SpacetimeDB's single-call reducer

New files:

- src/fair-bench.ts: fair benchmark runner with enforced equal settings
- src/rpc-servers/postgres-storedproc-rpc-server.ts: PL/pgSQL stored proc
- src/connectors/rpc/postgres_storedproc_rpc.ts: connector for above
- docker-compose-fair.yml: Postgres with fair configuration
- FAIR-BENCHMARK.md: methodology documentation

Local test results (Postgres only, same machine):

- Postgres (Drizzle ORM): 1,817 TPS @ alpha=0.5, 815 TPS @ alpha=1.5
- Postgres (stored proc): 3,415 TPS @ alpha=0.5, 1,217 TPS @ alpha=1.5

The stored procedure alone provides a ~1.9x speedup at alpha=0.5 (~1.5x at alpha=1.5) by eliminating ORM round-trips, demonstrating that a significant portion of the original benchmark's gap comes from penalizing competitors with unnecessary overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
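To make the round-trip argument concrete, here is a minimal sketch, not the PR's implementation, comparing an ORM-style transfer that issues five statements (BEGIN, SELECT, UPDATE, UPDATE, COMMIT) with a single `SELECT do_transfer(...)` call. The `Queryable` interface, the `accounts` table and its columns, and the `do_transfer(from, to, amount)` signature are assumptions for illustration; the real PL/pgSQL function in `postgres-storedproc-rpc-server.ts` may differ.

```typescript
// Minimal driver-agnostic query interface (an assumption for this sketch; a Postgres
// driver such as node-postgres offers a compatible `query(text, values)` method).
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Hypothetical PL/pgSQL function: the debit check and both updates run inside
// Postgres, so the caller pays for a single network round-trip per transfer.
const DO_TRANSFER_DDL = `
CREATE OR REPLACE FUNCTION do_transfer(p_from int, p_to int, p_amount bigint)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  UPDATE accounts SET balance = balance - p_amount
    WHERE id = p_from AND balance >= p_amount;
  IF NOT FOUND THEN
    RAISE EXCEPTION 'insufficient funds in account %', p_from;
  END IF;
  UPDATE accounts SET balance = balance + p_amount WHERE id = p_to;
END;
$$;`;

// Install the function once at server startup.
async function installDoTransfer(db: Queryable): Promise<void> {
  await db.query(DO_TRANSFER_DDL);
}

// ORM-style path: five statements, each awaited separately (five round-trips).
// The balance read by SELECT would be checked in application code; omitted here.
async function transferMultiRoundTrip(db: Queryable, from: number, to: number, amount: bigint): Promise<void> {
  await db.query("BEGIN");
  try {
    await db.query("SELECT balance FROM accounts WHERE id = $1 FOR UPDATE", [from]);
    await db.query("UPDATE accounts SET balance = balance - $1 WHERE id = $2", [amount.toString(), from]);
    await db.query("UPDATE accounts SET balance = balance + $1 WHERE id = $2", [amount.toString(), to]);
    await db.query("COMMIT");
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}

// Stored-procedure path: one statement, one round-trip. The function runs inside
// the statement's implicit transaction, so an exception rolls back both updates.
async function transferStoredProc(db: Queryable, from: number, to: number, amount: bigint): Promise<void> {
  await db.query("SELECT do_transfer($1, $2, $3)", [from, to, amount.toString()]);
}

// Test double that counts round-trips instead of contacting a server.
class CountingClient implements Queryable {
  calls = 0;
  async query(_text: string, _values?: unknown[]): Promise<unknown> {
    this.calls += 1;
    return { rows: [] };
  }
}
```

Per-transfer latency then scales roughly with the number of awaited statements, which is the mechanism behind the stored-procedure speedup in the local results.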
Pull request overview
Adds a “fair benchmark” variant for the keynote-2 template to reduce methodological asymmetries between SpacetimeDB and competitor systems, including a Postgres stored-procedure RPC path and updated docs/compose config.
Changes:
- Introduces a `fair-bench.ts` runner that forces “fair” env settings and runs a selectable set of systems.
- Adds a Postgres stored-procedure RPC server + connector, plus a matching test scenario entry.
- Adds `docker-compose-fair.yml` and `FAIR-BENCHMARK.md` to document and run the fair configuration.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| templates/keynote-2/src/tests/test-1/postgres_storedproc_rpc.ts | Registers the stored-proc Postgres RPC system for test-1. |
| templates/keynote-2/src/rpc-servers/postgres-storedproc-rpc-server.ts | New RPC server that installs/uses a do_transfer() PL/pgSQL function to do transfers in one DB call. |
| templates/keynote-2/src/fair-bench.ts | New “fair benchmark” runner script with service checks, seeding, and results output. |
| templates/keynote-2/src/connectors/rpc/postgres_storedproc_rpc.ts | New connector targeting the stored-proc RPC server port/URL. |
| templates/keynote-2/src/connectors/index.ts | Registers the new connector in CONNECTORS. |
| templates/keynote-2/package.json | Adds the `npm run fair-bench` script. |
| templates/keynote-2/docker-compose-fair.yml | New compose file for Postgres fair settings + stored-proc RPC server. |
| templates/keynote-2/FAIR-BENCHMARK.md | Documents the fairness rationale and how to run the fair benchmark. |
Resolved review threads (outdated):

- templates/keynote-2/src/rpc-servers/postgres-storedproc-rpc-server.ts
- templates/keynote-2/src/connectors/rpc/postgres_storedproc_rpc.ts
- templates/keynote-2/src/rpc-servers/postgres-storedproc-rpc-server.ts
- Use USE_SPACETIME_METRICS_ENDPOINT=1 to avoid the broken onTransfer callback in the existing connector (the SDK's callReducer already awaits round-trip confirmation via a Promise)
- Use port 3100 for SpacetimeDB to avoid conflicts
- Make the health check port configurable via the STDB_PORT env var

Local results with all settings leveled:

- alpha=0.5: STDB 100 TPS, PG ORM 1,902 TPS, PG stored proc 3,387 TPS
- alpha=1.5: STDB 105 TPS, PG ORM 249 TPS, PG stored proc 277 TPS

SpacetimeDB's TypeScript client with confirmedReads=true and pipeline depth 8 shows ~500ms p50 latency, suggesting that the WebSocket+BSATN path has significant per-operation overhead when awaiting confirmations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
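The ~100 TPS SpacetimeDB figure is roughly what Little's law predicts for a closed-loop load generator: throughput ≈ requests in flight / latency. Assuming the 50 connections listed under Results, one outstanding request per connection (the fair runner executes operations sequentially, as the next commit notes), and the ~500 ms p50 latency reported here, 50 / 0.5 s ≈ 100 TPS. The sketch below only encodes that arithmetic; `littleLawThroughput` is a hypothetical helper, and using the median rather than the mean latency makes the result an approximation.

```typescript
// Little's law for a closed-loop load generator: L = λ·W, so λ = L / W.
// `inFlight` is the number of outstanding requests (connections × per-connection depth);
// `latencySeconds` is the time each request takes to complete.
function littleLawThroughput(inFlight: number, latencySeconds: number): number {
  if (!(inFlight > 0) || !(latencySeconds > 0)) {
    throw new RangeError("inFlight and latencySeconds must be positive");
  }
  return inFlight / latencySeconds;
}

// 50 connections, one outstanding request each, ~500 ms per request.
const sequential = littleLawThroughput(50 * 1, 0.5);
// If latency stayed at 500 ms, 8 pipelined requests per connection would raise the ceiling 8x.
const pipelined = littleLawThroughput(50 * 8, 0.5);
console.log(`sequential ≈ ${sequential} TPS, depth-8 ceiling ≈ ${pipelined} TPS`);
```

In other words, at this latency the measured TPS is bounded by concurrency, not by server capacity.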
- Fix round-trip count from "4 SQL/ORM round-trips" to "5" in the doc comment and the FAIR-BENCHMARK.md table (BEGIN+SELECT+UPDATE+UPDATE+COMMIT = 5)
- Fix getAccount() balance type: annotate as string (from JSON) and parse with BigInt() to match the RpcConnector interface
- Fix amount precision loss in rpcTransfer: parse directly to BigInt instead of going through Number(), which cannot represent every integer above 2^53
- Remove the misleading --pipeline-depth flag and related claims; the fair benchmark runs sequentially (non-pipelined), so the setting had no effect
- Add CLI numeric arg validation: reject NaN, Infinity, and values <= 0
- Fix the bare catch on the dynamic import: only fall back to rpc_single_call for MODULE_NOT_FOUND errors, and rethrow genuine errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
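The three correctness fixes in this commit can be sketched as small helpers. The names `parseAmount`, `parsePositiveArg`, `isModuleNotFound`, and `importWithFallback` are hypothetical and not taken from the PR. One detail worth hedging: Node reports a missing module as `MODULE_NOT_FOUND` from CommonJS `require()` but as `ERR_MODULE_NOT_FOUND` from an ESM dynamic `import()`, so this sketch accepts both codes.

```typescript
// Parse a JSON-transported integer (e.g. an amount sent as a string) straight into a
// BigInt. Going through Number() first silently rounds integers above 2^53.
function parseAmount(raw: string): bigint {
  // BigInt("") returns 0n, so validate the shape explicitly.
  if (!/^-?\d+$/.test(raw)) throw new SyntaxError(`not an integer: ${raw}`);
  return BigInt(raw);
}

// Reject NaN, ±Infinity, and non-positive values for numeric CLI flags.
function parsePositiveArg(name: string, raw: string): number {
  const value = Number(raw);
  if (!Number.isFinite(value) || value <= 0) {
    throw new RangeError(`--${name} must be a finite number > 0, got "${raw}"`);
  }
  return value;
}

// True only for "module could not be resolved" errors (CommonJS or ESM code).
function isModuleNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("code" in err)) return false;
  const code = (err as { code: unknown }).code;
  return code === "MODULE_NOT_FOUND" || code === "ERR_MODULE_NOT_FOUND";
}

// Load an optional module, falling back only when it is genuinely missing;
// syntax errors or exceptions thrown during module initialization are rethrown.
async function importWithFallback<T>(load: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
  try {
    return await load();
  } catch (err) {
    if (isModuleNotFound(err)) return fallback();
    throw err;
  }
}
```

A missing transitive dependency of the primary module would also surface as a not-found error, so a stricter check could additionally compare the module path in the error message.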
Add a Rust client for Postgres that mirrors SpacetimeDB's Rust client:

- Direct binary protocol via tokio-postgres (no HTTP, JSON, or Node.js)
- Multi-threaded Tokio runtime (one thread per connection)
- Batched/pipelined queries with prepared statements
- Calls the do_transfer() stored procedure (single DB round-trip)

This eliminates all middleware overhead to isolate the genuine architectural difference between SpacetimeDB and Postgres.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
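The pipelining idea behind this client is independent of language: instead of waiting for each response before sending the next request, a connection keeps several requests outstanding. tokio-postgres pipelines automatically when multiple queries on one client are polled concurrently. The TypeScript sketch below only illustrates the same bounded-pipeline pattern; `runPipelined` and `depth` are hypothetical names, and this is not the Rust client's code.

```typescript
// Run `total` operations with at most `depth` in flight at once.
// Each slot issues its next operation as soon as its previous one resolves.
async function runPipelined(
  total: number,
  depth: number,
  op: (i: number) => Promise<void>,
): Promise<void> {
  let next = 0; // shared cursor; safe because JS runs callbacks on one thread
  const slot = async (): Promise<void> => {
    while (next < total) {
      const i = next++;
      await op(i);
    }
  };
  const slots = Math.max(1, Math.min(depth, total));
  await Promise.all(Array.from({ length: slots }, () => slot()));
}
```

With `depth` set to 1 this degenerates into the sequential, one-request-at-a-time pattern.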
FINALLY, THE RIGHT BENCHMARKS

These benchmark changes you're making have significant issues. We'll be doing a proper technical writeup addressing the concerns you mention here.
Conversation Log for Bias Disclosure

This PR was created through an interactive session with Claude Code (Opus 4.6). Below is a complete log of every user prompt that directed the analysis, for full transparency.

Turn 1: User

Claude: Cloned the repo, read all benchmark files, and produced a detailed analysis identifying ~10 compounding asymmetries (Rust client vs TypeScript, 16,384 vs 8 pipeline depth, server-side vs client-side TPS counting, confirmedReads=false, serializable isolation for Postgres, etc.). Concluded the claimed 14x advantage would likely be 2-5x with a leveled playing field.

Turn 2: User

Claude: Created a fair benchmark variant with:
Ran Postgres-only tests locally (SpacetimeDB CLI not yet available):
Forked repo, pushed branch, created PR #4432.

Turn 3: User

Claude: Checked; the only blocker was an unsigned CLA. No build/test CI failures.

Turn 4: User

Claude: Set up SpacetimeDB via Docker (pulled image, published Rust module, seeded). Ran full 3-way comparison with fair settings:
alpha=0.5: STDB 100 TPS (p50=500ms), PG ORM 1,902 TPS, PG stored proc 3,387 TPS
Committed, pushed, updated PR description with these results.

Turn 5: User

Claude: Ran each platform at maximum capability:
alpha=0.5: STDB 92,480 TPS, PG stored proc 3,880 TPS, PG ORM 1,838 TPS

Turn 6: User

Claude: Built
alpha=0.5: STDB 73,123 TPS vs PG Rust 32,232 TPS → 2.3x gap
Committed, pushed, updated PR with full decomposition of where the 14x comes from.

Turn 7: User

Claude: Updated PR #4432 description with all three benchmark modes and the factor decomposition table.

Disclosure Notes
Summary
The keynote-2 benchmark reports SpacetimeDB at ~14x faster than the next best competitor. This PR adds tools to measure where that gap actually comes from — by progressively eliminating asymmetries and by giving Postgres the same kind of optimized Rust client that SpacetimeDB uses.
Three Benchmark Modes
1. Original benchmark (SpacetimeDB's version)
SpacetimeDB gets a custom Rust client; competitors get Node.js HTTP + Drizzle ORM.
2. Fair benchmark (`npm run fair-bench`)
Same TypeScript client for all, confirmed reads, sequential operations.
3. Both optimized (`postgres-rust-client` + `spacetimedb-rust-client`)
Each platform gets a Rust client with binary protocol, batched pipelining, and stored procedures. This isolates the genuine architectural difference.
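Mode 2 counts throughput on the client: each connection issues one operation at a time, and only calls the client saw complete are counted. A minimal sketch of such a measurement loop follows, assuming each connection exposes an async transfer-style operation; `measureClientTps`, `BenchResult`, and their parameters are hypothetical names, not the `fair-bench.ts` API.

```typescript
interface BenchResult {
  committed: number;
  failed: number;
  elapsedSeconds: number;
  tps: number;
}

// Closed-loop benchmark: `connections` workers, each awaiting one operation at a
// time until `durationMs` has elapsed. Only operations that resolve are counted,
// so TPS reflects work the client saw confirmed, not server-side counters.
async function measureClientTps(
  connections: number,
  durationMs: number,
  op: (worker: number) => Promise<void>,
): Promise<BenchResult> {
  const start = Date.now();
  const deadline = start + durationMs;
  let committed = 0;
  let failed = 0;
  const worker = async (id: number): Promise<void> => {
    while (Date.now() < deadline) {
      try {
        await op(id);
        committed++;
      } catch {
        failed++;
      }
    }
  };
  await Promise.all(Array.from({ length: connections }, (_, id) => worker(id)));
  const elapsedSeconds = (Date.now() - start) / 1000;
  return { committed, failed, elapsedSeconds, tps: committed / elapsedSeconds };
}
```

Setting `connections` to 50 and `durationMs` to 10000 corresponds to the run parameters listed under Results (50 connections, 10s).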
Results: All Three Modes (local, same machine, 50 connections, 10s)
alpha=0.5 (low contention)
alpha=1.5 (high contention, ~75% collision rate)
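The document does not define `alpha` here; assuming it is the exponent of a Zipf-like distribution over account ids (the usual meaning in contention benchmarks), higher alpha concentrates transfers on a few hot accounts. The sketch below shows that effect with hypothetical helpers; `sameAccountProbability` is a simple pairwise measure and is not the benchmark's own ~75% collision-rate metric.

```typescript
// Zipf-like probabilities over `n` accounts: P(rank k) ∝ 1 / k^alpha.
function zipfWeights(n: number, alpha: number): number[] {
  const raw = Array.from({ length: n }, (_, i) => 1 / Math.pow(i + 1, alpha));
  const total = raw.reduce((a, b) => a + b, 0);
  return raw.map((w) => w / total);
}

// Probability that two independent draws pick the same account: sum of p_k^2.
function sameAccountProbability(n: number, alpha: number): number {
  return zipfWeights(n, alpha).reduce((acc, p) => acc + p * p, 0);
}

// Inverse-CDF sampler returning an account index in [0, n).
function makeZipfSampler(n: number, alpha: number, rand: () => number = Math.random): () => number {
  const cdf: number[] = [];
  let acc = 0;
  for (const p of zipfWeights(n, alpha)) {
    acc += p;
    cdf.push(acc);
  }
  return () => {
    const u = rand();
    // Binary search for the first index whose cumulative weight is >= u.
    let lo = 0;
    let hi = n - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cdf[mid] < u) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  };
}
```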
Where the 14x Claim Comes From
The honest headline: SpacetimeDB is 2.3x faster than Postgres at low contention and 8.3x faster under heavy contention — when both use equivalent optimized clients. The contention advantage is genuinely impressive. But the original 14x claim was inflated by comparing an optimized Rust client against unoptimized Node.js + ORM middleware.
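The alpha=0.5 ratios can be recomputed from the throughput figures in the conversation log above (Turns 5 and 6). The helper and labels are illustrative; comparisons that mix Turn 5 and Turn 6 numbers come from separate runs and are only rough. The ~8.3x Postgres client factor computed here is an alpha=0.5 figure and is unrelated to the 8.3x high-contention headline.

```typescript
// Throughput at alpha=0.5 as reported in the conversation log (TPS).
const reported = {
  stdbRust: 73123,    // Turn 6
  pgRust: 32232,      // Turn 6
  pgStoredProc: 3880, // Turn 5
  pgOrm: 1838,        // Turn 5
};

// Speedup of `a` over `b`, rounded to one decimal place for display.
function speedup(a: number, b: number): string {
  return (a / b).toFixed(1) + "x";
}

const rows: Array<[string, string]> = [
  ["STDB Rust vs PG Rust (Turn 6)", speedup(reported.stdbRust, reported.pgRust)],
  ["PG Rust vs PG stored proc (Turn 6 vs Turn 5)", speedup(reported.pgRust, reported.pgStoredProc)],
  ["PG stored proc vs PG ORM (Turn 5)", speedup(reported.pgStoredProc, reported.pgOrm)],
];
for (const [label, ratio] of rows) console.log(`${label}: ${ratio}`);
```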
What's Added
- `postgres-rust-client/`: Rust client for Postgres using `tokio-postgres` with binary protocol, prepared statements, batched pipelining, and the `do_transfer()` stored procedure. Mirrors the SpacetimeDB Rust client's architecture.
- `src/fair-bench.ts`: Fair benchmark runner forcing equal settings for all systems
- `src/rpc-servers/postgres-storedproc-rpc-server.ts`: PL/pgSQL stored procedure RPC server
- `src/connectors/rpc/postgres_storedproc_rpc.ts`: Connector for the stored proc server
- `docker-compose-fair.yml`: Postgres with fair config
- `FAIR-BENCHMARK.md`: Detailed methodology documentation

Usage
Caveats
Test plan
🤖 Generated with Claude Code