Skip to content

Async Chunked Export for Large Databases#132

Open
suletetes wants to merge 14 commits intoouterbase:mainfrom
suletetes:fix/async-chunked-export-large-databases
Open

Async Chunked Export for Large Databases#132
suletetes wants to merge 14 commits intoouterbase:mainfrom
suletetes:fix/async-chunked-export-large-databases

Conversation

@suletetes
Copy link
Copy Markdown

Summary

This PR introduces an asynchronous, chunked export system that enables StarbaseDB to export databases of any size up to the planned 10GB Durable Object SQLite limit without hitting the 30-second Cloudflare Workers request timeout or blocking the Durable Object from serving other requests.

Problem

The existing export endpoints (GET /export/dump, GET /export/json/:tableName, GET /export/csv/:tableName) load the entire dataset into memory and return it in a single synchronous response. This fails for large databases because:

  1. 30-second timeout: Cloudflare Workers enforce a hard 30-second request timeout. Exports that take longer are killed mid-flight, returning a network error with no data.
  2. Durable Object blocking: While the synchronous export runs, the single-threaded Durable Object cannot process any other requests WebSocket messages, RPC calls, and HTTP requests all queue up and eventually timeout.
  3. Memory limits: Building the entire dump string in memory before sending the response can exceed Worker memory limits for large datasets.
  4. No partial recovery: If the export fails at any point, all progress is lost. There is no way to resume.

Reproduction

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump' \
--header 'Authorization: Bearer ABC123' \
--output database_dump.sql

On a database large enough that the export exceeds 30 seconds, this returns a network error instead of a dump file.

Solution

A two-tier approach that preserves full backward compatibility:

Tier 1 Synchronous (unchanged)

For small databases that complete within the 30-second window, the existing GET /export/dump, GET /export/json/:tableName, and GET /export/csv/:tableName endpoints continue to work exactly as before. Zero changes for existing users.

Tier 2 Asynchronous Chunked Export (new)

For large databases, a new POST /export/dump endpoint initiates a background export job that:

  1. Returns 202 Accepted immediately with a jobId and statusUrl
  2. Processes data in batches (~5,000 rows) via Durable Object alarm cycles
  3. Each alarm cycle runs for ~4–5 seconds, then yields for ~100ms to let other requests through
  4. Streams formatted chunks (SQL, JSON, or CSV) to Cloudflare R2 via multipart upload
  5. Tracks progress in a tmp_export_jobs internal table (current table, offset, bytes written)
  6. On completion, finalizes the R2 multipart upload and optionally delivers a webhook callback
  7. On failure, retries after 60 seconds; jobs stuck for >10 minutes are marked failed
Client                          Worker                    Durable Object              R2
  │                               │                           │                       │
  │  POST /export/dump            │                           │                       │
  │  { async: true, format: sql } │                           │                       │
  │──────────────────────────────>│                           │                       │
  │                               │  createExportJob()        │                       │
  │                               │──────────────────────────>│                       │
  │                               │                           │  createMultipartUpload│
  │                               │                           │──────────────────────>│
  │                               │                           │  INSERT tmp_export_jobs│
  │                               │                           │                       │
  │  202 { jobId, statusUrl }     │                           │                       │
  │<──────────────────────────────│                           │                       │
  │                               │                           │                       │
  │                               │           alarm()         │                       │
  │                               │                           │  SELECT rows (batch)  │
  │                               │                           │  format chunk         │
  │                               │                           │  uploadPart()         │
  │                               │                           │──────────────────────>│
  │                               │                           │  UPDATE progress      │
  │                               │                           │  setAlarm(+100ms)     │
  │                               │                           │                       │
  │                               │           alarm()         │                       │
  │                               │                           │  ... repeat ...       │
  │                               │                           │                       │
  │                               │           alarm()         │                       │
  │                               │                           │  completeMultipart()  │
  │                               │                           │──────────────────────>│
  │                               │                           │  UPDATE completed     │
  │                               │                           │  POST callbackUrl     │
  │                               │                           │                       │
  │  GET /export/jobs/:id         │                           │                       │
  │──────────────────────────────>│  getExportJob()           │                       │
  │  200 { status: completed }    │                           │                       │
  │<──────────────────────────────│                           │                       │
  │                               │                           │                       │
  │  GET /export/jobs/:id/download│                           │                       │
  │──────────────────────────────>│                           │  R2.get(key)          │
  │  200 [file stream]            │                           │<──────────────────────│
  │<──────────────────────────────│                           │                       │

New API Endpoints

POST /export/dump Start Async Export

Request:

{
    "async": true,
    "format": "sql",
    "callbackUrl": "https://your-webhook.com/export-done"
}
Field Type Required Description
async boolean Yes Must be true to use async export
format string No sql (default), json, or csv
callbackUrl string No Webhook URL to POST when job completes or fails

Response (202 Accepted):

{
    "result": {
        "jobId": "export_20240101-170000_abc123",
        "status": "pending",
        "statusUrl": "/export/jobs/export_20240101-170000_abc123",
        "estimatedTables": 15
    }
}

GET /export/jobs/:jobId Check Job Status

Response (200):

{
    "result": {
        "jobId": "export_20240101-170000_abc123",
        "status": "completed",
        "format": "sql",
        "completedTables": 15,
        "totalTables": 15,
        "bytesWritten": 524288000,
        "createdAt": "2024-01-01 17:00:00",
        "completedAt": "2024-01-01 17:02:34",
        "errorMessage": null
    }
}

Job statuses: pendingin_progresscompleted | failed

GET /export/jobs/:jobId/download Download Export File

Returns the completed export file from R2 with appropriate Content-Type and Content-Disposition headers. Returns 400 if the job is not yet completed, 404 if the job doesn't exist.

Webhook Callback

When a callbackUrl is provided, the system POSTs to it on completion or failure:

On completion:

{
    "jobId": "export_20240101-170000_abc123",
    "status": "completed",
    "downloadUrl": "/export/jobs/export_20240101-170000_abc123/download"
}

On failure:

{
    "jobId": "export_20240101-170000_abc123",
    "status": "failed",
    "error_message": "Export job timed out after 10 minutes"
}

Infrastructure Requirements

R2 Bucket

This feature requires a Cloudflare R2 bucket. The binding is already configured in wrangler.toml:

[[r2_buckets]]
binding = "EXPORT_BUCKET"
bucket_name = "starbasedb-exports"

Users must create the bucket before using async exports:

npx wrangler r2 bucket create starbasedb-exports

If the EXPORT_BUCKET binding is not configured, the async export endpoint returns a clear 400 error:

{
    "error": "Async exports require the EXPORT_BUCKET R2 binding to be configured"
}

The synchronous GET export endpoints are completely unaffected by whether R2 is configured or not.

Files Changed

New Files

File Description
src/export/job.ts Export job lifecycle manager createExportJob, processExportChunk, completeExportJob, failExportJob, getExportJob, deliverCallback, generateJobId
src/export/format.ts Chunk formatting helpers formatChunkAsSQL, formatChunkAsJSON, formatChunkAsCSV
src/export/job.test.ts 36 comprehensive tests covering format helpers, job lifecycle, R2 interactions, callbacks, and full end-to-end integration for all 3 formats
src/export/dump.async.test.ts Bug condition exploration tests validates async routes exist and return correct responses
src/export/preservation.test.ts Preservation tests validates synchronous exports remain unchanged

Modified Files

File Changes
wrangler.toml Added [[r2_buckets]] binding for EXPORT_BUCKET
worker-configuration.d.ts Added EXPORT_BUCKET: R2Bucket to Env interface
src/index.ts Added EXPORT_BUCKET?: R2Bucket to Env interface; passes it to DataSource as r2ExportBucket
src/types.ts Added r2ExportBucket?: R2Bucket to DataSource type
src/do.ts Added tmp_export_jobs table creation; extended alarm() handler for export job processing with stuck job detection and retry logic; stored R2 bucket reference; exposed createExportJob and getExportJob via init() RPC
src/export/dump.ts Added asyncDumpDatabaseRoute, getExportJobRoute, downloadExportJobRoute handlers; existing dumpDatabaseRoute unchanged
src/handler.ts Registered POST /export/dump, GET /export/jobs/:jobId, GET /export/jobs/:jobId/download routes with isInternalSource middleware
src/rls/index.test.ts Fixed pre-existing test failures corrected mock policy schemas and action types to match actual RLS implementation behavior
README.md Added async export feature to features list and full documentation with curl examples

Internal Table Schema

CREATE TABLE IF NOT EXISTS tmp_export_jobs (
    id TEXT PRIMARY KEY,
    format TEXT NOT NULL,              -- 'sql' | 'json' | 'csv'
    status TEXT NOT NULL,              -- 'pending' | 'in_progress' | 'completed' | 'failed'
    target_table TEXT,                 -- NULL for full dump, table name for single-table
    r2_key TEXT NOT NULL,              -- R2 object key (e.g., 'exports/dump_20240101-170000.sql')
    r2_upload_id TEXT,                 -- R2 multipart upload ID
    current_table TEXT,                -- Resume cursor: current table being processed
    current_offset INTEGER DEFAULT 0,  -- Resume cursor: row offset within current table
    total_tables INTEGER,              -- Total number of tables to export
    completed_tables INTEGER DEFAULT 0,-- Number of tables fully exported
    bytes_written INTEGER DEFAULT 0,   -- Total bytes uploaded to R2
    parts_uploaded TEXT DEFAULT '[]',   -- JSON array of R2 uploaded part metadata
    callback_url TEXT,                 -- Optional webhook URL
    error_message TEXT,                -- Error details if status is 'failed'
    created_at TEXT DEFAULT (datetime('now')),
    completed_at TEXT
);

Breathing Strategy

Each Durable Object alarm cycle follows this pattern to prevent blocking:

  1. Start a timer
  2. Fetch the next batch of rows (~5,000) from the current table/offset
  3. Format the batch as SQL INSERT statements, JSON array fragment, or CSV lines
  4. Upload the formatted chunk as an R2 multipart part
  5. Update job progress in tmp_export_jobs
  6. Check elapsed time if >4.5 seconds, save state and schedule next alarm in 100ms
  7. If all tables are processed, finalize the multipart upload and mark the job completed

The 100ms gap between alarm cycles allows the Durable Object to process any queued requests (WebSocket messages, RPC calls, other HTTP requests), keeping the system responsive during long exports.

Error Handling

Scenario Behavior
EXPORT_BUCKET not configured Returns 400 with descriptive error message
Alarm cycle fails (exception) Catches error, schedules retry alarm in 60 seconds, preserves progress
Job stuck in_progress >10 minutes Alarm handler marks it failed with timeout message
R2 upload part fails Job marked failed, multipart upload aborted
Callback delivery fails Logged to console, does not affect job status
Job not found GET /export/jobs/:id returns 404
Job not completed GET /export/jobs/:id/download returns 400

Backward Compatibility

This change is fully backward compatible:

  • All existing GET /export/* endpoints work identically
  • The new POST /export/dump route does not conflict with the existing GET /export/dump route (different HTTP methods)
  • The EXPORT_BUCKET R2 binding is optional if not configured, only the async export returns an error; all other functionality is unaffected
  • No changes to the DataSource type break existing code (the new r2ExportBucket field is optional)
  • No changes to the Durable Object alarm handler break existing cron task processing (export jobs are checked first, cron tasks continue to work alongside)

Test Results

Test Files  22 passed (22)
     Tests  199 passed (199)
  Duration  ~8s

Test Coverage Breakdown

Test File Tests What It Covers
job.test.ts 36 Format helpers (SQL/JSON/CSV with edge cases), generateJobId uniqueness, createExportJob (R2 init, job store, missing bucket, target table, callback), processExportChunk (state transitions, R2 upload, progress tracking, all 3 formats), completeExportJob (R2 finalize, empty export handling), failExportJob (status + R2 abort), getExportJob (found/not found), deliverCallback (success/failure/null/network error), full lifecycle integration for SQL/JSON/CSV
dump.async.test.ts 3 Route existence: POST returns 202, GET status returns job data, GET download returns file
preservation.test.ts 5 Sync dump returns valid SQL, JSON export works, CSV export works, 404 for missing tables, 400 for non-internal source

Live Verification

The feature was tested end-to-end on a running wrangler dev instance with a real R2 bucket:

  1. Created test table with data via POST /query
  2. Verified synchronous GET /export/dump still returns full SQL dump ✓
  3. Initiated async export via POST /export/dump with {"async": true, "format": "sql"} received 202 with jobId
  4. Polled GET /export/jobs/:jobId status progressed from pending to completed in ~1 second ✓
  5. Downloaded via GET /export/jobs/:jobId/download received complete SQL dump with CREATE TABLE and INSERT statements ✓
  6. Repeated for JSON format received valid JSON array ✓
  7. Repeated for CSV format received CSV with headers and data rows ✓

Related Issues

Fixes the database export timeout issue for large databases as described in the original bug report.

Here's the video description to add at the end of your PR:

Screenshot

The Screenshot below shows the full test suite running with npx vitest --run all 199 tests across 22 test files pass successfully, including the new async chunked export tests (bug condition exploration, preservation property tests, and 36 comprehensive job lifecycle tests covering SQL, JSON, and CSV format exports through the complete create → process → complete flow).

Screenshot 2026-04-21 014855

/claim #59

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant