Re-run the LLM benchmarks update by bfops · Pull Request #4110 · clockworklabs/SpacetimeDB

bfops · 2026-01-23T17:21:41Z

Description of Changes

Try fixing this again? It seems to pass on PRs if re-run.

API and ABI breaking changes

None.

Expected complexity level and risk

1

Testing

It passes on this PR now 🤷

bfops · 2026-01-23T17:21:52Z

/update-llm-benchmark

rekhoff

JSON files updated after automation, and the related CI jobs are passing.


clockwork-labs-bot · 2026-01-23T20:55:23Z

LLM Benchmark Results (ci-quickfix)

Language	Mode	Category	Tests Passed	Task Pass %
Rust	rustdoc_json	basics	22/27	74.3%
Rust	rustdoc_json	schema	26/34	75.3% ⬆️ +10.0%
Rust	rustdoc_json	total	48/61	74.8% ⬆️ +4.5%
Rust	docs	basics	5/27	11.1%
Rust	docs	schema	4/30	12.5% ⬇️ -8.0%
Rust	docs	total	9/57	11.7% ⬇️ -3.6%
C#	docs	basics	24/27	91.7% ⬇️ -8.3%
C#	docs	schema	22/34	63.7% ⬇️ -10.0%
C#	docs	total	46/61	78.9% ⬇️ -9.1%

Compared against master branch baseline
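
The Tests Passed and Task Pass % columns do not track each other exactly (22/27 is about 81.5%, while that row reports 74.3%). One possible reading, which is an assumption and not something the bot output confirms, is that Task Pass % averages each task's own pass fraction instead of pooling all tests. A minimal sketch of the two calculations, using hypothetical task data and hypothetical names (`TaskResult`, `tests_passed_pct`, `task_pass_pct`):

```rust
/// One benchmark task: how many of its tests passed, out of how many.
/// (Hypothetical type for illustration; not from the benchmark tool.)
struct TaskResult {
    passed: u32,
    total: u32,
}

/// Pooled ratio: all passed tests over all tests, as a percentage.
fn tests_passed_pct(tasks: &[TaskResult]) -> f64 {
    let passed: u32 = tasks.iter().map(|t| t.passed).sum();
    let total: u32 = tasks.iter().map(|t| t.total).sum();
    100.0 * f64::from(passed) / f64::from(total)
}

/// Per-task average: mean of each task's pass fraction, as a percentage.
fn task_pass_pct(tasks: &[TaskResult]) -> f64 {
    let sum: f64 = tasks
        .iter()
        .map(|t| f64::from(t.passed) / f64::from(t.total))
        .sum();
    100.0 * sum / tasks.len() as f64
}

fn main() {
    // Hypothetical data, not taken from the benchmark run.
    let tasks = [
        TaskResult { passed: 1, total: 1 },
        TaskResult { passed: 1, total: 1 },
        TaskResult { passed: 2, total: 4 },
    ];
    println!("tests passed: {:.1}%", tests_passed_pct(&tasks)); // 66.7%
    println!("task pass:    {:.1}%", task_pass_pct(&tasks)); // 83.3%
}
```

With this data the pooled ratio is 4/6 while the per-task average is (1 + 1 + 0.5)/3, so the two figures can diverge in either direction depending on which tasks carry more tests.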

Generated at: 2026-01-23T20:54:53.246Z

Failure Analysis

Benchmark Failure Analysis

Generated from: /__w/SpacetimeDB/SpacetimeDB/tools/xtask-llm-benchmark/../../docs/llms/docs-benchmark-details.json

Summary

Total failures analyzed: 36

SpacetimeDB Benchmark Failures Analysis

This document analyzes test failures in the SpacetimeDB benchmark organized by language and mode. For each failure, we provide the generated code, the expected code, the error message, and a detailed explanation along with actionable recommendations.

Rust / rustdoc_json Failures (8 total)

Compile/Publish Errors (2 failures)

t_002_scheduled_table & t_017_scheduled_columns

The generated code:

use spacetimedb::{table, reducer, ReducerContext, Table, ScheduleAt};

#[table(name = tick_timer, schedule(reducer = tick, column = scheduled_at))]
pub struct TickTimer {
    #[primary_key]
    #[auto_inc]
    scheduled_id: u64,
    scheduled_at: ScheduleAt,
}

#[reducer(init)]
pub fn init(ctx: &ReducerContext) {
    if ctx.db.tick_timer().count() == 0 {
        ctx.db.tick_timer().insert(TickTimer {
            scheduled_id: 0,
            scheduled_at: ScheduleAt::repeat(std::time::Duration::from_micros(50_000)),
        });
    }
}

#[reducer]
pub fn tick(_ctx: &ReducerContext, _timer: TickTimer) {}

The expected code:

use spacetimedb::{reducer, table, ReducerContext, ScheduleAt, Table};
use std::time::Duration;

#[table(name = tick_timer, scheduled(tick))]
pub struct TickTimer {
    #[primary_key]
    #[auto_inc]
    pub scheduled_id: u64,
    pub scheduled_at: ScheduleAt,
}

#[reducer]
pub fn tick(_ctx: &ReducerContext, _schedule: TickTimer) {}

#[reducer(init)]
pub fn init(ctx: &ReducerContext) {
    let every_50ms: ScheduleAt = Duration::from_millis(50).into();
    ctx.db.tick_timer().insert(TickTimer {
        scheduled_id: 0,
        scheduled_at: every_50ms,
    });
}

The error: publish_error: spacetime publish failed (exit=1)
Explain the difference:
- The generated table attribute #[table(name = tick_timer, schedule(reducer = tick, column = scheduled_at))] is incorrect; the expected form is #[table(name = tick_timer, scheduled(tick))].
- The generated code builds the schedule with ScheduleAt::repeat; the expected code converts a Duration into a ScheduleAt with Duration::from_millis(50).into().
Root cause: The documentation is unclear about the scheduling attribute syntax and about how to construct a ScheduleAt for a repeating interval.
Recommendation: Update the documentation to show scheduled(tick) in the table attribute and a ScheduleAt built from Duration::from_millis.
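
Note that the two snippets ask for the same interval: 50_000 microseconds and 50 milliseconds are equal, so the mismatch lies in how the ScheduleAt is constructed, not in the duration. A minimal standard-library sketch of the Duration-to-schedule conversion follows. `ScheduleAtSketch` is a hypothetical stand-in written for this example; it is not the spacetimedb type and only mirrors the `.into()` call seen in the expected code.

```rust
use std::time::Duration;

/// Hypothetical stand-in for spacetimedb::ScheduleAt, for illustration only.
#[derive(Debug, PartialEq)]
enum ScheduleAtSketch {
    /// Fire repeatedly, once per interval.
    Interval(Duration),
}

/// Lets a plain Duration be converted with `.into()`, as the expected code does.
impl From<Duration> for ScheduleAtSketch {
    fn from(d: Duration) -> Self {
        ScheduleAtSketch::Interval(d)
    }
}

/// Builds the 50 ms repeating schedule the same way the expected code does.
fn every_50ms() -> ScheduleAtSketch {
    Duration::from_millis(50).into()
}

fn main() {
    // Both constructors describe the same length of time.
    assert_eq!(Duration::from_micros(50_000), Duration::from_millis(50));
    println!("{:?}", every_50ms());
}
```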

Other Failures (6 failures)

t_003_struct_in_table, t_004_insert, t_007_crud, t_011_helper_function, t_016_sum_type_columns

The generated code (e.g., for t_003):

use spacetimedb::{ReducerContext, Table, UniqueColumn, SpacetimeType};

#[derive(SpacetimeType, Clone)]
pub struct Position {
    pub x: i32,
    pub y: i32,
}

#[spacetimedb::table(name = entity)]
pub struct Entity {
    #[primary_key]
    pub id: i32,
    pub pos: Position,
}

The expected code:

use spacetimedb::{table, SpacetimeType};

#[derive(SpacetimeType, Clone, Debug)]
pub struct Position {
    pub x: i32,
    pub y: i32,
}

#[table(name = entity)]
pub struct Entity {
    #[primary_key]
    pub id: i32,
    pub pos: Position,
}

The error: schema_parity: reducers differ - expected [], got [...]
Explain the difference: Struct fields that are not declared pub cause access issues.
Root cause: The documentation gives insufficient detail about struct visibility and about reducer and scheduling attributes.
Recommendation: Clarify that structs defining database tables require public fields.
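
The error text for this group says the set of reducers differs, which means a module can fail the check simply by defining a reducer the expected module lacks. The sketch below shows how a comparison of that kind might produce such a message. It is an assumption for illustration, not the benchmark tool's actual code; `check_reducer_parity` and the reducer name `set_position` are hypothetical.

```rust
/// Normalize a list of reducer names: owned, sorted, without duplicates.
fn normalized(names: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    v.sort();
    v.dedup();
    v
}

/// Compare expected and actual reducer names (hypothetical check).
/// Returns an error message shaped like the one in the report on mismatch.
fn check_reducer_parity(expected: &[&str], got: &[&str]) -> Result<(), String> {
    let expected = normalized(expected);
    let got = normalized(got);
    if expected == got {
        Ok(())
    } else {
        Err(format!(
            "schema_parity: reducers differ - expected {:?}, got {:?}",
            expected, got
        ))
    }
}

fn main() {
    // The expected module defines no reducers; the generated one adds one.
    match check_reducer_parity(&[], &["set_position"]) {
        Ok(()) => println!("schemas match"),
        Err(msg) => println!("{msg}"),
    }
}
```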

Additional Observations:

- Both visibility modifiers and correct API signatures for reducers and tables need attention.
- Many generated snippets omit the Result<(), String> return type on reducer functions.

Rust / docs Failures (22 total)

Other Failures (22 failures)

t_000_empty_reducers, t_001_basic_tables, t_002_scheduled_table, t_004_insert

The generated code (for t_000_empty_reducers):

use spacetimedb::ReducerContext;

#[spacetimedb::reducer]
pub fn empty_reducer_no_args(_ctx: &ReducerContext) {}

The expected code:

use spacetimedb::{reducer, ReducerContext};

#[reducer]
pub fn empty_reducer_no_args(ctx: &ReducerContext) -> Result<(), String> {
    Ok(())
}

The error: schema_parity: describe failed: WARNING: This command is UNSTABLE
Explain the difference: Missing return type Result<(), String> for all reducer functions causes the failure.
Root cause: Documentation does not clearly specify that all reducer functions must return a Result.
Recommendation: Update the documentation to explicitly require a Result return type for all reducer functions to avoid compilation errors.

C# / docs Failures (6 total)

Other Failures (6 failures)

t_008_index_lookup, t_013_spacetime_sum_type

The generated code (for t_008_index_lookup):

using SpacetimeDB;

public static partial class Module
{
    [SpacetimeDB.Table(Name = "User")]
    public partial struct User
    {
        [SpacetimeDB.PrimaryKey]
        public int Id;
        public string Name;
        public int Age;
        public bool Active;
    }

    [SpacetimeDB.Reducer]
    public static void LookupUserName(ReducerContext ctx, int id)
    {
        var user = ctx.Db.User.Id.Find(id);
        if (user != null)
        {
            ctx.Db.Result.Insert(new Result
            {
                Id = user.Id,
                Name = user.Name
            });
        }
    }
}

The expected code:

using SpacetimeDB;

public static partial class Module
{
    [Table(Name = "User")]
    public partial struct User
    {
        [PrimaryKey] public int Id;
        public string Name;
        public int Age;
        public bool Active;
    }

    [Reducer]
    public static void LookupUserName(ReducerContext ctx, int id)
    {
        var u = ctx.Db.User.Id.Find(id);
        if (u.HasValue)
        {
            var row = u.Value;
            ctx.Db.Result.Insert(new Result { Id = row.Id, Name = row.Name });
        }
    }
}

The error: publish_error: spacetime build (csharp) failed (exit=1)
Explain the difference: The generated code tests user != null and then reads user.Id and user.Name directly; the expected code checks u.HasValue and reads the fields from u.Value, which is how a nullable struct must be unwrapped.
Root cause: The documentation lacks examples of handling nullable (option) types in this context.
Recommendation: Cover nullable types in the documentation, showing how to check for a value with HasValue and read it through Value.
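
For comparison with the Rust examples elsewhere in this analysis, the same check-then-unwrap pattern can be sketched in Rust with Option. This is an illustrative in-memory analogue only: `find_user`, `lookup_user_name`, and `ResultRow` are hypothetical names, and a plain slice stands in for the table.

```rust
/// Hypothetical row types standing in for the User and Result tables.
#[derive(Debug, PartialEq)]
struct User {
    id: i32,
    name: String,
}

#[derive(Debug, PartialEq)]
struct ResultRow {
    id: i32,
    name: String,
}

/// Lookup that may find nothing, so it returns an Option.
fn find_user(users: &[User], id: i32) -> Option<&User> {
    users.iter().find(|u| u.id == id)
}

/// Insert a result row only when the lookup produced a value.
fn lookup_user_name(users: &[User], results: &mut Vec<ResultRow>, id: i32) {
    // `if let Some(row)` plays the role of checking HasValue and reading Value.
    if let Some(row) = find_user(users, id) {
        results.push(ResultRow {
            id: row.id,
            name: row.name.clone(),
        });
    }
}

fn main() {
    let users = vec![User { id: 1, name: "Alice".to_string() }];
    let mut results = Vec::new();
    lookup_user_name(&users, &mut results, 1); // found: one row inserted
    lookup_user_name(&users, &mut results, 2); // missing: nothing inserted
    println!("{results:?}");
}
```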

Final Thoughts

A review of the generated code and the failures above points to four documentation gaps: field visibility, reducer return types, scheduling syntax, and nullable-type handling. Documenting these common patterns with clear guidelines should reduce test failures.

bfops requested review from cloutiertyler and jdetter as code owners January 23, 2026 17:24

bfops mentioned this pull request Jan 23, 2026

Removing 1.5.0 DLLs #4100

Merged

2 tasks

bfops force-pushed the bfops/rerun-benchmarks branch from a4a30af to 6abf2ff Compare January 23, 2026 17:37

bfops changed the base branch from master to bfops/hide-llm-files January 23, 2026 17:45

bfops removed request for cloutiertyler and jdetter January 23, 2026 17:53

rekhoff approved these changes Jan 23, 2026

View reviewed changes

[bfops/rerun-benchmarks]: rebase

5db90ea

bfops force-pushed the bfops/rerun-benchmarks branch from 4a78242 to 5db90ea Compare January 23, 2026 18:05

clockworklabs deleted a comment from clockwork-labs-bot Jan 23, 2026

bfops changed the base branch from bfops/hide-llm-files to master January 23, 2026 20:09

bfops added this pull request to the merge queue Jan 23, 2026

bfops removed this pull request from the merge queue due to a manual request Jan 23, 2026

[bfops/rerun-benchmarks]: Merge remote-tracking branch 'origin/master…

4cc921d

…' into bfops/rerun-benchmarks

Update LLM benchmark results

5a8dd2d

bfops added this pull request to the merge queue Jan 23, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 23, 2026

bfops added this pull request to the merge queue Jan 23, 2026

Merged via the queue into master with commit 7f6fd18 Jan 23, 2026
44 of 45 checks passed

bfops deleted the bfops/rerun-benchmarks branch February 6, 2026 03:23

bfops merged 3 commits into master from bfops/rerun-benchmarks
