/update-llm-benchmark
a4a30af to 6abf2ff
rekhoff left a comment:
The JSON files were updated by the automation, and the related CI jobs are passing.
4a78242 to 5db90ea
…' into bfops/rerun-benchmarks
LLM Benchmark Results (ci-quickfix)
Compared against master branch baseline.
Generated at: 2026-01-23T20:54:53.246Z

Failure Analysis

Benchmark Failure Analysis
Generated from: Summary
SpacetimeDB Benchmark Failures Analysis
This document analyzes test failures in the SpacetimeDB benchmark, organized by language and mode. For each failure, we provide the generated code, the expected code, the error message, and a detailed explanation along with actionable recommendations.

Rust / rustdoc_json Failures (8 total)
Compile/Publish Errors (2 failures)
t_002_scheduled_table & t_017_scheduled_columns
Other Failures (6 failures)
t_003_struct_in_table, t_004_insert, t_007_crud, t_011_helper_function, t_016_sum_type_columns
Additional Observations:
Rust / docs Failures (22 total)
Other Failures (22 failures)
t_000_empty_reducers, t_001_basic_tables, t_002_scheduled_table, t_004_insert
C# / docs Failures (6 total)
Other Failures (6 failures)
t_008_index_lookup, t_013_spacetime_sum_type
Final Thoughts
A review of the generated code and the failure analysis indicates that the most valuable improvements in SpacetimeDB are clarifying visibility, return types, and scheduling syntax, and explaining how nullable types are handled. Documenting common patterns and providing clear guidelines should improve the user experience and reduce test failures.
Description of Changes
Try fixing this again? It seems to pass on PRs if re-run.
API and ABI breaking changes
None.
Expected complexity level and risk
1
Testing