LLM Benchmarking #3486
Merged
76 commits
7110164 init files (bradleyshep)
ef61f28 Update llm-benchmark-details.json (bradleyshep)
a200254 llm benchmarks (moved from private) (bradleyshep)
58a5a08 remove dotenvy (bradleyshep)
af45a20 ignore registry (bradleyshep)
961dd1c summary updates; command (bradleyshep)
8e6624f Merge branch 'LLM-benchmarks' into bradley/llm-benchmark (bradleyshep)
b59650f develop updates (bradleyshep)
38596ba DEVELOP + registry ignored (bradleyshep)
7d69779 change generated registry to use relative paths + include in git (bradleyshep)
3aa051b attempt fix to pass (bradleyshep)
e443251 DEVELOP updates; clippy fixes? (bradleyshep)
8161e45 clippy fixes (bradleyshep)
26de9c4 Update ci.yml (bradleyshep)
79d4abe Potential fix for code scanning alert no. 106: Workflow does not cont… (bradleyshep)
0f606b0 Merge branch 'master' into bradley/llm-benchmark (bradleyshep)
edeefb1 bump to 1.6, fixes (bradleyshep)
3a0c2de partial category scores (bradleyshep)
4466fa0 Update DEVELOP.md (bradleyshep)
eb46333 Remove diff; add ci-quickfix (bradleyshep)
1534e75 Merge branch 'master' into bradley/llm-benchmark (bradleyshep)
fd1933e Merge branch 'master' into bradley/llm-benchmark (bradleyshep)
bc5e5bc Fixes whitespace (cloutiertyler)
7d811b1 Merges in master (cloutiertyler)
28592d9 Switched to use clap for arg parsing (cloutiertyler)
538f77f Refactored to use SpacetimeDBGuard (cloutiertyler)
798c134 Removed unused import (cloutiertyler)
9c2736a Merge branch 'master' into bradley/llm-benchmark (cloutiertyler)
a38d746 Moved the llm benchmark into tools instead of crates (cloutiertyler)
f49495b prelim llm benchmark update workflow (cloutiertyler)
b395378 Updated the llm benchmark workflow (cloutiertyler)
b9153a9 Added OpenAI API key (cloutiertyler)
8db7782 Update to I can run it from this branch (cloutiertyler)
8493a52 Potential fix for code scanning alert no. 129: Untrusted Checkout TOCTOU (cloutiertyler)
15ee6c3 run the workflow plz (cloutiertyler)
2b6804d updated CI name (cloutiertyler)
d4618a1 fixed skip check (cloutiertyler)
68a1d7d Manually putting in the PR number (cloutiertyler)
50e54e5 Hopefully? (cloutiertyler)
e33da58 removed push (cloutiertyler)
c47d6c1 Fix thing (cloutiertyler)
0bf66e2 Fix (cloutiertyler)
03f398e Fix thing (cloutiertyler)
280ac58 Fix thing (cloutiertyler)
0ba377d Fix thing (cloutiertyler)
82ea3a9 Install spacetime (cloutiertyler)
6cb3c62 Install spacetime (cloutiertyler)
71b7ca4 Added important comments (cloutiertyler)
133851e Refactor llm-benchmark to pass host_url through the app (cloutiertyler)
cbe5ee0 Update LLM benchmark results
271ea9e Cargo fmt and ci fix (cloutiertyler)
d421301 Update LLM benchmark results
6f64817 Fixed version (cloutiertyler)
34b0d43 Update LLM benchmark results
9dffd30 Add PR comment with benchmark results table (cloutiertyler)
a79f785 Restructure LLM benchmark result files for deterministic output and c… (cloutiertyler)
fa0b0d7 Cargo fmt (cloutiertyler)
819e8ca Cargo clippy (cloutiertyler)
b60c28a Try to fix failure (cloutiertyler)
8d916b8 Made long running jobs dependent on short running basic checks (cloutiertyler)
25e79c5 Forgot to save file (cloutiertyler)
861c804 Consolidated internal tests into the CI workflow (cloutiertyler)
a0817d5 slight name change (cloutiertyler)
0d13679 Small fix. Lints now needed to run c sharp test suite (cloutiertyler)
bab6938 Fix C# benchmark SIGSEGV crashes in CI (cloutiertyler)
e278696 cargo fmt (cloutiertyler)
1095c95 Try to fix C# problems (cloutiertyler)
320c46c Add MSBuild env vars to fix "Pipe is broken" errors in CI (cloutiertyler)
916fafb Removed errant file (cloutiertyler)
1faf580 Hopefully fix thing (cloutiertyler)
f570c3b Added nix flake check (cloutiertyler)
b36fc11 Update LLM benchmark results
04eb91a Fixed workflow (cloutiertyler)
c1eb855 Removed nix flake check, see #3955 (jdetter)
cb83f9f Update workflow to checkout from master branch (cloutiertyler)
7c4c5df Merge branch 'master' into bradley/llm-benchmark (cloutiertyler)
Why were these dependencies added?