@hannesrudolph hannesrudolph commented Dec 5, 2025

Summary

This PR updates the evals /runs/new page with several UI improvements and a new multi-model launch feature.

Changes

UI Cleanup

  • Removed "Use Multiple Native Tool Calls" checkbox
  • Removed "Reasoning Effort" dropdown
  • Reorganized sliders into compact rows:
    • 3-column row for Concurrency, Timeout, and Iterations
    • 2-column row for Command Timeout and Shell Integration Timeout

Multi-Model Launch Feature

  • Added + button next to model selectors (for Roo Code Cloud/OpenRouter) and config selectors (for Import mode)
  • Clicking + adds another model/config selector row
  • Each non-last row has a - button to remove that selection
  • When multiple models/configs are selected, pressing "Launch" creates identical test runs for each with a 1-minute delay between runs
  • After all runs are launched, the user is automatically redirected to the main evals dashboard (/)
  • Toast notifications show progress ("Launching X runs (every 20 seconds)..." and "Run X/Y launched" for each)
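The launch flow described above (one run per selection, a pause between runs, a progress notification after each) can be sketched roughly as follows. This is a minimal illustration and not the code from new-run.tsx: `launchSequentially`, `LaunchOptions`, and the injected `launch`, `sleep`, and `onProgress` callbacks are hypothetical names, and the delay is left as a parameter rather than hard-coded.

```typescript
// Hypothetical sketch: every name below is an assumption, not the PR's actual code.
type LaunchFn<T> = (selection: T) => Promise<void>;

interface LaunchOptions {
  delayMs: number; // pause between consecutive launches
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
  onProgress?: (launched: number, total: number) => void; // e.g. show a toast
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Launches one run per selection, strictly in order. The delay is applied
// between runs only, so there is no wait before the first run or after the last.
// An error thrown by `launch` propagates and stops the remaining launches.
async function launchSequentially<T>(
  selections: readonly T[],
  launch: LaunchFn<T>,
  options: LaunchOptions,
): Promise<number> {
  const sleep = options.sleep ?? defaultSleep;
  let launched = 0;
  for (const selection of selections) {
    if (launched > 0) {
      await sleep(options.delayMs);
    }
    await launch(selection);
    launched++;
    options.onProgress?.(launched, selections.length);
  }
  return launched;
}
```

In a component, `onProgress` would be the natural place to show the "Run X/Y launched" toast, and the redirect to / would follow once the returned promise resolves.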

Testing

  • Type checking passes
  • ESLint passes

Important

Adds multi-model launch feature and UI improvements to evals /runs/new page in new-run.tsx.

  • Multi-Model Launch:
    • Added + and - buttons for model/config selectors in NewRun to allow multiple selections.
    • Launches identical test runs for each model/config with a 1-minute delay between runs.
    • Redirects to main evals dashboard after all runs are launched.
  • UI Improvements:
    • Removed "Use Multiple Native Tool Calls" checkbox and "Reasoning Effort" dropdown.
    • Reorganized sliders into compact rows: 3-column for Concurrency, Timeout, Iterations; 2-column for Command Timeout, Shell Integration Timeout.
  • Misc:
    • Added ModelSelection and ConfigSelection types for handling multiple selections.
    • Updated onSubmit to handle multiple model/config launches with delay and error handling.
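As a rough illustration of how the selector rows might be modeled, the sketch below keeps the rows in an immutable array. The summary names the `ModelSelection` and `ConfigSelection` types but does not show their fields, so the fields here (`id`, `model`, `configName`) and the helpers `addRow`, `removeRow`, and `showsRemoveButton` are assumptions made for the example.

```typescript
// Hypothetical shapes: the PR names these types, but their real fields are not shown here.
interface ModelSelection {
  id: string; // assumed stable key for the row
  model: string; // assumed model identifier
}

interface ConfigSelection {
  id: string; // assumed stable key for the row
  configName: string; // assumed name of the imported config
}

type SelectionRow = ModelSelection | ConfigSelection;

// Appends a new row, as the + button would.
function addRow<T extends SelectionRow>(rows: readonly T[], blank: T): T[] {
  return [...rows, blank];
}

// Removes the row with the given id, as the - button would.
// The final remaining row is never removed, so at least one selector stays visible.
function removeRow<T extends SelectionRow>(rows: readonly T[], id: string): T[] {
  if (rows.length <= 1) {
    return [...rows];
  }
  return rows.filter((row) => row.id !== id);
}

// Per the description, every row except the last one shows a - button.
function showsRemoveButton(rows: readonly SelectionRow[], index: number): boolean {
  return index < rows.length - 1;
}
```

Returning new arrays instead of mutating fits React state updates, where a changed reference is what triggers a re-render.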

This description was created by Ellipsis for cd6cbd1.

- Remove 'Use Multiple Native Tool Calls' checkbox
- Remove 'Reasoning Effort' dropdown
- Reorganize sliders into compact rows (3-column for concurrency/timeout/iterations, 2-column for terminal timeouts)
- Add multi-model selection with + button to add more models/configs
- Launch identical test runs for each model with 1-minute delays
- Navigate back to main evals UI after launching
@dosubot (bot) added the size:XL, Enhancement, and UI/UX labels Dec 5, 2025
@hannesrudolph added the Issue/PR - Triage label Dec 5, 2025
@RooCodeInc deleted a comment from roomote (bot) Dec 5, 2025
@dosubot (bot) added the lgtm label Dec 5, 2025
@mrubens merged commit c10d1d9 into main Dec 5, 2025
12 of 13 checks passed
@mrubens deleted the feat/evals-ui-multi-model-launch branch December 5, 2025 01:48
@github-project-automation (bot) moved this from Triage to Done in Roo Code Roadmap Dec 5, 2025
@github-project-automation (bot) moved this from New to Done in Roo Code Roadmap Dec 5, 2025
Labels

  • Enhancement: New feature or request
  • Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
  • lgtm: This PR has been approved by a maintainer
  • size:XL: This PR changes 500-999 lines, ignoring generated files.
  • UI/UX: UI/UX related or focused