Pull requests: openai/evals

Pull requests list

README: fix Evals starter guide link
#1623 opened Feb 19, 2026 by dcol91863
Add Logic Stress Stress-test Suite (v2, v3)
#1622 opened Feb 16, 2026 by 14H034160212
fix: correct typos in evals
#1621 opened Feb 7, 2026 by thecaptain789
Feat/update readme
#1619 opened Feb 5, 2026 by treepo1
13 tasks
Improving CI
#1617 opened Feb 5, 2026 by fsdavi
13 tasks
Pritiks23 patch 1
#1613 opened Feb 3, 2026 by Pritiks23
13 tasks done
Refactor JSONL file loading logic in data.py
#1612 opened Feb 3, 2026 by Pritiks23
13 tasks done
Add powershell-encoding-basics eval
#1611 opened Jan 28, 2026 by TheodorNEngoy
Update to python 3.12
#1607 opened Dec 21, 2025 by omonimus1
Update custom-eval.md
#1598 opened Aug 19, 2025 by rajeshkp
13 tasks
Fix AttributeError: Update OpenAI error imports (Closes #1564)
#1577 opened Jan 27, 2025 by SaiKrishna-KK
6 of 13 tasks
Update completion-fn-protocol.md
#1575 opened Jan 18, 2025 by NinoRisteski
13 tasks
Ice linguistic benchmark
#1561 opened Oct 1, 2024 by bjarkiarmanns
1 task
anthropic_solver.py
#1554 opened Sep 4, 2024 by iHuydang
13 tasks done
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
#1551 opened Aug 25, 2024 by RobinWitch
13 tasks done
Fix the is_chat_model function to work with gpt-4o
#1550 opened Aug 22, 2024 by LoryPack
3 tasks done
Added Icelandic QA evaluation data from news texts
#1548 opened Aug 20, 2024 by thorunna
12 of 13 tasks
Added Icelandic QA evaluation data from Wikipedia
#1547 opened Aug 20, 2024 by thorunna
12 of 13 tasks