Pull requests: openai/evals
Pull requests list
fix: replace 11 bare except clauses with except Exception
#1626
opened Feb 25, 2026 by
haosenwang1018
Add finance-agent routing eval dataset and builder guidance
#1625
opened Feb 24, 2026 by
maxpetrusenko
Add reasoning consistency eval under constrained intermediate steps
#1615
opened Feb 5, 2026 by
getappai
Refactor JSONL file loading logic in data.py
#1612
opened Feb 3, 2026 by
Pritiks23
13 tasks done
Add tnengoy_citations.dev.v0 (model-graded factuality eval)
#1603
opened Oct 12, 2025 by
TheodorNEngoy
Fix AttributeError: Update OpenAI error imports (Closes #1564)
#1577
opened Jan 27, 2025 by
SaiKrishna-KK
6 of 13 tasks
Fix TypeError in add_token_usage_to_result when non-integer usage data is present
#1574
opened Jan 4, 2025 by
masihmoloodian
Add support for new models (gpt-4o, o1-preview and o1-mini)
#1558
opened Sep 15, 2024 by
sakher
Bugfixing completion stats break with new reasoning tokens release
#1555
opened Sep 13, 2024 by
lucapericlp
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
#1551
opened Aug 25, 2024 by
RobinWitch
13 tasks done
Fix the is_chat_model function to work with gpt-4o
#1550
opened Aug 22, 2024 by
LoryPack
3 tasks done
Added Icelandic QA evaluation data from news texts
#1548
opened Aug 20, 2024 by
thorunna
12 of 13 tasks
Added Icelandic QA evaluation data from Wikipedia
#1547
opened Aug 20, 2024 by
thorunna
12 of 13 tasks