OpenAI Evals
OpenAI Evals is an open-source framework for evaluating large language models (LLMs) and systems built on them. It provides a registry of benchmark evals and tooling that lets developers and researchers run, customize, and manage evaluations of model performance and behavior.
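As a quick illustration of how the framework is used: most basic evals in the registry are driven by JSONL datasets in which each line pairs chat-formatted `input` messages with an `ideal` reference answer. The sketch below builds such a file in Python; the file name and sample contents are illustrative, not taken from the repo.

```python
import json

# Minimal sketch of the JSONL sample format consumed by basic evals in the
# registry: each line carries chat-formatted "input" messages plus an
# "ideal" reference answer. File name and contents here are hypothetical.
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "ideal": "Paris",
    },
]

with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

Once a dataset is registered via a YAML entry in the repo's eval registry, an eval is typically run from the command line with `oaieval <model> <eval_name>`, e.g. `oaieval gpt-3.5-turbo test-match`.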
Key Information
- Category: Evaluation and Monitoring
- Source: GitHub
- Tags: Python
- Last updated: January 09, 2026
Links
Canonical source: https://github.com/openai/evals