Help

How to use Evaluation Cards.

New here? Start with the quickstart, then dive into a guide written for your role. You can replay the intro tour at any time.

Quickstart

Stakeholder-agnostic · ~6 min

The four signals, the five-level hierarchy, and your first five minutes on the site.

Tutorials by stakeholder

Each guide reads the same record through a different lens. Pick the one closest to how you'll use Evaluation Cards.

Documentation

Deeper, more technical references for contributing to and working with the data behind Evaluation Cards.

Suggest missing documentation on our public roadmap and we'll make sure to add it!

How to contribute

Evaluation Cards is a living community artifact: its coverage and usefulness grow as people report and upload results, use the site, and cite it. Here's what helps most, depending on who you are.

Model developers
  • Report your model's results to Every Eval Ever so they show up here in context.
  • Already on Every Eval Ever (EEE)? Cross-post your results to Hugging Face so your scores appear on the model page with a backlink.
  • Document the run-level details that raise your signals — temperature and max tokens, the harness, and (for agentic evaluations) the eval plan and limits.
  • See a wrong or missing number for your model? Flag it in the Space discussions or via each record's correction path.
Evaluation developers
  • Upload your benchmark's results to Every Eval Ever so others can find, run, and reuse them.
  • Fill in your benchmark's metadata — goals, construct, scoring rubric, intended uses, and limitations — to raise its completeness score.
  • Report schema gaps or data issues on the EEE issue tracker.
Researchers
  • Use Evaluation Cards in your model-, evaluation-, or field-level analysis — and cite the paper when you build on it.
  • Report to Every Eval Ever any third-party results from evaluations you've run yourself; independent numbers are first-class here.
  • Flag discrepancies or suggest methodology improvements on the issue tracker or in the discussions.
  • Spread the word — share it with collaborators and on socials.
Policymakers
  • Consult Evaluation Cards as an evidence base — what's documented, who reported it, and how comparable it is.
  • Cite the paper in reports and briefings, and point colleagues to the site.
  • Tell us what evidence you need for decisions — suggest features on the public roadmap or via the feedback form.
  • Spread the word so more of the field reports legibly.

Spotted an error? Flag a wrong or missing number anywhere in the corpus through the feedback form, and include a source. Corrections are versioned, and coverage improves as developers and third parties publish.

Not sure where something fits? The public roadmap, the feedback form, the EEE issue tracker, and the Space discussions are always open.

How to cite

If you find this effort useful, please consider citing our paper and sharing our work on socials.

Reference

Ghosh, A., Reuel, A., Chim, J., Kennedy, W. M., et al. (2026). Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting. arXiv:2606.09809.

BibTeX · Evaluation Cards
@article{ghosh2026evaluationcards,
  title        = {Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting},
  author       = {Ghosh, Avijit and Reuel, Anka and Chim, Jenny and Kennedy, Wm. Matthew and Yadav, Srishti and Mickel, Jennifer and Long, Yanan and Tran, Andrew and Kornilova, Anastassia and Stachura, Damian and Klyman, Kevin and Friedrich, Felix and Sania, Jeba and Lamparth, Max and Batzner, Jan and Mishra, Anoop and Habba, Eliya and Hao, Yixiong and Heath, Nathan and Rismani, Shalaleh and Gohar, Usman and Loehr, Andrea and Manheim, David and Dhar, Ruchira and Nelaturu, Sree Harsha and Sinha, Aarush and Choshen, Leshem and Sharma, Drishti and Khire, Ishan and Saha, Amit and Sahoo, Subramanyam and Hardy, Michael and Riegler, Michael Alexander and Manghnani, Kabir and Lin, Michelle and Jiang, Yanan and Huang, Yilin and Yehudai, Asaf and Ji, Jessica and Hofmann, Aris and Akhtar, Mubashara and Moniz, Nuno and Jernite, Yacine and Biderman, Stella and Talat, Zeerak and Koyejo, Sanmi and Kochenderfer, Mykel and Solaiman, Irene},
  journal      = {arXiv preprint arXiv:2606.09809},
  year         = {2026},
  url          = {https://arxiv.org/abs/2606.09809}
}

Every Eval Ever (EEE) is a sister EvalEval project and one of the data sources that power Evaluation Cards. Please show it some love and cite it too. 💜

BibTeX · Every Eval Ever
@misc{evaleval2026everyevalever,
  title   = {Every Eval Ever: Toward a Common Language for AI Eval Reporting},
  author  = {Jan Batzner and Leshem Choshen and Avijit Ghosh and Sree Harsha Nelaturu and Anastassia Kornilova and Damian Stachura and Yifan Mai and Asaf Yehudai and Anka Reuel and Irene Solaiman and Stella Biderman},
  year    = {2026},
  month   = {February},
  url     = {https://evalevalai.com/infrastructure/2026/02/17/everyevalever-launch/},
  note    = {Blog Post, EvalEval Coalition}
}