SWE-bench Verified Mini (MariusHobbhahn) — software_engineering · Evaluation Cards