DCAgent2/eval-allenai_SERA-8B_DCAgent2_swebench-verified-random-100-folders Viewer • Updated 11 days ago • 4.92k • 34 • 1
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Paper • 2606.09707 • Published 23 days ago • 8
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards Paper • 2605.31584 • Published May 29 • 43
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published May 19 • 85
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published May 21 • 171
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models Paper • 2603.16859 • Published Mar 17 • 248