Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents Paper • 2605.07630 • Published May 8 • 1
Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters Paper • 2605.11960 • Published May 12 • 1
ChartArena: Benchmarking Chart Parsing across Languages, Scenarios, and Formats Paper • 2606.01348 • Published 26 days ago • 2
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe Paper • 2605.03677 • Published May 5 • 27