LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published 15 days ago • 38
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning Paper • 2605.20342 • Published 21 days ago • 34
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning Paper • 2605.22012 • Published 19 days ago • 46
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos Paper • 2605.18984 • Published 22 days ago • 22
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling Paper • 2605.13062 • Published 27 days ago • 33
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors Paper • 2605.10434 • Published 29 days ago • 29
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization Paper • 2605.10780 • Published 28 days ago • 33
Self-Adversarial One Step Generation via Condition Shifting Paper • 2604.12322 • Published Apr 14 • 13
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published Apr 6 • 203
MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation Paper • 2603.21937 • Published Mar 23 • 7
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents Paper • 2603.18429 • Published Mar 19 • 26
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs Paper • 2603.19217 • Published Mar 19 • 29
Imagination Helps Visual Reasoning, But Not Yet in Latent Space Paper • 2602.22766 • Published Feb 26 • 44
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published Feb 9 • 29
DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents Paper • 2602.07035 • Published Feb 3 • 31