arxiv:2606.22868

MSU-Bench: Towards Speaker-Centric Understanding in Conversational Multi-Speaker Scenarios

Published on Jun 22

Authors:

Abstract

A diagnostic benchmark for multi-speaker conversational understanding is introduced, covering speaker-centric tasks and evaluating models' capabilities in speaker grounding and dialogue reasoning.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Spoken Language Understanding (SLU) is moving from task-specific pipelines toward large audio language models (LALMs) that generate natural-language responses. However, existing speech benchmarks mainly focus on single-speaker settings or isolated subtasks, leaving speaker-centric understanding in realistic multi-speaker conversations insufficiently evaluated. We introduce MSU-Bench, a diagnostic benchmark for multi-speaker conversational understanding, covering 16 speaker-centric tasks and 2,300 QA instances organized in a two-tier framework that spans speaker grounding to dialogue reasoning. We build a Gemini-assisted annotation and QA generation pipeline with human-in-the-loop verification, achieving high QA validity and strong agreement between human answers and verified labels. We further analyze speaker-referencing schemes and diagnostic error types to reveal bottlenecks in speaker grounding and reasoning. Experiments reveal clear gaps across model families: closed-source systems lead overall, but all models still struggle with complex speaker grounding and multi-speaker reasoning. The benchmark annotations, metadata, and evaluation scripts will be made available at the GitHub repository: https://github.com/ASLP-lab/MSU-Bench.
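To make the two-tier evaluation concrete, here is a minimal sketch of how per-tier and overall accuracy could be aggregated over scored QA instances. It is not the official MSU-Bench evaluation script. The `QAResult` record, its field names, the tier labels `speaker_grounding` and `dialogue_reasoning`, and the task names `task_a`/`task_b` are all hypothetical placeholders chosen for illustration, and exact-match scoring of multiple-choice options is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QAResult:
    # Hypothetical record layout; NOT the official MSU-Bench schema.
    task: str        # placeholder task name (the benchmark has 16 tasks)
    tier: str        # assumed tier label, e.g. "speaker_grounding"
    prediction: str  # option chosen by the model (or by a human annotator)
    answer: str      # verified gold label


def _is_correct(r: QAResult) -> bool:
    # Assumed scoring rule: case-insensitive exact match on the option letter.
    return r.prediction.strip().upper() == r.answer.strip().upper()


def accuracy_by(results, key):
    """Return {group: accuracy}, grouping results by the named attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        group = getattr(r, key)
        total[group] += 1
        correct[group] += int(_is_correct(r))
    return {g: correct[g] / total[g] for g in total}


def overall_accuracy(results):
    """Micro-averaged accuracy over all instances (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(_is_correct(r) for r in results) / len(results)


if __name__ == "__main__":
    demo = [
        QAResult("task_a", "speaker_grounding", "B", "B"),
        QAResult("task_a", "speaker_grounding", "A", "C"),
        QAResult("task_b", "dialogue_reasoning", "D", "D"),
        QAResult("task_b", "dialogue_reasoning", "a", "A"),
    ]
    print(accuracy_by(demo, "tier"))  # {'speaker_grounding': 0.5, 'dialogue_reasoning': 1.0}
    print(overall_accuracy(demo))     # 0.75
```

Under the same assumptions, filling `prediction` with human answers instead of model outputs would turn `overall_accuracy` into a simple percentage-agreement measure between humans and verified labels; the paper's actual agreement metric may differ.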




Datasets citing this paper 1
