Datasets
29
Active filters:
official
Benchmark datasets
Live leaderboards rank Hub models on evals like SWE-bench, AIME 2026 and HLE.
openai/gsm8k • Benchmark • Updated Mar 23 • 17.6k • 966k • 1.35k
Idavidrein/gpqa • Benchmark • Updated Mar 5 • 1.25k • 140k • 448
allenai/olmOCR-bench • Benchmark • Updated Feb 19 • 5.3k • 218
SWE-bench/SWE-bench_Verified • Benchmark • Updated Feb 27 • 500 • 74.7k • 93
harborframework/terminal-bench-2.0 • Benchmark • Updated Apr 24 • 13k • 38
llamaindex/ParseBench • Benchmark • Updated Apr 19 • 169k • 50.3k • 88
ScaleAI/SWE-bench_Pro • Benchmark • Updated Feb 23 • 731 • 54.4k • 119
mercor/apex-agents • Benchmark • Updated Mar 3 • 480 • 25.8k • 127
TIGER-Lab/MMLU-Pro • Benchmark • Updated 29 days ago • 12.1k • 161k • 478
hf-audio/open-asr-leaderboard • Benchmark • Updated 4 days ago • 99k • 30k • 38
MathArena/aime_2026 • Benchmark • Updated 15 days ago • 30 • 15k • 43
claw-eval/Claw-Eval • Benchmark • Updated 22 days ago • 4.79k • 27
cais/hle • Benchmark • Updated Jan 20 • 2.5k • 41.3k • 812
likaixin/ScreenSpot-Pro • Benchmark • Updated Mar 18 • 10.3k • 65
nvidia/compute-eval • Benchmark • Updated Apr 27 • 2.46k • 5.96k • 25
FutureMa/EvasionBench • Benchmark • Updated Feb 19 • 16.7k • 675 • 110
mteb/BRIGHT • Benchmark • Updated Apr 2 • 1.35M • 4.43k • 3
Delores-Lin/MDPBench • Benchmark • Updated Apr 26 • 8.73k • 20
mteb/arguana • Benchmark • Updated Apr 17 • 11.5k • 20.9k • 5
MMMU/MMMU_Pro • Benchmark • Updated about 18 hours ago • 5.19k • 41k • 58
LEXam-Benchmark/LEXam • Benchmark • Updated 9 days ago • 7.54k • 1.52k • 42
mercor/ACE • Benchmark • Updated Apr 13 • 592 • 226 • 5
mercor/APEX-v1-extended • Benchmark • Updated Apr 22 • 100 • 2.59k • 16
VLABench/vlabench_primitive_ft_lerobot_video • Benchmark • Updated Apr 23 • 575k • 4.85k • 1
tiiuae/PBench • Benchmark • Updated 19 days ago • 6.34k • 2.1k • 15
MathArena/hmmt_feb_2026 • Benchmark • Updated 15 days ago • 33 • 3.89k • 4
collinear-ai/yc-bench • Benchmark • Updated Mar 23 • 149 • 18
internlm/WildClawBench • Benchmark • Updated 16 days ago • 9.88k • 59
MME-Benchmarks/Video-MME-v2 • Benchmark • Updated 8 days ago • 3.2k • 5.23k • 41