Runpeng Dai

Leo-Dai

AI & ML interests

None yet

Recent Activity

authored a paper about 8 hours ago

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

upvoted a paper 1 day ago

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

authored a paper 24 days ago

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

Organizations

datasets 5

Leo-Dai/APO_AIME24

Viewer • Updated Mar 2 • 30 • 9

Leo-Dai/APO_AIME25

Viewer • Updated Mar 2 • 30 • 12

Leo-Dai/APO_AMC23

Viewer • Updated Mar 2 • 40 • 13

Leo-Dai/APO_combine

Viewer • Updated May 30, 2025 • 1.92k • 22

Leo-Dai/dapo-math-17k_dedup

Viewer • Updated May 29, 2025 • 17.9k • 41

Collections 2

Offline Actor-Critic Reinforcement Learning Scales to Large Models

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

kiendt/llama3-8b-math

aaditya/Llama3-OpenBioLLM-8B

ruslanmv/Medical-Llama3-8B

Papers 11

models 17

Leo-Dai/PPO_BL_250_critic

Leo-Dai/PPO_BL_200_critic

Leo-Dai/PPO_BL_300_actor

Leo-Dai/PPO_BL_250_actor

Leo-Dai/PPO_BL_300_critic

Leo-Dai/GRPO_BL_40

Leo-Dai/GRPO_BL_30

Leo-Dai/GRPO_BL_20

Leo-Dai/GRPO_BL_400

Leo-Dai/GRPO_BL_10
