AI & ML interests
LLMs
Organizations
None yet
models 739
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-30step
Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-15step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-150step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-135step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-120step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-105step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-90step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-75step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-60step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-45step
Updated
datasets 20
Updated • 212
Updated • 103
ZHLiu627/warm_start_sft_v2
Preview
• Updated • 2
ZHLiu627/sciworld_dataset
Preview
• Updated • 2
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1
Viewer
• Updated • 29.3k • 7
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1_v1
Viewer
• Updated • 29.3k • 4 • 1
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1
Viewer
• Updated • 29.3k • 24
ZHLiu627/updated-code-qwen7-edufiltered
Viewer
• Updated • 43k • 2
ZHLiu627/updated-code-qwen7-edu
Viewer
• Updated • 75.6k • 14
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2filtered
Viewer
• Updated • 28.9k • 13