Jin Zhu

mamba413

https://mamba413.github.io/

Mamba413

AI & ML interests

reinforcement learning

Recent Activity

liked a model 9 days ago

mistralai/Mistral-7B-v0.1

updated a dataset 25 days ago

mamba413/GenerateText_Qwen2.5-1.5B-Instruct_GRPO_HH_Seed1

published a dataset 29 days ago

mamba413/GenerateText_Qwen2.5-1.5B-Instruct_GRPO_HH_Seed1

Organizations

None yet

Models 10

mamba413/Qwen2.5-1.5B-PPO-DR-HH-Seed1

2B • Updated Mar 21 • 9

mamba413/Qwen2.5-1.5B-PPO-BENCH-HH-Seed1

2B • Updated Mar 21 • 9

mamba413/Qwen2.5-1.5B-Instruct-Reward-BENCH-HH-Seed1

2B • Updated Mar 21 • 7

mamba413/Qwen2.5-1.5B-Instruct-Reward-BENCH-HH-Seed0

mamba413/Qwen2.5-1.5B-Instruct-Reward-DR-HH-Seed0

mamba413/Qwen2-0.5B-Reward-DR-HH-Seed0

Text Classification • 0.5B • Updated Mar 19 • 9

mamba413/Qwen2.5-1.5B-Reward-DR-IMDB-Seed0

mamba413/Qwen2.5-1.5B-Reward-DR-SIMU-Seed0

mamba413/Qwen2-0.5B-Reward-DR-SIMU-Seed0

Text Classification • 0.5B • Updated Mar 16 • 8

mamba413/Qwen2-0.5B-Reward-DR-SIMU

Text Classification • 0.5B • Updated Mar 15 • 10

Datasets 8

mamba413/GenerateText_Qwen2.5-1.5B-Instruct_GRPO_HH_Seed1

Viewer • Updated 25 days ago • 7.06k • 117

mamba413/GenerateText_HH_Seed1

Viewer • Updated Mar 25 • 11.8k • 26

mamba413/GenerateText_HH_Seed1_new

Viewer • Updated Mar 24 • 640 • 26

mamba413/RewardModel-BENCH-HH-Seed1

Viewer • Updated Mar 23 • 64 • 26

mamba413/RewardModel-DR-HH-Seed1

Viewer • Updated Mar 23 • 64 • 27

mamba413/train_data_imdb_simu_valid

Viewer • Updated Mar 16 • 48.1k • 5

mamba413/train_data_imdb_simu

Viewer • Updated Mar 15 • 48.1k • 21

mamba413/train_data_imdb

Viewer • Updated Mar 3 • 2 • 7