reproducing DeepSeek R1 Zero with Qwen2.5-0.5B on two 4090 GPUs
rasdani
Recent Activity
upvoted a paper about 5 hours ago: PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
liked a dataset about 5 hours ago: reyavir/PromptEvals
updated a dataset about 16 hours ago: rasdani/swe-fixer-debug-DeepSeek-R1-verified
Collections 1
Papers 1
Models 22
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-8192k • 15
rasdani/Qwen2.5-0.5B-simpleRL-Zoo • Text Generation • 7
rasdani/smolR1-Qwen2.5-0.5B • Text Generation • 25
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-no-KL
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-3072k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-4096k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2560k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-2048k
rasdani/Qwen2.5-0.5B-simpleRL-Zoo-first-try • 1
rasdani/Qwen-1.5B-Distill-GRPO • Text Generation • 4
Datasets 94
rasdani/swe-fixer-debug-DeepSeek-R1-verified • 30 • 132
rasdani/swe-fixer-debug-Qwen3-235B-A22B-verified • 10 • 2
rasdani/swe-fixer-70k • 69.8k • 101
rasdani/swe-fixer-debug-DeepSeek-R1 • 8 • 100
rasdani/swe-fixer-debug-pi-format • 100 • 252
rasdani/reasoning-gym-dataset-debug-pi-format • 20k • 46
rasdani/reasoning-gym-dataset-debug • 20k • 34
rasdani/simplerl_qwen_level1to4 • 8.14k • 68
rasdani/smol-RLVR-IFeval • 15k • 51
rasdani/countdown • 329k • 18