Gemma-3-1B-GRPO
Gemma 3 (1B) model with GRPO training
sarthak247/gemma-3-1B-GRPO-Adapter • Updated Apr 7
sarthak247/gemma-3-1B-GRPO-float16 • Text Generation • 1.0B • Updated Apr 7 • 18

Qwen2.5-3B-GRPO
Qwen2.5-3B trained with Unsloth on GSM8K for only 250 steps to add reasoning abilities. Both the short run and the smaller model were chosen because of resource constraints.
sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16 • Text Generation • Updated Feb 24 • 6
sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters • Updated Feb 24
sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf • 3B • Updated Feb 24 • 14