Gemma 3 (1B) model with GRPO training
Sarthak Thakur
sarthak247
Recent Activity
liked a model 1 day ago: HiDream-ai/HiDream-I1-Dev
upvoted a paper 1 day ago: OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
upvoted a paper 1 day ago: Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Models (7)

sarthak247/gemma-3-1B-GRPO-float16 • Text Generation • Updated • 1

sarthak247/gemma-3-1B-GRPO-Adapter • Updated

sarthak247/Wan2.1-T2V-1.3B-nf4 • Text-to-Video • Updated • 69 • 3

sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf • Updated • 50

sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters • Updated

sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16 • Text Generation • Updated • 6

sarthak247/codellama-7b-humaneval-java-fim • Updated • 1