Sarthak Thakur

sarthak247

AI & ML interests

None yet

Organizations

  • Hugging Face for Computer Vision
  • Hugging Face Discord Community

sarthak247's collections (2)

Gemma-3-1B-GRPO
Gemma 3 (1B) model with GRPO training
  • sarthak247/gemma-3-1B-GRPO-Adapter

    Updated Apr 7
  • sarthak247/gemma-3-1B-GRPO-float16

    Text Generation • 1.0B • Updated Apr 7 • 18
Qwen2.5-3B-GRPO
Trained with Unsloth on GSM8K for only 250 steps (due to resource constraints) to add reasoning abilities to Qwen2.5-3B (a smaller model, chosen for the same reason)
  • sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16

    Text Generation • Updated Feb 24 • 6
  • sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters

    Updated Feb 24
  • sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf

    3B • Updated Feb 24 • 14