Sarthak Thakur

sarthak247

AI & ML interests

None yet

sarthak247's collections (2)

Gemma-3-1B-GRPO

Gemma 3 (1B) model with GRPO training

sarthak247/gemma-3-1B-GRPO-Adapter

Updated Apr 7
sarthak247/gemma-3-1B-GRPO-float16

Text Generation • 1.0B • Updated Apr 7

Qwen2.5-3B-GRPO

Qwen2.5-3B trained with Unsloth on GSM8K for only 250 steps to add reasoning abilities. Both the short training run and the choice of a smaller model were due to resource constraints.

sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16

Text Generation • Updated Feb 24
sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters

Updated Feb 24
sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf

3B • Updated Feb 24 • 4
