Hugging Face
Models: 88
Sort: Trending
Active filters: GRPO
t2190/SmolGRPO-135M • Text Generation • Updated Mar 6
t2190/GRPO_1 • Text Generation • Updated Mar 12
kaweizhenpi/SmolGRPO-135M • Text Generation • Updated Mar 7 • 1
Shumatsurontek/SmolGRPO-135M • Text Generation • Updated Mar 9 • 1
alperenyildiz/SmolGRPO-135M • Text Generation • Updated Mar 11 • 2
alperenyildiz/SmolGRPO-135M-Q4_K_M-GGUF • Updated Mar 11 • 4
TheMelonGod/Captain-Eris-BMO_Violent-GRPO-v0.420-exl2 • Text Generation • Updated Mar 13 • 17
alperenyildiz/SmolGRPO_vuln • Text Generation • Updated Mar 11
alperenyildiz/SmolGRPO_vuln-Q4_K_M-GGUF • Updated Mar 12 • 1
alperenyildiz/LLamaGRPO_vuln • Text Generation • Updated Mar 12
alperenyildiz/LLamaGRPO_vuln2 • Text Generation • Updated Mar 14 • 1
abdulsamad/SmolGRPO-135M • Text Generation • Updated 10 days ago • 8
tobrun/SmolLM2-135M-GRPO • Text Generation • Updated Mar 15 • 1
stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR • Text Generation • Updated Mar 16 • 1
TharunSivamani/SmolGRPO-135M • Text Generation • Updated Mar 16 • 2
alperenyildiz/LLamaGRPO_vuln_full_decay • Text Generation • Updated 12 days ago • 4
alperenyildiz/LLamaFinetune • Text Generation • Updated 30 days ago • 84
frascuchon/SmolGRPO-135M • Text Generation • Updated 30 days ago • 1
bhaveshgoel07/SmolGRPO-135M • Updated 29 days ago
saracandu/SmolGRPO-135M • Text Generation • Updated 27 days ago • 1
Arushhh/SmolGRPO-135M • Text Generation • Updated 23 days ago • 38
hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora • Updated 23 days ago • 11
ykarout/Phi4-ThinkMode-fp16 • Text Generation • Updated 20 days ago • 9
mradermacher/Phi4-ThinkMode-fp16-GGUF • Updated 20 days ago • 294
czuo03/SmolGRPO-135M • Text Generation • Updated 19 days ago • 1
opria123/SmolGRPO-135M • Text Generation • Updated 9 days ago • 196
hfhgj/SmolGRPO-135M • Text Generation • Updated 9 days ago • 3
alonsosilva/SmolGRPO-135M • Text Generation • Updated 8 days ago • 6