Models (100)

Active filters: GRPO

Jarrodbarnes/Cortex-1-mini

Text Generation • Updated Mar 13 • 22 • 2

abdulsamad/SmolGRPO-135M

Text Generation • Updated Apr 6 • 53

tobrun/SmolLM2-135M-GRPO

Text Generation • Updated Mar 15 • 8

stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR

Text Generation • Updated Mar 16 • 9

TharunSivamani/SmolGRPO-135M

Text Generation • Updated Mar 16 • 11

frascuchon/SmolGRPO-135M

Text Generation • Updated Mar 17 • 11

bhaveshgoel07/SmolGRPO-135M

Arushhh/SmolGRPO-135M

Text Generation • Updated Mar 24 • 10

hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora

Updated Mar 24 • 5

ykarout/Phi4-ThinkMode-fp16

Text Generation • Updated Mar 27 • 24

mradermacher/Phi4-ThinkMode-fp16-GGUF

Updated Mar 27 • 12

czuo03/SmolGRPO-135M

Text Generation • Updated Mar 28 • 14

NuclearAi/Nuke_X_Gemma3_1B_Reasoner_Testing

Text Generation • Updated Apr 1 • 13 • 2

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF

Updated Apr 2 • 95 • 1

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF

Updated Apr 2 • 205 • 1

opria123/SmolGRPO-135M

Text Generation • Updated Apr 6 • 13

NuclearAi/Nuke_X_Gemma3_1B_Reasoner_v1.0

Text Generation • Updated Apr 9 • 14 • 1

alonsosilva/SmolGRPO-135M

Text Generation • Updated Apr 8 • 12

VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1

Text Generation • Updated Apr 22 • 12

mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF

Updated Apr 23 • 120

alfredcs/torchrun-gemma-3-12b-grpo-firstaid-merged

Image-Text-to-Text • Updated about 19 hours ago • 33

alfredcs/gemma-3-12b-grpo-firstaid

garethpaul/SmolGRPO-135M

Text Generation • Updated May 8 • 6

Thabet/SmolGRPO-135M-learning

Text Generation • Updated about 1 month ago • 8

jcollado/SmolGRPO-135M

Text Generation • Updated 27 days ago • 39

Brianpuz/SmolGRPO-135M

Text Generation • Updated 22 days ago • 6

yigitkucuk/tint-interact-sft-grpo

Text Generation • Updated 22 days ago • 45

koochikoo25/SmolGRPO-135M

Text Generation • Updated 21 days ago • 20

jackle33/SmolGRPO-135M

Text Generation • Updated 19 days ago • 15

TianheWu/VisualQuality-R1-7B

Reinforcement Learning • Updated 16 days ago • 157 • 1