Active filters: trl
mradermacher/Qwen2.5-7B-Instruct-ultrafeedback-iterdpo-iter1-GGUF • 8B • Updated • 1
mradermacher/Qwen2.5-7B-Instruct-ultrafeedback-nspin-iter1-GGUF • 8B • Updated • 1
openfree/Gemma-3-R1984-1B-0613 • 1.0B • Updated • 2
lewtun/dummy-trl-model • Reinforcement Learning • Updated • 24 • 1
ybelkada/gpt-neo-125m-detox • Reinforcement Learning • Updated • 874
ybelkada/gpt-neo-125m-detoxified-long-context • Reinforcement Learning • Updated • 29
dshin/flan-t5-ppo • Reinforcement Learning • Updated • 36
SummerSigh/T5-Base-Rule-Of-Thumb-RM • Reinforcement Learning • Updated • 14
dshin/flan-t5-ppo-testing • Reinforcement Learning • Updated • 33 • 1
SummerSigh/T5-Base-EvilPrompterRM • Reinforcement Learning • 0.2B • Updated • 18
dshin/flan-t5-ppo-testing-violation • Reinforcement Learning • Updated • 14
dshin/flan-t5-ppo-user-b • Reinforcement Learning • Updated • 69
dshin/flan-t5-ppo-user-h-use-violation • Reinforcement Learning • Updated • 14
dshin/flan-t5-ppo-user-f-use-violation • Reinforcement Learning • Updated • 81
dshin/flan-t5-ppo-user-e-use-violation • Reinforcement Learning • Updated • 16
dshin/flan-t5-ppo-user-a-use-violation • Reinforcement Learning • Updated • 38
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 33
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 29
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 16
dshin/flan-t5-ppo-user-a-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 18
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-0 • Reinforcement Learning • Updated • 17
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 13
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-0-use-violation • Reinforcement Learning • Updated • 47
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-1 • Reinforcement Learning • Updated • 15
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-1 • Reinforcement Learning • Updated • 75
dshin/flan-t5-ppo-user-a-batch-size-8-epoch-1 • Reinforcement Learning • Updated • 15
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-1-use-violation • Reinforcement Learning • Updated • 15
dshin/flan-t5-ppo-user-f-batch-size-8-epoch-1 • Reinforcement Learning • Updated • 18
dshin/flan-t5-ppo-user-h-batch-size-8-epoch-1-use-violation • Reinforcement Learning • Updated • 15
dshin/flan-t5-ppo-user-e-batch-size-8-epoch-1-use-violation • Reinforcement Learning • Updated • 14