Active filters: trl
dshin/flan-t5-ppo-user-e-allenai-prosocial-dialog • Reinforcement Learning • Updated • 1
wengnews/tuning_llama_rl_checkpointsstep_9 • Reinforcement Learning • Updated
eurus7/working • Reinforcement Learning • Updated
eurus7/ppo_trainer • Reinforcement Learning • Updated
eurus7/gpt2-imdb-pos-v2 • Reinforcement Learning • Updated
zou00080/llama_PPO_pos_formal • Reinforcement Learning • Updated • 3
zou00080/llama_PPO_pos_informal • Reinforcement Learning • Updated • 1
zou00080/llama_PPO_neg_formal • Reinforcement Learning • Updated • 1
zou00080/llama_PPO_neg_informal • Reinforcement Learning • Updated • 1
aleph-null/thesis
rajpabari/gflownets-rlhf • Reinforcement Learning • Updated
mariosirt/EleutherAI-gpt-neo-125m-detoxified • Reinforcement Learning • Updated • 1
mariosirt/EleutherAI-gpt-neo-125m-detoxified-perspective • Reinforcement Learning • Updated • 2
mariosirt/gpt2-detoxified • Reinforcement Learning • Updated • 14
merve/peft-copy-test • Text Generation • Updated • 3
renyulin/gptneo125m-detoxify-ppo-0.05 • Reinforcement Learning • Updated • 1
renyulin/llama-7b-es-ppo-adpater • Reinforcement Learning • Updated
renyulin/gpt-neo-1.3b-es-rlhf-step2500-peft • Reinforcement Learning • Updated
Evan-Lin/Bart-RL-little • Reinforcement Learning • Updated • 13
linlinlin/ppo_model • Reinforcement Learning • Updated
Evan-Lin/Bart-RL-little-entailment • Reinforcement Learning • Updated • 13
Evan-Lin/Bart-RL-many-entailment-attractive-keywordmax • Reinforcement Learning • Updated • 12
nlp-lab-2023-seq2seq/R-best-fine-tuned-bart-base-full-ft-reward_short_sentences_and_words-2023-07-13T06-49-08 • Reinforcement Learning • Updated • 15 • 1
Evan-Lin/Bart-RL-many-entailment-attractive-epoch1 • Reinforcement Learning • Updated • 14
amirabdullah19852020/pythia_70m_ppo_imdb_sentiment • Reinforcement Learning • Updated • 14
Evan-Lin/Bart-RL-many-keywordmax-entailment-attractive-reward1 • Reinforcement Learning • Updated • 12
Evan-Lin/Bart-RL-many-keywordmax-entailment-attractive-reward2 • Reinforcement Learning • Updated • 13
amirabdullah19852020/pythia_70m_ppo_imdb_sentiment_v2 • Reinforcement Learning • Updated • 13
Evan-Lin/Bart-RL-many-keywordmax-entailment-attractive-reward5 • Reinforcement Learning • Updated • 13
amirabdullah19852020/pythia_70m_ppo_imdb_sentiment_v3 • Reinforcement Learning • Updated • 10