Rl - a anujga Collection

Rl

updated Mar 18

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

Paper • 2307.12950 • Published Jul 24, 2023 • 10

HumanLLMs/Human-Like-DPO-Dataset

Viewer • Updated Jan 12 • 10.9k • 1.45k • 222

sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo

Viewer • Updated Oct 23, 2024 • 5.65k • 85 • 24

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k • 114 • 13

RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10, 2024 • 526k • 47

TIGER-Lab/WebInstruct-CFT

Viewer • Updated Feb 2 • 654k • 243 • 51

deu05232/promptriever-ours2-filtered_FN

Viewer • Updated Feb 10 • 1.31M • 43

argilla/distilabel-intel-orca-dpo-pairs

Viewer • Updated Mar 19 • 12.9k • 1.87k • 173