anujga's Collections

Rl

updated Mar 18

  • RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

    Paper • 2307.12950 • Published Jul 24, 2023 • 10

  • HumanLLMs/Human-Like-DPO-Dataset

    Viewer • Updated Jan 12 • 10.9k • 1.45k • 222

  • sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo

    Viewer • Updated Oct 23, 2024 • 5.65k • 85 • 24

  • RLHFlow/Deepseek-PRM-Data

    Viewer • Updated Nov 9, 2024 • 253k • 114 • 13

  • RLHFlow/DS-and-Mistral-PRM-Data

    Viewer • Updated Nov 10, 2024 • 526k • 47

  • TIGER-Lab/WebInstruct-CFT

    Viewer • Updated Feb 2 • 654k • 243 • 51

  • deu05232/promptriever-ours2-filtered_FN

    Viewer • Updated Feb 10 • 1.31M • 43

  • argilla/distilabel-intel-orca-dpo-pairs

    Viewer • Updated Mar 19 • 12.9k • 1.87k • 173