Online-DPO

updated Jan 8

  • trl-lib/pythia-1b-deduped-tldr-online-dpo

    1B • Updated Aug 2, 2024 • 7

  • trl-lib/pythia-1b-deduped-tldr-sft

    1B • Updated Aug 2, 2024 • 5.47k

  • trl-lib/pythia-6.9b-deduped-tldr-online-dpo

    7B • Updated Aug 2, 2024 • 4

  • trl-lib/pythia-2.8b-deduped-tldr-sft

    Updated Aug 2, 2024 • 1.02k

  • trl-lib/pythia-2.8b-deduped-tldr-rm

    Updated Aug 2, 2024 • 1.89k

  • trl-lib/pythia-6.9b-deduped-tldr-sft

    Updated Aug 2, 2024 • 61

  • trl-lib/pythia-6.9b-deduped-tldr-rm

    Updated Aug 2, 2024 • 28

  • trl-lib/pythia-1b-deduped-tldr-offline-dpo

    Text Generation • 1B • Updated Aug 2, 2024 • 6

  • trl-lib/pythia-2.8b-deduped-tldr-offline-dpo

    Text Generation • 3B • Updated Aug 2, 2024 • 3

  • trl-lib/pythia-6.9b-deduped-tldr-offline-dpo

    Text Generation • 7B • Updated Aug 2, 2024 • 4

  • trl-lib/pythia-2.8b-deduped-tldr-online-dpo

    Text Generation • 3B • Updated Aug 2, 2024 • 7