felixbrock's Collections

RLHF/RLAIF

updated Oct 27, 2023

  • Efficient RLHF: Reducing the Memory Usage of PPO

    Paper • 2309.00754 • Published Sep 1, 2023 • 15

  • Statistical Rejection Sampling Improves Preference Optimization

    Paper • 2309.06657 • Published Sep 13, 2023 • 14

  • Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

    Paper • 2309.07462 • Published Sep 14, 2023 • 5

  • Stabilizing RLHF through Advantage Model and Selective Rehearsal

    Paper • 2309.10202 • Published Sep 18, 2023 • 11

  • Aligning Large Multimodal Models with Factually Augmented RLHF

    Paper • 2309.14525 • Published Sep 25, 2023 • 30

  • Safe RLHF: Safe Reinforcement Learning from Human Feedback

    Paper • 2310.12773 • Published Oct 19, 2023 • 28

  • Contrastive Preference Learning: Learning from Human Feedback without RL

    Paper • 2310.13639 • Published Oct 20, 2023 • 25