stereoplegic's Collections: RL/Alignment
Moral Foundations of Large Language Models
Paper
• 2310.15337
• Published
• 1
Specific versus General Principles for Constitutional AI
Paper
• 2310.13798
• Published
• 3
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper
• 2310.13639
• Published
• 25
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper
• 2309.00267
• Published
• 53
Self-Alignment with Instruction Backtranslation
Paper
• 2308.06259
• Published
• 43
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
Paper
• 2309.02632
• Published
• 1
A General Theoretical Paradigm to Understand Learning from Human Preferences
Paper
• 2310.12036
• Published
• 19
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Paper
• 2310.00212
• Published
• 2
Learning Optimal Advantage from Preferences and Mistaking it for Reward
Paper
• 2310.02456
• Published
• 1
Teaching Language Models to Self-Improve through Interactive Demonstrations
Paper
• 2310.13522
• Published
• 12
Chain-of-Thought Reasoning is a Policy Improvement Operator
Paper
• 2309.08589
• Published
• 2
MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Paper
• 2310.12426
• Published
• 1
Enable Language Models to Implicitly Learn Self-Improvement From Data
Paper
• 2310.00898
• Published
• 24
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper
• 2310.13385
• Published
• 10
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Paper
• 2310.11716
• Published
• 6
CITING: Large Language Models Create Curriculum for Instruction Tuning
Paper
• 2310.02527
• Published
• 3
Towards Understanding Sycophancy in Language Models
Paper
• 2310.13548
• Published
• 7
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Paper
• 2308.15812
• Published
• 1
SALMON: Self-Alignment with Principle-Following Reward Models
Paper
• 2310.05910
• Published
• 2
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper
• 2310.01377
• Published
• 5
Verbosity Bias in Preference Labeling by Large Language Models
Paper
• 2310.10076
• Published
• 2
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Paper
• 2310.12773
• Published
• 28
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper
• 2309.11235
• Published
• 15
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
Paper
• 2310.05074
• Published
• 1
SELF: Language-Driven Self-Evolution for Large Language Model
Paper
• 2310.00533
• Published
• 2
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Paper
• 2309.10691
• Published
• 4
A Long Way to Go: Investigating Length Correlations in RLHF
Paper
• 2310.03716
• Published
• 10
Efficient RLHF: Reducing the Memory Usage of PPO
Paper
• 2309.00754
• Published
• 16
Aligning Language Models with Offline Reinforcement Learning from Human Feedback
Paper
• 2308.12050
• Published
• 1
Reward Model Ensembles Help Mitigate Overoptimization
Paper
• 2310.02743
• Published
• 1
SCREWS: A Modular Framework for Reasoning with Revisions
Paper
• 2309.13075
• Published
• 18
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Paper
• 2310.03734
• Published
• 15
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper
• 2310.03714
• Published
• 37
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Paper
• 2310.03739
• Published
• 22
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper
• 2309.14525
• Published
• 32
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper
• 2310.08491
• Published
• 57
The Consensus Game: Language Model Generation via Equilibrium Search
Paper
• 2310.09139
• Published
• 14
Quality-Diversity through AI Feedback
Paper
• 2310.13032
• Published
• 1
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper
• 2310.09520
• Published
• 11
Controllable Text Generation with Residual Memory Transformer
Paper
• 2309.16231
• Published
• 1
Qwen Technical Report
Paper
• 2309.16609
• Published
• 38
Controlled Decoding from Language Models
Paper
• 2310.17022
• Published
• 15
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper
• 2310.17631
• Published
• 35
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Paper
• 2308.05374
• Published
• 31
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge
Paper
• 2308.12682
• Published
• 2
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Paper
• 2302.02662
• Published
• 1
Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification
Paper
• 2212.05276
• Published
• 1
Aligning Large Language Models with Human: A Survey
Paper
• 2307.12966
• Published
• 1
Zephyr: Direct Distillation of LM Alignment
Paper
• 2310.16944
• Published
• 123
Statistical Rejection Sampling Improves Preference Optimization
Paper
• 2309.06657
• Published
• 15
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Paper
• 2305.03047
• Published
• 1
CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Paper
• 2309.02301
• Published
• 1
TouchStone: Evaluating Vision-Language Models by Language Models
Paper
• 2308.16890
• Published
• 1
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Paper
• 2309.07462
• Published
• 4
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper
• 2309.10202
• Published
• 11
VIGC: Visual Instruction Generation and Correction
Paper
• 2308.12714
• Published
• 1
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Paper
• 2310.11971
• Published
• 1
Large Language Models as Optimizers
Paper
• 2309.03409
• Published
• 79
In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning
Paper
• 2308.04275
• Published
• 1
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper
• 2305.18290
• Published
• 64
Beyond Reward: Offline Preference-guided Policy Optimization
Paper
• 2305.16217
• Published
• 1
Decentralized Policy Optimization
Paper
• 2211.03032
• Published
• 1
Large Language Models are not Fair Evaluators
Paper
• 2305.17926
• Published
• 1
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Paper
• 2305.10425
• Published
• 7
Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding
Paper
• 2309.15028
• Published
• 1
Improving Language Models with Advantage-based Offline Policy Gradients
Paper
• 2305.14718
• Published
• 2
Large Language Models Cannot Self-Correct Reasoning Yet
Paper
• 2310.01798
• Published
• 36
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper
• 2309.10150
• Published
• 26
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Paper
• 2207.01780
• Published
• 1
RLTF: Reinforcement Learning from Unit Test Feedback
Paper
• 2307.04349
• Published
• 5
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper
• 2306.01693
• Published
• 3
Fine-tuning Language Models with Generative Adversarial Feedback
Paper
• 2305.06176
• Published
• 1
Aligning Large Language Models through Synthetic Feedback
Paper
• 2305.13735
• Published
• 1
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Paper
• 2306.02231
• Published
• 2
Reinforced Self-Training (ReST) for Language Modeling
Paper
• 2308.08998
• Published
• 3
SuperHF: Supervised Iterative Learning from Human Feedback
Paper
• 2310.16763
• Published
• 1
Split and Merge: Aligning Position Biases in Large Language Model based Evaluators
Paper
• 2310.01432
• Published
• 1
Generative Judge for Evaluating Alignment
Paper
• 2310.05470
• Published
• 1
Personas as a Way to Model Truthfulness in Language Models
Paper
• 2310.18168
• Published
• 5
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications
Paper
• 2310.17750
• Published
• 9
RRAML: Reinforced Retrieval Augmented Machine Learning
Paper
• 2307.12798
• Published
• 1
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Paper
• 2306.03604
• Published
• 1
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Paper
• 2209.07676
• Published
• 2
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Paper
• 2310.03693
• Published
• 1
Evaluating the Moral Beliefs Encoded in LLMs
Paper
• 2307.14324
• Published
• 1
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Paper
• 2209.12106
• Published
• 1
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Paper
• 2308.09662
• Published
• 3
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Paper
• 2305.17701
• Published
• 1
A Survey on Fairness in Large Language Models
Paper
• 2308.10149
• Published
• 1
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Paper
• 2305.06474
• Published
• 1
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
Paper
• 2307.11137
• Published
• 1
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper
• 2310.20624
• Published
• 13
ExpeL: LLM Agents Are Experiential Learners
Paper
• 2308.10144
• Published
• 3
Sociotechnical Safety Evaluation of Generative AI Systems
Paper
• 2310.11986
• Published
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Paper
• 2307.15217
• Published
• 39
Secrets of RLHF in Large Language Models Part I: PPO
Paper
• 2307.04964
• Published
• 30
Demystifying GPT Self-Repair for Code Generation
Paper
• 2306.09896
• Published
• 21
Training language models to follow instructions with human feedback
Paper
• 2203.02155
• Published
• 24
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Paper
• 2307.16039
• Published
• 4
A Mixture-of-Expert Approach to RL-based Dialogue Management
Paper
• 2206.00059
• Published
• 1
"Pick-and-Pass" as a Hat-Trick Class for First-Principle Memory,
Generalizability, and Interpretability Benchmarks
Paper
• 2310.20654
• Published
• 1
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Paper
• 2212.08061
• Published
• 1
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Paper
• 2309.14320
• Published
• 1
Lifelong Inverse Reinforcement Learning
Paper
• 2207.00461
• Published
• 1
Improving Code Generation by Training with Natural Language Feedback
Paper
• 2303.16749
• Published
• 1
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper
• 2311.02805
• Published
• 6
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper
• 2310.20587
• Published
• 18
B-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Paper
• 2310.03173
• Published
• 1
Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt
Paper
• 2310.13024
• Published
• 1
Multi-Task Recommendations with Reinforcement Learning
Paper
• 2302.03328
• Published
• 1
Curriculum-based Asymmetric Multi-task Reinforcement Learning
Paper
• 2211.03352
• Published
• 1
Efficient Training of Multi-task Combinatorial Neural Solver with Multi-armed Bandits
Paper
• 2305.06361
• Published
• 1
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Paper
• 2311.00267
• Published
• 1
Pre-training with Synthetic Data Helps Offline Reinforcement Learning
Paper
• 2310.00771
• Published
• 2
Guiding Pretraining in Reinforcement Learning with Large Language Models
Paper
• 2302.06692
• Published
• 1
Large Language Model Alignment: A Survey
Paper
• 2309.15025
• Published
• 2
Making Large Language Models Better Reasoners with Alignment
Paper
• 2309.02144
• Published
• 2
Pretraining in Deep Reinforcement Learning: A Survey
Paper
• 2211.03959
• Published
• 1
Reinforcement Learning for Generative AI: A Survey
Paper
• 2308.14328
• Published
• 1
d3rlpy: An Offline Deep Reinforcement Learning Library
Paper
• 2111.03788
• Published
• 1
Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
Paper
• 2306.13085
• Published
• 1
Efficient Online Reinforcement Learning with Offline Data
Paper
• 2302.02948
• Published
• 2
Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Paper
• 2306.06871
• Published
• 1
A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning
Paper
• 2306.07541
• Published
• 2
A Dataset Perspective on Offline Reinforcement Learning
Paper
• 2111.04714
• Published
• 1
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
Paper
• 2307.03406
• Published
• 1
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Paper
• 2210.06518
• Published
• 1
Mildly Constrained Evaluation Policy for Offline Reinforcement Learning
Paper
• 2306.03680
• Published
• 1
Conservative State Value Estimation for Offline Reinforcement Learning
Paper
• 2302.06884
• Published
• 1
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Paper
• 2305.09836
• Published
• 3
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Paper
• 2211.11092
• Published
• 1
A learning gap between neuroscience and reinforcement learning
Paper
• 2104.10995
• Published
• 1
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper
• 2211.11363
• Published
• 1
RLocator: Reinforcement Learning for Bug Localization
Paper
• 2305.05586
• Published
• 1
Beyond Words: A Mathematical Framework for Interpreting Large Language Models
Paper
• 2311.03033
• Published
• 1
LoopTune: Optimizing Tensor Computations with Reinforcement Learning
Paper
• 2309.01825
• Published
• 1
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Paper
• 2308.13387
• Published
• 1
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Paper
• 2211.15457
• Published
• 1
Offline Experience Replay for Continual Offline Reinforcement Learning
Paper
• 2305.13804
• Published
• 2
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Paper
• 2301.11063
• Published
• 1
Fine-tuning Language Models for Factuality
Paper
• 2311.08401
• Published
• 30
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Paper
• 2308.12014
• Published
• 1
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Paper
• 2308.03188
• Published
• 2
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Paper
• 2306.04930
• Published
• 3
Appropriateness is all you need!
Paper
• 2304.14553
• Published
• 1
LLM Cognitive Judgements Differ From Human
Paper
• 2307.11787
• Published
• 1
Fake Alignment: Are LLMs Really Aligned Well?
Paper
• 2311.05915
• Published
• 2
LLM Augmented Hierarchical Agents
Paper
• 2311.05596
• Published
• 1
Self-driven Grounding: Large Language Model Agents with Automatic Language-aligned Skill Learning
Paper
• 2309.01352
• Published
• 1
Introspective Tips: Large Language Model for In-Context Decision Making
Paper
• 2305.11598
• Published
• 1
SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
Paper
• 2305.15486
• Published
• 1
Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Paper
• 2311.08487
• Published
• 2
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Paper
• 2311.09528
• Published
• 2
Learning to Prune Deep Neural Networks via Reinforcement Learning
Paper
• 2007.04756
• Published
• 1
Training Language Models with Language Feedback at Scale
Paper
• 2303.16755
• Published
• 1
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
Paper
• 2106.06842
• Published
• 1
Continual Model-Based Reinforcement Learning with Hypernetworks
Paper
• 2009.11997
• Published
• 1
Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators
Paper
• 2306.01242
• Published
• 2
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Paper
• 2312.14878
• Published
• 15
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Paper
• 2401.02072
• Published
• 11
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper
• 2312.00849
• Published
• 12
Diffusion Model Alignment Using Direct Preference Optimization
Paper
• 2311.12908
• Published
• 49
Paper
• 2312.07000
• Published
• 15
Pearl: A Production-ready Reinforcement Learning Agent
Paper
• 2312.03814
• Published
• 15
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Paper
• 2311.08692
• Published
• 13
Trusted Source Alignment in Large Language Models
Paper
• 2311.06697
• Published
• 12
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published
• 68
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
Aligning Large Language Models with Human Preferences through Representation Engineering
Paper
• 2312.15997
• Published
• 2
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper
• 2401.07382
• Published
• 2
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper
• 2401.06080
• Published
• 28
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Paper
• 2311.14543
• Published
• 1
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper
• 2401.18058
• Published
• 24
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper
• 2402.01391
• Published
• 43
Efficient Exploration for LLMs
Paper
• 2402.00396
• Published
• 22
West-of-N: Synthetic Preference Generation for Improved Reward Modeling
Paper
• 2401.12086
• Published
• 1
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
Paper
• 2401.16635
• Published
• 1
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Paper
• 2401.00243
• Published
• 1
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Paper
• 2401.16335
• Published
• 1
Transforming and Combining Rewards for Aligning Large Language Models
Paper
• 2402.00742
• Published
• 12
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Paper
• 2312.09244
• Published
• 9
Can LLMs Follow Simple Rules?
Paper
• 2311.04235
• Published
• 13
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Paper
• 2311.05584
• Published
• 1
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper
• 2402.08609
• Published
• 36
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
• 2402.03300
• Published
• 141
ReFT: Reasoning with Reinforced Fine-Tuning
Paper
• 2401.08967
• Published
• 31
In deep reinforcement learning, a pruned network is a good network
Paper
• 2402.12479
• Published
• 19
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Paper
• 2402.14688
• Published
WildChat: 1M ChatGPT Interaction Logs in the Wild
Paper
• 2405.01470
• Published
• 64
FLAME: Factuality-Aware Alignment for Large Language Models
Paper
• 2405.01525
• Published
• 29
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Paper
• 2405.01481
• Published
• 30
Self-Play Preference Optimization for Language Model Alignment
Paper
• 2405.00675
• Published
• 28
Small Language Model Can Self-correct
Paper
• 2401.07301
• Published
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
• 2406.00888
• Published
• 33
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
Paper
• 2406.00605
• Published
• 2
WPO: Enhancing RLHF with Weighted Preference Optimization
Paper
• 2406.11827
• Published
• 17
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Paper
• 2406.08973
• Published
• 89
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Paper
• 2505.16410
• Published
• 58
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
RLP: Reinforcement as a Pretraining Objective
Paper
• 2510.01265
• Published
• 45
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper
• 2510.11696
• Published
• 181
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
• 2601.09259
• Published
• 95