Mila Iterative DPO

university

arianhosseini

authored 7 papers 7 days ago

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

Paper • 2306.12509 • Published Jun 21, 2023 • 14

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Paper • 2408.16737 • Published Aug 29, 2024 • 1

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Paper • 2410.18252 • Published Oct 23, 2024 • 7

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Paper • 2505.04842 • Published May 7 • 12

Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs

Paper • 2508.10142 • Published 25 days ago • 3

arianhosseini

authored a paper 5 months ago

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

Paper • 2504.01005 • Published Apr 1 • 16

mnoukhov

authored a paper 11 months ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

sophiex

authored a paper about 1 year ago

Efficient Adversarial Training in LLMs with Continuous Attacks

Paper • 2405.15589 • Published May 24, 2024