UCLAML (UCLA Statistical Machine Learning Lab)

angelahzyuan

authored a paper 3 months ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89

thughost

authored a paper 4 months ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89

FrankWu96

updated a model 6 months ago

UCLAML/Mistral-7B-Instruct-ppo_mistral

Text Generation • Updated Nov 22, 2024 • 3

angelahzyuan

authored 2 papers 6 months ago

Accelerated Preference Optimization for Large Language Model Alignment

Paper • 2410.06293 • Published Oct 8, 2024 • 5

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Paper • 2411.10438 • Published Nov 15, 2024 • 13

thughost

authored a paper 6 months ago

MARS: Unleashing the Power of Variance Reduction for Training Large Models

Paper • 2411.10438 • Published Nov 15, 2024 • 13

thughost

authored a paper 7 months ago

DPLM-2: A Multimodal Diffusion Protein Language Model

Paper • 2410.13782 • Published Oct 17, 2024 • 22

zhiqings

authored 2 papers 7 months ago

An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Paper • 2408.00724 • Published Aug 1, 2024 • 1

Lean-STaR: Learning to Interleave Thinking and Proving

Paper • 2407.10040 • Published Jul 14, 2024

thughost

authored 2 papers 7 months ago

General Preference Modeling with Preference Representations for Aligning Language Models

Paper • 2410.02197 • Published Oct 3, 2024 • 9

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published Oct 3, 2024 • 37

thughost

authored a paper 8 months ago

ProteinBench: A Holistic Evaluation of Protein Foundation Models

Paper • 2409.06744 • Published Sep 10, 2024 • 9

thughost

posted an update 11 months ago

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
🤗paper: Self-Play Preference Optimization for Language Model Alignment (2405.00675)
⭐ code: https://github.com/uclaml/SPPO
🤗models: UCLA-AGI/sppo-6635fdd844f2b2e4a94d0b9a

zhiqings

authored 2 papers about 1 year ago

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Paper • 2403.09472 • Published Mar 14, 2024 • 1

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1, 2024 • 28

angelahzyuan

authored a paper about 1 year ago

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1, 2024 • 28

thughost

authored 4 papers about 1 year ago

DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Paper • 2403.13829 • Published Mar 7, 2024

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Paper • 2305.08359 • Published May 15, 2023

Risk Bounds of Accelerated SGD for Overparameterized Linear Regression

Paper • 2311.14222 • Published Nov 23, 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Paper • 2310.08391 • Published Oct 12, 2023

AI & ML interests

UCLAML's activity

Team members 5
