Vignesh (Vigneshwaran)
AI & ML interests
None yet
Recent Activity
- liked a dataset 15 days ago: HuggingFaceFW/finepdfs
- updated a collection 4 months ago: RLHF
- updated a collection 5 months ago: RLHF
Organizations
training
- A Loss Curvature Perspective on Training Instability in Deep Learning
  Paper • 2110.04369 • Published
- Why Do We Need Weight Decay in Modern Deep Learning?
  Paper • 2310.04415 • Published
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 21
- Transformers Can Navigate Mazes With Multi-Step Prediction
  Paper • 2412.05117 • Published • 5
Synthetic data
RL
RLHF
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 69
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 50
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 31
evaluation
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
  Paper • 2407.10457 • Published • 24
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
  Paper • 2411.00640 • Published • 3
- Law of the Weakest Link: Cross Capabilities of Large Language Models
  Paper • 2409.19951 • Published • 54