The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
Process Reward Models Collection Models and datasets for Qwen 2.5 Math PRM 7B • 6 items • Updated 3 days ago
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 7 days ago
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning Paper • 2502.04689 • Published 14 days ago
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published 15 days ago
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published 15 days ago
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models Paper • 2502.01142 • Published 18 days ago
GuardReasoner: Towards Reasoning-based LLM Safeguards Paper • 2501.18492 • Published 22 days ago