Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published 26 days ago • 42
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners • Paper • 2412.17256 • Published 26 days ago • 45
Zephyr 7B Gemma • Collection • Models, dataset, and Demo for Zephyr 7B Gemma. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 5 items • Updated Apr 12, 2024 • 15