ReZero: Enhancing LLM search ability by trying one-more-time Paper • 2504.11001 • Published 3 days ago • 9
DeepHermes Collection Preview models of hybrid reasoner Hermes series • 6 items • Updated Mar 13 • 27
MetaSC: Test-Time Safety Specification Optimization for Language Models Paper • 2502.07985 • Published Feb 11 • 3
Toxic Commons Collection Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31, 2024 • 6
steiner-preview Collection Reasoning models trained on synthetic data using reinforcement learning. • 3 items • Updated Oct 20, 2024 • 32
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published Oct 17, 2024 • 13
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140