Self-Training Large Language Models for Tool-Use Without Demonstrations Paper • 2502.05867 • Published Feb 9
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain Paper • 2307.03042 • Published Jul 6, 2023
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them Paper • 2507.10616 • Published 20 days ago • 1