SmolLM Variation: PPO & DPO Fine-Tuning for RLHF
Collection
This collection presents the fine-tuning of the SmolLM model using two RLHF approaches: Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO).
3 items • Updated Mar 30