PEFT variations
estnafinema0/llm-course-hw3-dora Text Generation • 0.3B • Updated Apr 11 • 5
estnafinema0/llm-course-hw3-lora Text Generation • 0.3B • Updated Apr 11 • 4
estnafinema0/llm-course-hw3-tinyllama-qlora Updated Apr 11
estnafinema0/llm-course-hw3-tinyllamma-qlora Updated Apr 11
SmolLM Variation: PPO & DPO Fine-Tuning for RLHF
This collection presents the SmolLM model fine-tuned with two RLHF approaches: DPO and PPO.
estnafinema0/trainer_output Text Classification • 0.1B • Updated Mar 30 • 4
estnafinema0/smolLM-variation-dpo Text Generation • 0.1B • Updated Mar 30 • 7
estnafinema0/smolLM-variation-ppo Text Generation • 0.1B • Updated Mar 30 • 66
NER Extraction. Active Learning Approach.
estnafinema0/active-learning-nerc-models-kfold Updated Apr 4 • 4
estnafinema0/nerc-extraction 0.1B • Updated Apr 4 • 10