---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
license: apache-2.0
language:
- en
---

# DeepSeek R1 Medical Reasoning

- **Finetuned from model:** [unsloth/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B)

This model was fine-tuned for medical reasoning using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library, achieving 2x faster training.

## Model Details

- **Fine-tuning task**: Medical reasoning with step-by-step chain-of-thought explanations
- **Training dataset**: [Medical reasoning dataset](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) (500 examples)
- **Training metrics**:
  - Final loss: 1.3269
  - Training runtime: 2191.2041 seconds (~36.5 minutes)
  - Total FLOPs: 4.01e+16
  - Epochs completed: 1.896
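
## Example Usage

A minimal inference sketch using the standard `transformers` API. The repository id `your-username/deepseek-r1-medical-reasoning` and the prompt template are placeholders: substitute the actual repo id for this model and the exact instruction/chain-of-thought format used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual Hub path of this model.
model_id = "your-username/deepseek-r1-medical-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumed prompt layout: a medical question followed by a section where the
# model writes its step-by-step reasoning before the final answer. Adjust to
# match the template used in training.
prompt = (
    "Below is a medical question. Think through the problem step by step, "
    "then give a final answer.\n\n"
    "### Question:\n"
    "A 45-year-old man presents with crushing chest pain radiating to the "
    "left arm. What is the most likely diagnosis?\n\n"
    "### Response:\n<think>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
)

# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Because the base model is an R1-distill Llama variant, the output typically contains a reasoning trace before the final answer; parse or strip it as needed for your application.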