---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Qwen3-8B-abliterated-TIES
datasets:
- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- cot
---

![image/png](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-0.6B/resolve/main/cover.png?download=true)

# Xiaolong-Qwen3-8B

**Xiaolong** is a small, uncensored, reasoning-focused model finetuned using [ORPO and QLoRA](https://huggingface.co/blog/mlabonne/orpo-llama-3) on top of [Qwen3-8B-abliterated-TIES](https://huggingface.co/nbeerbower/Qwen3-8B-abliterated-TIES).

## Finetuning Details

- **Method:** ORPO
- **Epochs:** 2
- **Learning Rate:** 5e-6, cosine decay with 5% warmup
- **Batch Size:** 1 x 32 (32 effective)
- **Max Grad Norm:** 0.3
- **LoRA Rank:** 64
- **Hardware:** 1x NVIDIA RTX A6000

## Dataset Composition

~9,100 samples total, of which 3,000 use Chain of Thought reasoning.
* [nbeerbower/GreatFirewall-DPO](https://huggingface.co/datasets/nbeerbower/GreatFirewall-DPO)
* [nbeerbower/Schule-DPO](https://huggingface.co/datasets/nbeerbower/Schule-DPO)
* [nbeerbower/Purpura-DPO](https://huggingface.co/datasets/nbeerbower/Purpura-DPO)
* [nbeerbower/Arkhaios-DPO](https://huggingface.co/datasets/nbeerbower/Arkhaios-DPO)
* [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
* [antiven0m/physical-reasoning-dpo](https://huggingface.co/datasets/antiven0m/physical-reasoning-dpo)
* [flammenai/Date-DPO-NoAsterisks](https://huggingface.co/datasets/flammenai/Date-DPO-NoAsterisks)
* [flammenai/Prude-Phi3-DPO](https://huggingface.co/datasets/flammenai/Prude-Phi3-DPO)
* [Atsunori/HelpSteer2-DPO](https://huggingface.co/datasets/Atsunori/HelpSteer2-DPO) (1000 samples)
* [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
* [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo)
* [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo)

### Chain of Thought

* [GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K) (1000 samples)
* [nvidia/OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (1000 samples)
* [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) (1000 samples)
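## Configuration Sketch

For reference, the finetuning hyperparameters listed above can be sketched as a TRL ORPO configuration. This is an illustrative reconstruction, not the actual training script: the output path is hypothetical, and the dataset preparation, tokenizer, and 4-bit (QLoRA) model loading are omitted.

```python
# Illustrative sketch only -- not the actual training script used for Xiaolong.
# Maps the hyperparameters above onto trl's ORPOConfig and a peft LoraConfig;
# QLoRA would additionally load the base model with 4-bit quantization.
from trl import ORPOConfig
from peft import LoraConfig

orpo_config = ORPOConfig(
    output_dir="xiaolong-qwen3-8b-orpo",  # hypothetical path
    num_train_epochs=2,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                    # 5% warmup
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,       # 1 x 32 = 32 effective batch size
    max_grad_norm=0.3,
)

peft_config = LoraConfig(
    r=64,                                 # LoRA rank
    task_type="CAUSAL_LM",
)
```

An `ORPOTrainer` would then combine the base model, tokenizer, preference dataset, and these two configs to run the finetune.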