---
library_name: transformers
license: apache-2.0
base_model:
  - nbeerbower/Qwen3-8B-abliterated-TIES
datasets:
  - nbeerbower/GreatFirewall-DPO
  - nbeerbower/Schule-DPO
  - nbeerbower/Purpura-DPO
  - nbeerbower/Arkhaios-DPO
  - jondurbin/truthy-dpo-v0.1
  - antiven0m/physical-reasoning-dpo
  - flammenai/Date-DPO-NoAsterisks
  - flammenai/Prude-Phi3-DPO
  - Atsunori/HelpSteer2-DPO
  - jondurbin/gutenberg-dpo-v0.1
  - nbeerbower/gutenberg2-dpo
  - nbeerbower/gutenberg-moderne-dpo
  - GeneralReasoning/GeneralThought-430K
  - nvidia/OpenMathReasoning
  - nvidia/OpenCodeReasoning
tags:
  - orpo
  - uncensored
  - reasoning
  - cot
---


# Xiaolong-Qwen3-8B

Xiaolong is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of Qwen3-8B-abliterated-TIES.
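
The snippet below is a minimal inference sketch using `transformers`. The repository ID `nbeerbower/Xiaolong-Qwen3-8B`, the prompt, and the generation settings are illustrative assumptions rather than part of the original card.

```python
# Hypothetical usage sketch: load the model and run one prompt through the
# chat template. Adjust the repo ID if the published name differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Xiaolong-Qwen3-8B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain why the sky is blue."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```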

## Finetuning Details

- Method: ORPO
- Epochs: 2
- Learning Rate: 5e-6, cosine decay w/ 5% warmup
- Batch Size: 1 x 32 (effective batch size 32 via gradient accumulation)
- Max Grad Norm: 0.3
- LoRA Rank: 64
- Hardware: 1x NVIDIA RTX A6000
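
For orientation, here is a hedged sketch of how an ORPO + QLoRA run with the hyperparameters above could be set up using TRL's `ORPOTrainer` and a PEFT LoRA adapter. Values not stated in this card (beta, `lora_alpha`, dropout, target modules, the placeholder dataset) are assumptions, not the author's actual configuration.

```python
# Sketch of ORPO + QLoRA finetuning, assuming TRL + PEFT + bitsandbytes.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "nbeerbower/Qwen3-8B-abliterated-TIES"

# 4-bit quantization for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

peft_config = LoraConfig(
    r=64,                      # LoRA rank from the list above
    lora_alpha=64,             # assumption; alpha is not stated in the card
    lora_dropout=0.05,         # assumption
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

orpo_args = ORPOConfig(
    output_dir="xiaolong-qwen3-8b",
    num_train_epochs=2,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # 1 x 32 = 32 effective
    max_grad_norm=0.3,
    beta=0.1,                        # assumption; ORPO beta not stated
    bf16=True,
)

# Placeholder preference dataset; ORPOTrainer expects prompt/chosen/rejected.
train_dataset = load_dataset("nbeerbower/GreatFirewall-DPO", split="train")

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```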

## Dataset Composition

Roughly 9,100 samples in total, of which about 3,000 include Chain of Thought reasoning.
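
As a rough illustration of assembling the preference data, the DPO-style datasets listed in the metadata can be loaded and concatenated into a single training set. The exact sampling and filtering used to reach ~9,100 examples is not specified here, so treat this as an assumption-laden sketch.

```python
# Combine several of the listed DPO datasets into one preference dataset.
from datasets import concatenate_datasets, load_dataset

dpo_repos = [
    "nbeerbower/GreatFirewall-DPO",
    "nbeerbower/Schule-DPO",
    "nbeerbower/Purpura-DPO",
    "nbeerbower/Arkhaios-DPO",
    "jondurbin/truthy-dpo-v0.1",
]

splits = [load_dataset(repo, split="train") for repo in dpo_repos]
# Keep only the columns ORPO needs; actual column names may differ per dataset.
splits = [ds.select_columns(["prompt", "chosen", "rejected"]) for ds in splits]
combined = concatenate_datasets(splits).shuffle(seed=42)
print(len(combined))
```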

### Chain of Thought