Commit eb7934a (verified) by Triangle104, parent b67cd21: Update README.md
This model was converted to GGUF format from [`nbeerbower/Xiaolong-Qwen3-0.6B`](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-0.6B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-0.6B) for more details on the model.
---

Xiaolong is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of Qwen3-0.6B-abliterated-TIES.
### Finetuning Details

- Method: ORPO
- Epochs: 1.3
- Learning Rate: 5e-6, cosine decay w/ 5% warmup
- Batch Size: 4 x 8 (32 effective)
- Max Grad Norm: 0.3
- LoRA Rank: 64
- Hardware: 1x NVIDIA RTX A6000
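The learning-rate schedule above (peak 5e-6, cosine decay with 5% warmup) can be sketched as a plain function. The step count below is illustrative only; the card does not state the actual number of optimizer steps.

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-6, warmup_ratio=0.05):
    """Linear warmup to peak_lr over the first warmup_ratio of steps,
    then cosine decay to zero over the remainder."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative run of 1000 optimizer steps:
print(lr_at_step(0, 1000))     # 0.0 at the start of warmup
print(lr_at_step(50, 1000))    # 5e-6 at the end of the 5% warmup
print(lr_at_step(1000, 1000))  # decayed back to ~0.0
```

Note that the "4 x 8 (32 effective)" batch size is a per-device batch of 4 with 8 gradient-accumulation steps, so each optimizer step here corresponds to 32 training samples.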
### Dataset Composition

~9,100 samples. 3,000 used Chain of Thought reasoning.

- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO (1000 samples)
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
### Chain of Thought

- GeneralReasoning/GeneralThought-430K (1000 samples)
- nvidia/OpenMathReasoning (1000 samples)
- nvidia/OpenCodeReasoning (1000 samples)
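The capped subsets above can be tallied to check the stated mix: the three 1,000-sample Chain of Thought sources account for the "3,000 used Chain of Thought reasoning" figure, roughly a third of the ~9,100-sample total.

```python
# Per-source caps for the Chain of Thought subsets, as listed in the card.
cot_sources = {
    "GeneralReasoning/GeneralThought-430K": 1000,
    "nvidia/OpenMathReasoning": 1000,
    "nvidia/OpenCodeReasoning": 1000,
}

cot_total = sum(cot_sources.values())
print(cot_total)          # 3000, matching the card's CoT count
print(cot_total / 9100)   # ~0.33 of the ~9,100-sample mix
```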
---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
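The install step, and a typical invocation that pulls a GGUF file straight from the Hub, look like the following. The `--hf-repo` and `--hf-file` values are placeholders, since the card above does not name the quantized files in this repo.

```shell
# Install llama.cpp via Homebrew (works on macOS and Linux)
brew install llama.cpp

# Run inference directly from a Hugging Face repo;
# replace the placeholder repo and file names with the actual ones.
llama-cli --hf-repo <user>/<model>-GGUF --hf-file <model>-q4_k_m.gguf \
    -p "The meaning to life and the universe is"
```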