Triangle104 committed on
Commit 54f7c8b · verified · 1 Parent(s): 624ca55

Update README.md

Files changed (1)
  1. README.md +37 -0
README.md CHANGED
@@ -31,6 +31,43 @@ tags:
  This model was converted to GGUF format from [`nbeerbower/Xiaolong-Qwen3-8B`](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-8B) for more details on the model.

+ ---
+ Xiaolong is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of Qwen3-8B-abliterated-TIES.
+
+ Finetuning Details
+ -
+ - Method: ORPO
+ - Epochs: 2
+ - Learning Rate: 5e-6, cosine decay with 5% warmup
+ - Batch Size: 1 x 32 (32 effective)
+ - Max Grad Norm: 0.3
+ - LoRA Rank: 64
+ - Hardware: 1x NVIDIA RTX A6000
+
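The learning-rate and batch-size settings listed above can be illustrated with a small sketch. It is not the training code used for Xiaolong: the function name `lr_at_step`, the linear warmup shape, the decay to zero, and the reading of "1 x 32" as a per-device batch of 1 with 32 gradient-accumulation steps are all assumptions made for illustration.

```python
import math

# Assumed reading of "1 x 32 (32 effective)":
# per-device batch of 1, accumulated over 32 steps.
per_device_batch = 1
grad_accum_steps = 32
effective_batch = per_device_batch * grad_accum_steps


def lr_at_step(step, total_steps, peak_lr=5e-6, warmup_frac=0.05):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 100  # illustrative step count, not taken from the model card
    for s in (0, 4, 5, 50, 100):
        print(s, lr_at_step(s, total))
```

With these assumptions, the rate ramps up over the first 5% of steps, peaks at 5e-6, and falls smoothly toward zero by the final step. The actual trainer may round the warmup length or end at a different floor.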
+ Dataset Composition
+ -
+ ~9,100 samples, of which 3,000 use Chain of Thought reasoning.
+
+ - nbeerbower/GreatFirewall-DPO
+ - nbeerbower/Schule-DPO
+ - nbeerbower/Purpura-DPO
+ - nbeerbower/Arkhaios-DPO
+ - jondurbin/truthy-dpo-v0.1
+ - antiven0m/physical-reasoning-dpo
+ - flammenai/Date-DPO-NoAsterisks
+ - flammenai/Prude-Phi3-DPO
+ - Atsunori/HelpSteer2-DPO (1,000 samples)
+ - jondurbin/gutenberg-dpo-v0.1
+ - nbeerbower/gutenberg2-dpo
+ - nbeerbower/gutenberg-moderne-dpo
+
+ Chain of Thought
+ -
+ - GeneralReasoning/GeneralThought-430K (1,000 samples)
+ - nvidia/OpenMathReasoning (1,000 samples)
+ - nvidia/OpenCodeReasoning (1,000 samples)
+
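As a rough sanity check on the figures in the model card, the sample counts can be combined with the stated effective batch size of 32 and 2 epochs to estimate the number of optimizer steps. This is back-of-the-envelope arithmetic only: the total of 9,100 is approximate in the source, and the rounding choices (a partial final batch counts as a step, warmup length rounded down) are assumptions rather than documented trainer behavior.

```python
import math

total_samples = 9_100  # "~9,100 samples" in the model card (approximate)

# Chain of Thought sources, 1,000 samples each per the model card.
cot_sources = {
    "GeneralReasoning/GeneralThought-430K": 1_000,
    "nvidia/OpenMathReasoning": 1_000,
    "nvidia/OpenCodeReasoning": 1_000,
}
cot_samples = sum(cot_sources.values())
other_samples = total_samples - cot_samples  # approximate remainder

effective_batch = 32
epochs = 2

# Assumption: a partial final batch still counts as one optimizer step.
steps_per_epoch = math.ceil(total_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Assumption: the 5% warmup length is rounded down to whole steps.
warmup_steps = int(total_steps * 0.05)

if __name__ == "__main__":
    print(cot_samples, other_samples)
    print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run is on the order of a few hundred optimizer steps, which is consistent with finetuning on a single GPU.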
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)