danielhanchen committed on
Commit 407e3d8 · verified · 1 Parent(s): 7c11d03

Add files using upload-large-folder tool
Q4_K_M/Qwen3-235B-A22B-128K-Q4_K_M-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e91b679d81845286e77fdd55e75973b775fb834c4c238e99bd1bddb83f99bb08
- size 49944699904
+ oid sha256:ce92b8776c643a143afc9684249c368169367cc33357f3f364f894131d3bc121
+ size 49944699872
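These diffs touch Git LFS pointer files rather than the GGUF weights themselves: each pointer is a three-line key-value text file (`version`, `oid`, `size`). As a minimal sketch, such a pointer can be parsed like this (the helper name is hypothetical, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version/oid/size lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ce92b8776c643a143afc9684249c368169367cc33357f3f364f894131d3bc121
size 49944699872"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 49944699872
```

The `size` field is the byte count of the real file, which is why each shard here sits just under 50 GB (Hugging Face's per-file upload limit).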
README.md CHANGED
@@ -1,26 +1,14 @@
  ---
- base_model: Qwen/Qwen3-235B-A22B
- language:
- - en
- library_name: transformers
- license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
- license: apache-2.0
  tags:
- - qwen3
- - qwen
  - unsloth
- - transformers
  ---
- > [!NOTE]
- > With 128K Context Length enabled by YaRN.
- >
  <div>
- <p style="margin-bottom: 0; margin-top: 0;">
- <strong>See <a href="https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>
- </p>
- <p style="margin-bottom: 0;">
- <em>Learn to run Qwen3 correctly - <a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">Read our Guide</a>.</em>
- </p>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>
@@ -35,47 +23,13 @@ tags:
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
- <h1 style="margin-top: 0rem;">✨ Run & Fine-tune Qwen3 with Unsloth!</h1>
  </div>

- - Fine-tune Qwen3 (14B) for free using our Google [Colab notebook here](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- - Read our Blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- - Run & export your fine-tuned model to Ollama, llama.cpp or HF.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|----------------|-------------|------------|
- | **Qwen3 (14B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 70% less |
- | **GRPO with Qwen3 (8B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 80% less |
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Phi-4 (14B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 50% less |
-
- # To Switch Between Thinking and Non-Thinking
- If you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
-
- Here is an example of multi-turn conversation:
-
- ```
- > Who are you /no_think
-
- <think>
-
- </think>
-
- I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]
-
- > How many 'r's are in 'strawberries'? /think
-
- <think>
- Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
- </think>
-
- The word strawberries contains 3 instances of the letter r. [...]
- ```

  # Qwen3-235B-A22B

  ## Qwen3 Highlights

@@ -159,21 +113,23 @@ print("thinking content:", thinking_content)
  print("content:", content)
  ```

- For deployment, you can use `vllm>=0.8.5` or `sglang>=0.4.5.post2` to create an OpenAI-compatible API endpoint:
- - vLLM:
  ```shell
- vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```
- - SGLang:
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser deepseek-r1
  ```

  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by vLLM and SGLang.
- > Please refer to our documentation for [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) and [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

@@ -271,7 +227,7 @@ if __name__ == "__main__":
  print(f"Bot: {response_3}")
  ```

- > **Note**
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.

@@ -341,7 +297,7 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
  {
  ...,
  "rope_scaling": {
- "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
  }
@@ -353,12 +309,12 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers

  For `vllm`, you can use
  ```shell
- vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
- python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```

  For `llama-server` from `llama.cpp`, you can use
 
  ---
  tags:
  - unsloth
+ base_model:
+ - Qwen/Qwen3-235B-A22B
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
+ pipeline_tag: text-generation
  ---

  <div>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>

  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
  </div>

  # Qwen3-235B-A22B
+ <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Qwen3 Highlights

  print("content:", content)
  ```

+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
  ```shell
+ python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser qwen3 --tp 8
  ```
+ - vLLM:
  ```shell
+ vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```

+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
+
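Once either server is up it speaks the standard OpenAI chat-completions protocol. As a minimal sketch (the port and the `chat_template_kwargs` extra field are assumptions based on the frameworks' defaults, not part of this repo), a request body can be built like this:

```python
import json

# Hypothetical request body for the OpenAI-compatible endpoint started above.
# `chat_template_kwargs` is the extra field vLLM/SGLang accept for passing
# enable_thinking through to the chat template (an assumption here).
payload = {
    "model": "Qwen/Qwen3-235B-A22B",
    "messages": [{"role": "user", "content": "Who are you? /no_think"}],
    "max_tokens": 256,
    "chat_template_kwargs": {"enable_thinking": False},
}

# POST json.dumps(payload) to http://localhost:8000/v1/chat/completions
# (vLLM's default port; SGLang defaults to 30000).
body = json.dumps(payload)
print(body[:60])
```

The `/no_think` soft switch in the prompt and the `enable_thinking` template flag are two independent controls; the note further down describes how they interact.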
  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
+ > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
+ > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

  print(f"Bot: {response_3}")
  ```

+ > [!NOTE]
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.

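The note above implies that a robust client should extract the `<think>` block even when it is empty. A minimal sketch of such a parser (the helper name is hypothetical):

```python
import re

def split_thinking(text: str):
    """Split a Qwen3-style response into (thinking, content).

    With enable_thinking=True the model always emits <think>...</think>,
    possibly with empty content; with enable_thinking=False the block is
    absent entirely. Both cases are handled.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

print(split_thinking("<think>\n\n</think>\n\nI am Qwen."))  # ('', 'I am Qwen.')
```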
 
  {
  ...,
  "rope_scaling": {
+ "rope_type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
  }

  For `vllm`, you can use
  ```shell
+ vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
+ python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```

  For `llama-server` from `llama.cpp`, you can use
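The numbers in these YaRN overrides are linked: the served context window is the scaling factor times the native context, which is where the 128K (131072) in this repo's name and the `--max-model-len 131072` flag come from. A quick check of the values used above:

```python
# rope_scaling values taken from the config snippet above.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# Effective context length after YaRN scaling: factor * native context.
max_model_len = int(rope_scaling["factor"]
                    * rope_scaling["original_max_position_embeddings"])
print(max_model_len)  # 131072, the value passed to --max-model-len
```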
UD-Q2_K_XL/Qwen3-235B-A22B-128K-UD-Q2_K_XL-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d7cb0c0d0229739c6fb1634c17b5219d4ad714f7703781c6ace6fb5baedef89b
- size 49841583584
+ oid sha256:acf73a6df97371e8e4fa138fff301272761b97e890bc70001075053473e76f1e
+ size 49841583520
UD-Q3_K_XL/Qwen3-235B-A22B-128K-UD-Q3_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:138341249c4d76799ac2bcbf9a968fcddf082592110ffbcbd98324505ec1abf3
- size 49859919424
+ oid sha256:8dbab61d78ff038332efc5d77d36b3d0628e77b062298d2777f02461789eab5d
+ size 49859919360
UD-Q4_K_XL/Qwen3-235B-A22B-128K-UD-Q4_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d11942c0b62acb21f28296d720040fe6ee14d7f15922f251255dcb89fd80eae5
- size 49875808960
+ oid sha256:b9d56fb6ac319b3d9c3e498b05f6edbed43de5ee6fbc457710b824674f27a540
+ size 49875808896
UD-Q5_K_XL/Qwen3-235B-A22B-128K-UD-Q5_K_XL-00001-of-00004.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9208cfb487d5628625f39240c29347f5b582ae7447a19becfefa425de2b8c201
- size 49835132896
+ oid sha256:9fc3787072a32911b09a42f36eab9981320447d9569b44aa7048a097c93ad808
+ size 49835132864
config.json CHANGED
@@ -4,7 +4,6 @@
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
- "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
@@ -42,4 +41,4 @@
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
- }
+ }