danielhanchen committed on
Commit 407e3d8 · verified · 1 Parent(s): 7c11d03

Add files using upload-large-folder tool
Q4_K_M/Qwen3-235B-A22B-128K-Q4_K_M-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e91b679d81845286e77fdd55e75973b775fb834c4c238e99bd1bddb83f99bb08
- size 49944699904
+ oid sha256:ce92b8776c643a143afc9684249c368169367cc33357f3f364f894131d3bc121
+ size 49944699872
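These diffs touch Git LFS pointer files rather than the GGUF weights themselves: each pointer is a three-line key-value text file (`version`, `oid`, `size`). As a minimal sketch, such a pointer can be parsed like this (the helper name is hypothetical, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version/oid/size lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ce92b8776c643a143afc9684249c368169367cc33357f3f364f894131d3bc121
size 49944699872"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 49944699872
```

The `size` field is the byte count of the real file, which is why each shard here sits just under 50 GB (Hugging Face's per-file upload limit).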
README.md CHANGED
@@ -1,26 +1,14 @@
  ---
- base_model: Qwen/Qwen3-235B-A22B
- language:
- - en
- library_name: transformers
- license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
- license: apache-2.0
  tags:
- - qwen3
- - qwen
  - unsloth
- - transformers
  ---
- > [!NOTE]
- > With 128K Context Length enabled by YaRN.
- >
  <div>
- <p style="margin-bottom: 0; margin-top: 0;">
- <strong>See <a href="https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>
- </p>
- <p style="margin-bottom: 0;">
- <em>Learn to run Qwen3 correctly - <a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">Read our Guide</a>.</em>
- </p>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>
@@ -35,47 +23,13 @@ tags:
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
- <h1 style="margin-top: 0rem;">✨ Run & Fine-tune Qwen3 with Unsloth!</h1>
  </div>

- - Fine-tune Qwen3 (14B) for free using our Google [Colab notebook here](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- - Read our Blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- - Run & export your fine-tuned model to Ollama, llama.cpp or HF.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|----------------|-------------|------------|
- | **Qwen3 (14B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 70% less |
- | **GRPO with Qwen3 (8B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 80% less |
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Phi-4 (14B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 50% less |
-
- # To Switch Between Thinking and Non-Thinking
- If you are using llama.cpp, Ollama, Open WebUI etc., you can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
-
- Here is an example of multi-turn conversation:
-
- ```
- > Who are you /no_think
-
- <think>
-
- </think>
-
- I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]
-
- > How many 'r's are in 'strawberries'? /think
-
- <think>
- Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
- </think>
-
- The word strawberries contains 3 instances of the letter r. [...]
- ```

  # Qwen3-235B-A22B

  ## Qwen3 Highlights

@@ -159,21 +113,23 @@ print("thinking content:", thinking_content)
  print("content:", content)
  ```

- For deployment, you can use `vllm>=0.8.5` or `sglang>=0.4.5.post2` to create an OpenAI-compatible API endpoint:
- - vLLM:
  ```shell
- vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```
- - SGLang:
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser deepseek-r1
  ```

  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by vLLM and SGLang.
- > Please refer to our documentation for [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) and [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

@@ -271,7 +227,7 @@ if __name__ == "__main__":
  print(f"Bot: {response_3}")
  ```

- > **Note**
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.

@@ -341,7 +297,7 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
  {
  ...,
  "rope_scaling": {
- "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
  }
@@ -353,12 +309,12 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers

  For `vllm`, you can use
  ```shell
- vllm serve ... --rope-scaling '{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
- python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```

  For `llama-server` from `llama.cpp`, you can use
 
  ---
  tags:
  - unsloth
+ base_model:
+ - Qwen/Qwen3-235B-A22B
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/LICENSE
+ pipeline_tag: text-generation
  ---

  <div>
  <p style="margin-top: 0;margin-bottom: 0;">
  <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
  </p>

  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
  </div>
  </div>

  # Qwen3-235B-A22B
+ <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Qwen3 Highlights

  print("content:", content)
  ```

+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
  ```shell
+ python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B --reasoning-parser qwen3 --tp 8
  ```
+ - vLLM:
  ```shell
+ vllm serve Qwen/Qwen3-235B-A22B --enable-reasoning --reasoning-parser deepseek_r1
  ```

+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
+
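Once either server is up it speaks the standard OpenAI chat-completions protocol. As a minimal sketch (the port and the `chat_template_kwargs` extra field are assumptions based on the frameworks' defaults, not part of this repo), a request body can be built like this:

```python
import json

# Hypothetical request body for the OpenAI-compatible endpoint started above.
# `chat_template_kwargs` is the extra field vLLM/SGLang accept for passing
# enable_thinking through to the chat template (an assumption here).
payload = {
    "model": "Qwen/Qwen3-235B-A22B",
    "messages": [{"role": "user", "content": "Who are you? /no_think"}],
    "max_tokens": 256,
    "chat_template_kwargs": {"enable_thinking": False},
}

# POST json.dumps(payload) to http://localhost:8000/v1/chat/completions
# (vLLM's default port; SGLang defaults to 30000).
body = json.dumps(payload)
print(body[:60])
```

The `/no_think` soft switch in the prompt and the `enable_thinking` template flag are two independent controls; the note further down describes how they interact.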
  ## Switching Between Thinking and Non-Thinking Mode

  > [!TIP]
+ > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
+ > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

  ### `enable_thinking=True`

  print(f"Bot: {response_3}")
  ```

+ > [!NOTE]
  > For API compatibility, when `enable_thinking=True`, regardless of whether the user uses `/think` or `/no_think`, the model will always output a block wrapped in `<think>...</think>`. However, the content inside this block may be empty if thinking is disabled.
  > When `enable_thinking=False`, the soft switches are not valid. Regardless of any `/think` or `/no_think` tags input by the user, the model will not generate think content and will not include a `<think>...</think>` block.

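The note above implies that a robust client should extract the `<think>` block even when it is empty. A minimal sketch of such a parser (the helper name is hypothetical):

```python
import re

def split_thinking(text: str):
    """Split a Qwen3-style response into (thinking, content).

    With enable_thinking=True the model always emits <think>...</think>,
    possibly with empty content; with enable_thinking=False the block is
    absent entirely. Both cases are handled.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

print(split_thinking("<think>\n\n</think>\n\nI am Qwen."))  # ('', 'I am Qwen.')
```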
 
  {
  ...,
  "rope_scaling": {
+ "rope_type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
  }

  For `vllm`, you can use
  ```shell
+ vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
  ```

  For `sglang`, you can use
  ```shell
+ python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
  ```

  For `llama-server` from `llama.cpp`, you can use
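The numbers in these YaRN overrides are linked: the served context window is the scaling factor times the native context, which is where the 128K (131072) in this repo's name and the `--max-model-len 131072` flag come from. A quick check of the values used above:

```python
# rope_scaling values taken from the config snippet above.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# Effective context length after YaRN scaling: factor * native context.
max_model_len = int(rope_scaling["factor"]
                    * rope_scaling["original_max_position_embeddings"])
print(max_model_len)  # 131072, the value passed to --max-model-len
```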
UD-Q2_K_XL/Qwen3-235B-A22B-128K-UD-Q2_K_XL-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d7cb0c0d0229739c6fb1634c17b5219d4ad714f7703781c6ace6fb5baedef89b
- size 49841583584
+ oid sha256:acf73a6df97371e8e4fa138fff301272761b97e890bc70001075053473e76f1e
+ size 49841583520
UD-Q3_K_XL/Qwen3-235B-A22B-128K-UD-Q3_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:138341249c4d76799ac2bcbf9a968fcddf082592110ffbcbd98324505ec1abf3
- size 49859919424
+ oid sha256:8dbab61d78ff038332efc5d77d36b3d0628e77b062298d2777f02461789eab5d
+ size 49859919360
UD-Q4_K_XL/Qwen3-235B-A22B-128K-UD-Q4_K_XL-00001-of-00003.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d11942c0b62acb21f28296d720040fe6ee14d7f15922f251255dcb89fd80eae5
- size 49875808960
+ oid sha256:b9d56fb6ac319b3d9c3e498b05f6edbed43de5ee6fbc457710b824674f27a540
+ size 49875808896
UD-Q5_K_XL/Qwen3-235B-A22B-128K-UD-Q5_K_XL-00001-of-00004.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9208cfb487d5628625f39240c29347f5b582ae7447a19becfefa425de2b8c201
- size 49835132896
+ oid sha256:9fc3787072a32911b09a42f36eab9981320447d9569b44aa7048a097c93ad808
+ size 49835132864
config.json CHANGED
@@ -4,7 +4,6 @@
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
- "bos_token_id": 151643,
  "decoder_sparse_step": 1,
  "eos_token_id": 151645,
  "head_dim": 128,
@@ -42,4 +41,4 @@
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
- }
+ }