Change library name to transformers (#1)
Change library name to transformers (562c9603f1b029689ce68450e29afebf239a1ce2)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,12 +1,12 @@
 ---
-license: apache-2.0
 base_model:
-
-
-
-- large-reasoning-model
+- Tongyi-Zhiwen/QwenLong-L1-32B
+library_name: transformers
+license: apache-2.0
 pipeline_tag: text-generation
-
+tags:
+- long-context
+- large-reasoning-model
 ---

 ## Quantization
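For reference, the fields added above (`library_name`, `license`, `tags`, and the `base_model` entry) are the Hub metadata that downstream tooling reads. Below is a minimal sketch of inspecting that metadata with the `huggingface_hub` client; the repo id is the `base_model` value from the diff and stands in for whichever repository this README belongs to, so treat it as an assumption rather than part of the commit.

```python
# Illustrative sketch, not part of this commit: read the card metadata edited above.
from huggingface_hub import ModelCard

# Assumption: using the base_model repo id from the front matter as a stand-in;
# substitute the repository this README actually belongs to.
card = ModelCard.load("Tongyi-Zhiwen/QwenLong-L1-32B")

print(card.data.library_name)  # should report "transformers" after a change like this one
print(card.data.pipeline_tag)  # "text-generation"
print(card.data.tags)          # e.g. ["long-context", "large-reasoning-model"]
```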
@@ -162,8 +162,10 @@ try:
 except ValueError:
     index = 0

-thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
-content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("
+")
+content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("
+")

 print("thinking content:", thinking_content)
 print("content:", content)
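The two statements touched in this hunk belong to the model card's usage example, which splits the generated token ids into a reasoning segment and a final answer at the last `</think>` token. Below is a minimal end-to-end sketch of that step, following the pattern used in Qwen-style model cards; the repo id, the prompt, and the `</think>` token id 151668 are assumptions rather than content of this diff.

```python
# Illustrative sketch (assumed Qwen-style usage, not part of this commit):
# generate a reply, then split it into thinking content and final content.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"  # assumption: base_model from the front matter

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the key idea of YaRN in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=2048)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

try:
    # Locate the last </think> token; id 151668 is an assumption for this tokenizer.
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat everything as final content.
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```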
@@ -191,17 +193,17 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
 - Passing command line arguments:

   For `vllm`, you can use
-
+```shell
   vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
-
+```
   For `sglang`, you can use
-
+```shell
   python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
-
+```
   For `llama-server` from `llama.cpp`, you can use
-
+```shell
   llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
-
+```

 > [!IMPORTANT]
 > If you encounter the following warning
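The commands in this hunk enable YaRN at serve time for `vllm`, `sglang`, and `llama.cpp`. Below is a minimal sketch of the equivalent override when loading the checkpoint directly with `transformers`; the repo id and the 32,768-token native context are assumptions carried over from the commands, not statements from this diff.

```python
# Illustrative sketch, not part of this commit: apply the same YaRN rope-scaling
# override when loading with transformers instead of a serving CLI.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"  # assumption: base_model from the front matter

config = AutoConfig.from_pretrained(model_name)
# Mirrors the JSON passed to vllm/sglang above; 4x scaling over an assumed
# native 32768-token context gives roughly a 131072-token window.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```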
@@ -363,4 +365,4 @@ If you find this work is relevant with your research or applications, please fee
 journal={arXiv preprint arXiv:2505.17667},
 year={2025}
 }
-```
+```