Change library name to transformers (#1)
Change library name to transformers (562c9603f1b029689ce68450e29afebf239a1ce2)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,12 +1,12 @@
 ---
-license: apache-2.0
 base_model:
-
-
-
-- large-reasoning-model
+- Tongyi-Zhiwen/QwenLong-L1-32B
+library_name: transformers
+license: apache-2.0
 pipeline_tag: text-generation
-
+tags:
+- long-context
+- large-reasoning-model
 ---

 ## Quantization
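For reference, the fields added above (`library_name`, `license`, `tags`, and the `base_model` entry) are the Hub metadata that downstream tooling reads. Below is a minimal sketch of inspecting that metadata with the `huggingface_hub` client; the repo id is the `base_model` value from the diff and stands in for whichever repository this README belongs to, so treat it as an assumption rather than part of the commit.

```python
# Illustrative sketch, not part of this commit: read the card metadata edited above.
from huggingface_hub import ModelCard

# Assumption: using the base_model repo id from the front matter as a stand-in;
# substitute the repository this README actually belongs to.
card = ModelCard.load("Tongyi-Zhiwen/QwenLong-L1-32B")

print(card.data.library_name)  # should report "transformers" after a change like this one
print(card.data.pipeline_tag)  # "text-generation"
print(card.data.tags)          # e.g. ["long-context", "large-reasoning-model"]
```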
@@ -162,8 +162,10 @@ try:
 except ValueError:
     index = 0

-thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
-content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("
+")
+content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("
+")

 print("thinking content:", thinking_content)
 print("content:", content)
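The two statements touched in this hunk belong to the model card's usage example, which splits the generated token ids into a reasoning segment and a final answer at the last `</think>` token. Below is a minimal end-to-end sketch of that step, following the pattern used in Qwen-style model cards; the repo id, the prompt, and the `</think>` token id 151668 are assumptions rather than content of this diff.

```python
# Illustrative sketch (assumed Qwen-style usage, not part of this commit):
# generate a reply, then split it into thinking content and final content.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"  # assumption: base_model from the front matter

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the key idea of YaRN in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=2048)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

try:
    # Locate the last </think> token; id 151668 is an assumption for this tokenizer.
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat everything as final content.
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```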
@@ -191,17 +193,17 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
 - Passing command line arguments:

   For `vllm`, you can use
-
+```shell
   vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
-
+```
   For `sglang`, you can use
-
+```shell
   python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
-
+```
   For `llama-server` from `llama.cpp`, you can use
-
+```shell
   llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
-
+```

 > [!IMPORTANT]
 > If you encounter the following warning
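The commands in this hunk enable YaRN at serve time for `vllm`, `sglang`, and `llama.cpp`. Below is a minimal sketch of the equivalent override when loading the checkpoint directly with `transformers`; the repo id and the 32,768-token native context are assumptions carried over from the commands, not statements from this diff.

```python
# Illustrative sketch, not part of this commit: apply the same YaRN rope-scaling
# override when loading with transformers instead of a serving CLI.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"  # assumption: base_model from the front matter

config = AutoConfig.from_pretrained(model_name)
# Mirrors the JSON passed to vllm/sglang above; 4x scaling over an assumed
# native 32768-token context gives roughly a 131072-token window.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```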
@@ -363,4 +365,4 @@ If you find this work is relevant with your research or applications, please fee
 journal={arXiv preprint arXiv:2505.17667},
 year={2025}
 }
-```
+```