kmouratidis and nielsr (HF Staff) committed
Commit 7846fe5 · verified · Parent(s): fb95e65

Change library name to transformers (#1)


- Change library name to transformers (562c9603f1b029689ce68450e29afebf239a1ce2)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +17 -15
README.md CHANGED
@@ -1,12 +1,12 @@
 ---
-license: apache-2.0
 base_model:
-- Tongyi-Zhiwen/QwenLong-L1-32B
-tags:
-- long-context
-- large-reasoning-model
+- Tongyi-Zhiwen/QwenLong-L1-32B
+library_name: transformers
+license: apache-2.0
 pipeline_tag: text-generation
-library_name: exllamav2
+tags:
+- long-context
+- large-reasoning-model
 ---
 
 ## Quantization
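The front-matter change above is what the commit title refers to: `library_name` now points the Hub at `transformers` instead of `exllamav2`, so the model page offers the standard transformers loading snippet. A minimal load sketch for the base model named in the metadata (not taken from the card; assumes enough GPU memory for a 32B checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model listed under base_model: in the YAML above; the quantized
# weights in this repo itself still target exllamav2.
model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate; shards across available GPUs
)
```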
@@ -162,8 +162,10 @@ try:
 except ValueError:
     index = 0
 
-thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
-content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("
+")
+content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("
+")
 
 print("thinking content:", thinking_content)
 print("content:", content)
@@ -191,17 +193,17 @@ YaRN is currently supported by several inference frameworks, e.g., `transformers
 - Passing command line arguments:
 
   For `vllm`, you can use
-  `shell
+  ```shell
   vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
-  `
+  ```
   For `sglang`, you can use
-  `shell
+  ```shell
   python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
-  `
+  ```
   For `llama-server` from `llama.cpp`, you can use
-  `shell
+  ```shell
   llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
-  `
+  ```
 
 > [!IMPORTANT]
 > If you encounter the following warning
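This hunk only repairs the broken single-backtick fences around the YaRN commands into proper fenced `shell` blocks. For `transformers`, which the hunk's context line also names, the same scaling can be applied at load time through the config instead of a CLI flag; a minimal sketch assuming the standard `rope_scaling` config override, with the values copied from the commands above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Tongyi-Zhiwen/QwenLong-L1-32B"  # base model from the card metadata

# 4x YaRN scaling over the 32768-token native window -> 131072-token context,
# mirroring the vllm/sglang/llama.cpp flags in the diff above.
config = AutoConfig.from_pretrained(
    model_id,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    max_position_embeddings=131072,
)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, device_map="auto")
```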
@@ -363,4 +365,4 @@ If you find this work is relevant with your research or applications, please fee
 journal={arXiv preprint arXiv:2505.17667},
 year={2025}
 }
-```
+```
 