bys0318 committed · verified · Commit 34058b8 · Parent(s): 5950b78

Update README.md

Files changed (1): README.md (+41 -1)
README.md CHANGED
@@ -60,7 +60,7 @@ LongWriter-Zero’s effectiveness is demonstrated on two fronts: **WritingBench*

  ---

- ### 🏆 Human-in-the-Loop Win-Rate
+ ### 🏆 Win-Rate Results

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63369da91ba5d5ece24118a4/_uWmcqnMFWGLN_iQb1bdx.png)

@@ -70,6 +70,46 @@ LongWriter-Zero’s effectiveness is demonstrated on two fronts: **WritingBench*


  <a name="quick_start"></a>
+ ## ⚡ Quick Start&nbsp;(HF generate)
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "THU-KEG/LongWriter-Zero-32B"
+
+ # Load the checkpoint in its native dtype and shard it across available devices
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Write a 500-word story."
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ # Render the conversation with LongWriter-Zero's chat template
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate until a stop string (e.g. the closing </answer> tag) or the token budget is hit
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=2048,
+     stop_strings=["<|user|>", "<|endoftext|>", "</answer>"],
+     tokenizer=tokenizer
+ )
+ # Keep only the newly generated tokens, dropping the echoed prompt
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+ print(response)
+ ```
+ *Note: We use a slightly different tokenizer and chat template compared to the original Qwen2.5-32B-Instruct model.*
+
  ## ⚡ Quick Start&nbsp;(SGlang)

  The snippet below shows how to format prompts with LongWriter-Zero’s `<think> … </think><answer> … </answer>` protocol and call the model through an SGlang-powered endpoint that supports streaming responses.
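A minimal client sketch of that pattern, assuming an SGlang server launched with `python -m sglang.launch_server --model-path THU-KEG/LongWriter-Zero-32B --port 30000` and queried through its OpenAI-compatible `/v1` endpoint; the URL, port, sampling settings, and the answer-extraction step are illustrative, not part of this commit:

```python
# Illustrative sketch, not from the commit: stream a completion from an
# SGlang OpenAI-compatible endpoint and extract the <answer> ... </answer> block.
from openai import OpenAI

# Assumed local server address; adjust to your deployment.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="THU-KEG/LongWriter-Zero-32B",
    messages=[{"role": "user", "content": "Write a 500-word story."}],
    max_tokens=2048,
    stream=True,
)

# Print tokens as they arrive and keep the full text for post-processing.
parts = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        parts.append(chunk.choices[0].delta.content)
        print(chunk.choices[0].delta.content, end="", flush=True)

# The model reasons inside <think> ... </think> and writes inside
# <answer> ... </answer>; keep only the final answer text.
full = "".join(parts)
answer = full.split("<answer>", 1)[-1].split("</answer>", 1)[0].strip()
```

Depending on the launch flags, the served model name may differ from the HF repo id; `client.models.list()` shows what the endpoint expects.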