electroglyph committed
Commit ac20022 · verified · 1 Parent(s): 7207001

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,625 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen3-0.6B-Base
+ tags:
+ - transformers
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ ---
+
+ # Qwen3-Embedding-0.6B-onnx-int4
+
+ This is an ONNX version of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B
+
+ This model has been dynamically quantized to int4/uint8, and further modified to output a 1024-dimensional uint8 tensor.
+
+ You probably don't want to use this model on CPU. I've tested it on a Ryzen CPU with VNNI, and it runs at the same speed as the base f32 model, but with roughly 2% lower retrieval accuracy. I'm posting it here in case it's useful for GPU users. Not sure if it actually is, but I already made it, so here it is.
+
+ This model is compatible with Qdrant FastEmbed. Please note these details:
+
+ - Execute the model without pooling and without normalization
+ - Pay attention to the example query format in the code below
+
+ # Quantization method
+
+ I did an int4 quantization pass with block size == 128 (block size 32 was extremely close in accuracy), excluding the same nodes as in my uint8 model.
+
+ Then I quantized the remaining non-excluded nodes to uint8 the same way as here: https://huggingface.co/electroglyph/Qwen3-Embedding-0.6B-onnx-uint8
+
+ <details>
+ <summary>Here are the nodes I excluded</summary>
+
+ ```python
+ ["/0/auto_model/ConstantOfShape",
+ "/0/auto_model/Constant_28",
+ "/0/auto_model/layers.25/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.26/input_layernorm/Pow",
+ "/0/auto_model/layers.25/input_layernorm/Pow",
+ "/0/auto_model/layers.24/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.24/input_layernorm/Pow",
+ "/0/auto_model/layers.23/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.23/input_layernorm/Pow",
+ "/0/auto_model/layers.22/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.22/input_layernorm/Pow",
+ "/0/auto_model/layers.3/input_layernorm/Pow",
+ "/0/auto_model/layers.4/input_layernorm/Pow",
+ "/0/auto_model/layers.3/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.21/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.5/input_layernorm/Pow",
+ "/0/auto_model/layers.4/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.5/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.6/input_layernorm/Pow",
+ "/0/auto_model/layers.6/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.7/input_layernorm/Pow",
+ "/0/auto_model/layers.8/input_layernorm/Pow",
+ "/0/auto_model/layers.7/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.26/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.9/input_layernorm/Pow",
+ "/0/auto_model/layers.8/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.21/input_layernorm/Pow",
+ "/0/auto_model/layers.20/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.9/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.10/input_layernorm/Pow",
+ "/0/auto_model/layers.20/input_layernorm/Pow",
+ "/0/auto_model/layers.11/input_layernorm/Pow",
+ "/0/auto_model/layers.10/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.12/input_layernorm/Pow",
+ "/0/auto_model/layers.11/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.12/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.13/input_layernorm/Pow",
+ "/0/auto_model/layers.19/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.13/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.14/input_layernorm/Pow",
+ "/0/auto_model/layers.19/input_layernorm/Pow",
+ "/0/auto_model/layers.18/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.14/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.15/input_layernorm/Pow",
+ "/0/auto_model/layers.16/input_layernorm/Pow",
+ "/0/auto_model/layers.15/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.18/input_layernorm/Pow",
+ "/0/auto_model/layers.17/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.17/input_layernorm/Pow",
+ "/0/auto_model/layers.16/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.27/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.27/input_layernorm/Pow",
+ "/0/auto_model/norm/Pow",
+ "/0/auto_model/layers.25/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.25/post_attention_layernorm/Add",
+ "/0/auto_model/layers.26/input_layernorm/Add",
+ "/0/auto_model/layers.26/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.25/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.25/input_layernorm/Add",
+ "/0/auto_model/layers.24/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.24/post_attention_layernorm/Add",
+ "/0/auto_model/layers.24/input_layernorm/Add",
+ "/0/auto_model/layers.24/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.23/post_attention_layernorm/Add",
+ "/0/auto_model/layers.23/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.23/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.23/input_layernorm/Add",
+ "/0/auto_model/layers.22/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.22/post_attention_layernorm/Add",
+ "/0/auto_model/layers.26/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.26/post_attention_layernorm/Add",
+ "/0/auto_model/layers.22/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.22/input_layernorm/Add",
+ "/0/auto_model/layers.3/input_layernorm/Add",
+ "/0/auto_model/layers.3/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.21/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.21/post_attention_layernorm/Add",
+ "/0/auto_model/layers.4/input_layernorm/Add",
+ "/0/auto_model/layers.4/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.3/post_attention_layernorm/Add",
+ "/0/auto_model/layers.3/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.5/input_layernorm/Add",
+ "/0/auto_model/layers.5/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.4/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.4/post_attention_layernorm/Add",
+ "/0/auto_model/layers.5/post_attention_layernorm/Add",
+ "/0/auto_model/layers.5/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.6/input_layernorm/Add",
+ "/0/auto_model/layers.6/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.6/post_attention_layernorm/Add",
+ "/0/auto_model/layers.6/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.7/input_layernorm/Add",
+ "/0/auto_model/layers.7/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.8/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.8/input_layernorm/Add",
+ "/0/auto_model/layers.7/post_attention_layernorm/Add",
+ "/0/auto_model/layers.7/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.9/input_layernorm/Add",
+ "/0/auto_model/layers.9/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.8/post_attention_layernorm/Add",
+ "/0/auto_model/layers.8/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.21/input_layernorm/Add",
+ "/0/auto_model/layers.21/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.20/post_attention_layernorm/Add",
+ "/0/auto_model/layers.20/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.9/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.9/post_attention_layernorm/Add",
+ "/0/auto_model/layers.10/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.10/input_layernorm/Add",
+ "/0/auto_model/layers.20/input_layernorm/Add",
+ "/0/auto_model/layers.20/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.11/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.11/input_layernorm/Add",
+ "/0/auto_model/layers.10/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.10/post_attention_layernorm/Add",
+ "/0/auto_model/layers.12/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.12/input_layernorm/Add",
+ "/0/auto_model/layers.11/post_attention_layernorm/Add",
+ "/0/auto_model/layers.11/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.12/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.12/post_attention_layernorm/Add",
+ "/0/auto_model/layers.13/input_layernorm/Add",
+ "/0/auto_model/layers.13/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.19/post_attention_layernorm/Add",
+ "/0/auto_model/layers.19/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.13/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.13/post_attention_layernorm/Add",
+ "/0/auto_model/layers.14/input_layernorm/Add",
+ "/0/auto_model/layers.14/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.19/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.19/input_layernorm/Add",
+ "/0/auto_model/layers.18/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.18/post_attention_layernorm/Add",
+ "/0/auto_model/layers.14/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.14/post_attention_layernorm/Add",
+ "/0/auto_model/layers.15/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.15/input_layernorm/Add",
+ "/0/auto_model/layers.16/input_layernorm/Add",
+ "/0/auto_model/layers.16/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.15/post_attention_layernorm/Add",
+ "/0/auto_model/layers.15/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.18/input_layernorm/Add",
+ "/0/auto_model/layers.18/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.17/post_attention_layernorm/Add",
+ "/0/auto_model/layers.17/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.17/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.17/input_layernorm/Add",
+ "/0/auto_model/layers.16/post_attention_layernorm/Add",
+ "/0/auto_model/layers.16/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.27/post_attention_layernorm/Add",
+ "/0/auto_model/layers.27/post_attention_layernorm/ReduceMean",
+ "/0/auto_model/layers.27/input_layernorm/Add",
+ "/0/auto_model/layers.27/input_layernorm/ReduceMean",
+ "/0/auto_model/layers.27/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.14/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.26/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.25/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.26/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.8/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.24/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.24/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.25/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.23/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.27/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.12/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.13/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.2/mlp/down_proj/MatMul",
+ "/0/auto_model/layers.3/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.3/Add",
+ "/0/auto_model/layers.3/Add_1",
+ "/0/auto_model/layers.4/input_layernorm/Cast",
+ "/0/auto_model/layers.3/input_layernorm/Cast",
+ "/0/auto_model/layers.2/Add_1",
+ "/0/auto_model/layers.4/Add",
+ "/0/auto_model/layers.4/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.5/input_layernorm/Cast",
+ "/0/auto_model/layers.4/Add_1",
+ "/0/auto_model/layers.5/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.5/Add",
+ "/0/auto_model/layers.5/Add_1",
+ "/0/auto_model/layers.6/input_layernorm/Cast",
+ "/0/auto_model/layers.7/Add_1",
+ "/0/auto_model/layers.8/input_layernorm/Cast",
+ "/0/auto_model/layers.7/Add",
+ "/0/auto_model/layers.7/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.6/Add",
+ "/0/auto_model/layers.6/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.6/Add_1",
+ "/0/auto_model/layers.7/input_layernorm/Cast",
+ "/0/auto_model/layers.8/Add",
+ "/0/auto_model/layers.8/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.9/input_layernorm/Cast",
+ "/0/auto_model/layers.8/Add_1",
+ "/0/auto_model/layers.9/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.9/Add",
+ "/0/auto_model/layers.9/Add_1",
+ "/0/auto_model/layers.10/input_layernorm/Cast",
+ "/0/auto_model/layers.11/input_layernorm/Cast",
+ "/0/auto_model/layers.10/Add_1",
+ "/0/auto_model/layers.10/Add",
+ "/0/auto_model/layers.10/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.11/Add",
+ "/0/auto_model/layers.11/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.11/Add_1",
+ "/0/auto_model/layers.12/input_layernorm/Cast",
+ "/0/auto_model/layers.12/Add",
+ "/0/auto_model/layers.12/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.12/Add_1",
+ "/0/auto_model/layers.13/input_layernorm/Cast",
+ "/0/auto_model/layers.13/Add",
+ "/0/auto_model/layers.13/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.14/input_layernorm/Cast",
+ "/0/auto_model/layers.13/Add_1",
+ "/0/auto_model/layers.14/Add_1",
+ "/0/auto_model/layers.15/input_layernorm/Cast",
+ "/0/auto_model/layers.14/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.14/Add",
+ "/0/auto_model/layers.15/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.15/Add_1",
+ "/0/auto_model/layers.16/input_layernorm/Cast",
+ "/0/auto_model/layers.15/Add",
+ "/0/auto_model/layers.17/input_layernorm/Cast",
+ "/0/auto_model/layers.16/Add_1",
+ "/0/auto_model/layers.16/Add",
+ "/0/auto_model/layers.16/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.19/input_layernorm/Cast",
+ "/0/auto_model/layers.18/Add_1",
+ "/0/auto_model/layers.18/input_layernorm/Cast",
+ "/0/auto_model/layers.17/Add_1",
+ "/0/auto_model/layers.17/Add",
+ "/0/auto_model/layers.17/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.18/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.18/Add",
+ "/0/auto_model/layers.19/Add",
+ "/0/auto_model/layers.19/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.22/Add_1",
+ "/0/auto_model/layers.23/input_layernorm/Cast",
+ "/0/auto_model/layers.20/Add_1",
+ "/0/auto_model/layers.21/input_layernorm/Cast",
+ "/0/auto_model/layers.21/Add_1",
+ "/0/auto_model/layers.22/input_layernorm/Cast",
+ "/0/auto_model/layers.19/Add_1",
+ "/0/auto_model/layers.20/input_layernorm/Cast",
+ "/0/auto_model/layers.24/input_layernorm/Cast",
+ "/0/auto_model/layers.23/Add_1",
+ "/0/auto_model/layers.22/Add",
+ "/0/auto_model/layers.22/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.21/Add",
+ "/0/auto_model/layers.21/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.20/Add",
+ "/0/auto_model/layers.20/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.23/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.23/Add",
+ "/0/auto_model/layers.25/input_layernorm/Cast",
+ "/0/auto_model/layers.24/Add_1",
+ "/0/auto_model/layers.24/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.24/Add",
+ "/0/auto_model/layers.25/Add",
+ "/0/auto_model/layers.25/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.25/Add_1",
+ "/0/auto_model/layers.26/input_layernorm/Cast",
+ "/0/auto_model/layers.26/Add",
+ "/0/auto_model/layers.26/post_attention_layernorm/Cast",
+ "/0/auto_model/layers.21/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.26/Add_1",
+ "/0/auto_model/layers.27/input_layernorm/Cast",
+ "/0/auto_model/layers.27/Add",
+ "/0/auto_model/layers.27/post_attention_layernorm/Cast",
+ "/0/auto_model/norm/Add",
+ "/0/auto_model/norm/ReduceMean",
+ "/0/auto_model/layers.23/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.21/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.22/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.10/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.19/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.2/mlp/Mul",
+ "/0/auto_model/layers.22/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.11/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.20/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.20/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.18/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.17/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.27/mlp/down_proj/MatMul",
+ "/0/auto_model/layers.19/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.27/Add_1",
+ "/0/auto_model/norm/Cast",
+ "/0/auto_model/layers.16/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.18/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.11/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.9/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.26/self_attn/q_norm/Add",
+ "/0/auto_model/layers.26/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.14/self_attn/k_norm/Add",
+ "/0/auto_model/layers.14/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.16/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.27/mlp/Mul",
+ "/0/auto_model/layers.27/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.27/self_attn/q_norm/Add",
+ "/0/auto_model/layers.9/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.17/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.26/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.26/self_attn/k_norm/Add",
+ "/0/auto_model/layers.25/self_attn/k_norm/Add",
+ "/0/auto_model/layers.25/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.13/self_attn/k_norm/Add",
+ "/0/auto_model/layers.13/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.10/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.25/input_layernorm/Mul_1",
+ "/0/auto_model/layers.27/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.27/self_attn/k_norm/Add",
+ "/0/auto_model/layers.26/input_layernorm/Mul_1",
+ "/0/auto_model/layers.15/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.12/self_attn/k_norm/Add",
+ "/0/auto_model/layers.12/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.25/self_attn/q_norm/Add",
+ "/0/auto_model/layers.25/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.24/input_layernorm/Mul_1",
+ "/0/auto_model/layers.12/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.24/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.24/self_attn/q_norm/Add",
+ "/0/auto_model/layers.24/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.24/self_attn/k_norm/Add",
+ "/0/auto_model/layers.22/mlp/Mul",
+ "/0/auto_model/layers.2/post_attention_layernorm/Pow",
+ "/0/auto_model/layers.23/mlp/Mul",
+ "/0/auto_model/layers.24/mlp/Mul",
+ "/0/auto_model/layers.23/input_layernorm/Mul_1",
+ "/0/auto_model/layers.14/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.14/self_attn/k_proj/MatMul",
+ "/0/auto_model/layers.14/self_attn/k_norm/Cast",
+ "/0/auto_model/layers.14/self_attn/Reshape_1",
+ "/0/auto_model/layers.21/mlp/Mul",
+ "/0/auto_model/layers.3/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.3/input_layernorm/Sqrt",
+ "/0/auto_model/layers.4/input_layernorm/Sqrt",
+ "/0/auto_model/layers.5/input_layernorm/Sqrt",
+ "/0/auto_model/layers.4/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.5/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.6/input_layernorm/Sqrt",
+ "/0/auto_model/layers.6/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.8/input_layernorm/Sqrt",
+ "/0/auto_model/layers.8/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.7/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.7/input_layernorm/Sqrt",
+ "/0/auto_model/layers.9/input_layernorm/Sqrt",
+ "/0/auto_model/layers.10/input_layernorm/Sqrt",
+ "/0/auto_model/layers.9/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.11/input_layernorm/Sqrt",
+ "/0/auto_model/layers.10/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.12/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.11/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.12/input_layernorm/Sqrt",
+ "/0/auto_model/layers.13/input_layernorm/Sqrt",
+ "/0/auto_model/layers.14/input_layernorm/Sqrt",
+ "/0/auto_model/layers.13/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.15/input_layernorm/Sqrt",
+ "/0/auto_model/layers.14/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.16/input_layernorm/Sqrt",
+ "/0/auto_model/layers.15/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.17/input_layernorm/Sqrt",
+ "/0/auto_model/layers.16/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.19/input_layernorm/Sqrt",
+ "/0/auto_model/layers.17/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.18/input_layernorm/Sqrt",
+ "/0/auto_model/layers.18/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.19/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.23/input_layernorm/Sqrt",
+ "/0/auto_model/layers.20/input_layernorm/Sqrt",
+ "/0/auto_model/layers.21/input_layernorm/Sqrt",
+ "/0/auto_model/layers.22/input_layernorm/Sqrt",
+ "/0/auto_model/layers.22/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.24/input_layernorm/Sqrt",
+ "/0/auto_model/layers.20/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.21/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.23/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.25/input_layernorm/Sqrt",
+ "/0/auto_model/layers.24/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.25/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.26/input_layernorm/Sqrt",
+ "/0/auto_model/layers.26/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.15/self_attn/k_norm/Pow",
+ "/0/auto_model/layers.27/input_layernorm/Sqrt",
+ "/0/auto_model/layers.27/post_attention_layernorm/Sqrt",
+ "/0/auto_model/layers.2/input_layernorm/Pow",
+ "/0/auto_model/layers.26/mlp/Mul",
+ "/0/auto_model/layers.23/self_attn/q_norm/Add",
+ "/0/auto_model/layers.23/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.13/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.21/self_attn/q_norm/Add",
+ "/0/auto_model/layers.21/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.6/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.27/self_attn/Reshape_7",
+ "/0/auto_model/layers.27/self_attn/MatMul_1",
+ "/0/auto_model/layers.27/self_attn/Transpose_4",
+ "/0/auto_model/layers.26/self_attn/Expand_1",
+ "/0/auto_model/layers.26/self_attn/Unsqueeze_19",
+ "/0/auto_model/layers.26/self_attn/v_proj/MatMul",
+ "/0/auto_model/layers.26/self_attn/Transpose_2",
+ "/0/auto_model/layers.26/self_attn/Reshape_6",
+ "/0/auto_model/layers.26/self_attn/Reshape_2",
+ "/0/auto_model/layers.11/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.11/self_attn/k_norm/Add",
+ "/0/auto_model/layers.22/input_layernorm/Mul_1",
+ "/0/auto_model/layers.25/mlp/Mul",
+ "/0/auto_model/layers.8/self_attn/k_norm/Cast",
+ "/0/auto_model/layers.8/self_attn/k_proj/MatMul",
+ "/0/auto_model/layers.8/self_attn/Reshape_1",
+ "/0/auto_model/layers.21/input_layernorm/Mul_1",
+ "/0/auto_model/layers.5/self_attn/q_norm/Pow",
+ "/0/auto_model/layers.22/self_attn/q_norm/ReduceMean",
+ "/0/auto_model/layers.22/self_attn/q_norm/Add",
+ "/0/auto_model/layers.22/mlp/down_proj/MatMul",
+ "/0/auto_model/layers.23/self_attn/k_norm/ReduceMean",
+ "/0/auto_model/layers.23/self_attn/k_norm/Add",
+ "/0/auto_model/layers.23/mlp/down_proj/MatMul",
+ "/0/auto_model/layers.26/mlp/down_proj/MatMul",
+ "/0/auto_model/layers.1/self_attn/Add_2",
+ "/0/auto_model/layers.2/self_attn/Add_2",
+ "/0/auto_model/layers.6/self_attn/Add_2",
+ "/0/auto_model/layers.11/self_attn/Add_2",
+ "/0/auto_model/layers.12/self_attn/Add_2",
+ "/0/auto_model/layers.16/self_attn/Add_2",
+ "/0/auto_model/layers.21/self_attn/Add_2",
+ "/0/auto_model/layers.24/self_attn/Add_2",
+ "/0/auto_model/layers.0/self_attn/Add_2",
+ "/0/auto_model/layers.8/self_attn/Add_2",
+ "/0/auto_model/layers.13/self_attn/Add_2",
+ "/0/auto_model/layers.26/self_attn/Add_2",
+ "/0/auto_model/layers.3/self_attn/Add_2",
+ "/0/auto_model/layers.15/self_attn/Add_2",
+ "/0/auto_model/layers.25/self_attn/Add_2",
+ "/0/auto_model/layers.4/self_attn/Add_2",
+ "/0/auto_model/layers.14/self_attn/Add_2",
+ "/0/auto_model/layers.22/self_attn/Add_2",
+ "/0/auto_model/layers.9/self_attn/Add_2",
+ "/0/auto_model/layers.23/self_attn/Add_2",
+ "/0/auto_model/layers.10/self_attn/Add_2",
+ "/0/auto_model/layers.5/self_attn/Add_2",
+ "/0/auto_model/layers.19/self_attn/Add_2",
+ "/0/auto_model/layers.7/self_attn/Add_2",
+ "/0/auto_model/layers.27/self_attn/Add_2",
+ "/0/auto_model/layers.18/self_attn/Add_2",
+ "/0/auto_model/layers.20/self_attn/Add_2",
+ "/0/auto_model/layers.17/self_attn/Add_2",
+ "/0/auto_model/Slice_1",
+ "/0/auto_model/layers.5/self_attn/Slice_4",
+ "/0/auto_model/layers.12/self_attn/Slice_4",
+ "/0/auto_model/layers.18/self_attn/Slice_4",
+ "/0/auto_model/layers.3/self_attn/Slice_4",
+ "/0/auto_model/layers.11/self_attn/Slice_4",
+ "/0/auto_model/layers.22/self_attn/Slice_4",
+ "/0/auto_model/Expand",
+ "/0/auto_model/layers.4/self_attn/Slice_4",
+ "/0/auto_model/Slice_2",
+ "/0/auto_model/layers.8/self_attn/Slice_4",
+ "/0/auto_model/layers.2/self_attn/Slice_4",
+ "/0/auto_model/layers.15/self_attn/Slice_4",
+ "/0/auto_model/layers.26/self_attn/Slice_4",
+ "/0/auto_model/layers.24/self_attn/Slice_4",
+ "/0/auto_model/Expand_1",
+ "/0/auto_model/layers.14/self_attn/Slice_4",
+ "/0/auto_model/layers.21/self_attn/Slice_4",
+ "/0/auto_model/layers.1/self_attn/Slice_4",
+ "/0/auto_model/Reshape_2",
+ "/0/auto_model/layers.19/self_attn/Slice_4",
+ "/0/auto_model/Slice",
+ "/0/auto_model/layers.6/self_attn/Slice_4",
+ "/0/auto_model/layers.0/self_attn/Slice_4",
+ "/0/auto_model/layers.25/self_attn/Slice_4",
+ "/0/auto_model/Unsqueeze_4",
+ "/0/auto_model/layers.10/self_attn/Slice_4",
+ "/0/auto_model/layers.23/self_attn/Slice_4",
+ "/0/auto_model/layers.17/self_attn/Slice_4",
+ "/0/auto_model/Where_1",
+ "/0/auto_model/layers.27/self_attn/Slice_4",
+ "/0/auto_model/layers.20/self_attn/Slice_4",
+ "/0/auto_model/Add",
+ "/0/auto_model/Mul",
+ "/0/auto_model/layers.7/self_attn/Slice_4",
+ "/0/auto_model/layers.13/self_attn/Slice_4",
+ "/0/auto_model/layers.9/self_attn/Slice_4",
+ "/0/auto_model/layers.16/self_attn/Slice_4",
+ "/0/auto_model/Unsqueeze_3",
+ "/0/auto_model/ScatterND"]
+ ```
+
+ </details>
+
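As a rough illustration of what a block-wise int4 pass does to a row of weights, here is a minimal pure-Python sketch. It is not the tooling used to produce this model: the symmetric [-8, 7] scheme, the function names, and the toy weights are all assumptions made for illustration. It only shows why block size matters: each block gets its own scale, so an outlier only hurts the precision of the block it sits in.

```python
import math
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range (assumed symmetric scheme)


def quantize_blockwise(weights: List[float], block_size: int) -> Tuple[List[int], List[float]]:
    """Quantize weights to int4 codes, storing one float scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block onto the int4 code 7.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in block:
            q = round(w / scale)
            codes.append(max(INT4_MIN, min(INT4_MAX, q)))
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int) -> List[float]:
    """Reconstruct approximate float weights from codes and per-block scales."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def max_error(weights: List[float], block_size: int) -> float:
    """Largest absolute round-trip error for a given block size."""
    codes, scales = quantize_blockwise(weights, block_size)
    restored = dequantize_blockwise(codes, scales, block_size)
    return max(abs(a - b) for a, b in zip(weights, restored))


# Hypothetical toy data: small weights plus a single outlier.
weights = [math.sin(i) * 0.05 for i in range(256)]
weights[10] = 1.0

for block_size in (128, 32):
    print(f"block size {block_size}: max round-trip error {max_error(weights, block_size):.5f}")
```

Smaller blocks store more scales (8 instead of 2 for these 256 toy values), which is the size cost paid for the tighter per-block range.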
+ # Benchmarks
+
+ ## Speed
+
+ Method: embed a big chunk of text, 10 runs
+
+ Seconds elapsed for dynamic_int4.onnx: 45.37 (this model)
+
+ Seconds elapsed for opt_f32.onnx: 46.07 (base f32 model ready for quantization)
+
+ Seconds elapsed for dynamic_uint8.onnx: 34.61 (probably the one you want to use on CPU)
+
+ Verdict: This model kinda sucks on CPU. Let me know how it is on GPU please.
+
+ ## Accuracy
+
+ I used beir-qdrant with the scifact dataset.
+
+ The retrieval results on this benchmark aren't the greatest.
+
+ I welcome any additional benchmarks from the community. Please feel free to share any further results.
+
+ If someone wants to sponsor me with an NVIDIA GPU, I can have a much faster turnaround time with my model experiments and explore some different quantization strategies.
+
+ onnx f32 model with f32 output (baseline):
+
+ ```
+ ndcg: {'NDCG@1': 0.57, 'NDCG@3': 0.65655, 'NDCG@5': 0.68177, 'NDCG@10': 0.69999, 'NDCG@100': 0.72749, 'NDCG@1000': 0.73301}
+ recall: {'Recall@1': 0.53828, 'Recall@3': 0.71517, 'Recall@5': 0.77883, 'Recall@10': 0.83056, 'Recall@100': 0.95333, 'Recall@1000': 0.99667}
+ precision: {'P@1': 0.57, 'P@3': 0.26111, 'P@5': 0.17467, 'P@10': 0.09467, 'P@100': 0.01083, 'P@1000': 0.00113}
+ ```
+
+ onnx dynamic int4/uint8 model with f32 output (this model's parent):
+
+ ```
+ ndcg: {'NDCG@1': 0.55333, 'NDCG@3': 0.6491, 'NDCG@5': 0.6674, 'NDCG@10': 0.69277, 'NDCG@100': 0.7183, 'NDCG@1000': 0.72434}
+ recall: {'Recall@1': 0.52161, 'Recall@3': 0.71739, 'Recall@5': 0.7645, 'Recall@10': 0.83656, 'Recall@100': 0.95, 'Recall@1000': 0.99667}
+ precision: {'P@1': 0.55333, 'P@3': 0.26222, 'P@5': 0.17067, 'P@10': 0.095, 'P@100': 0.0108, 'P@1000': 0.00113}
+ ```
+
+ onnx dynamic int4/uint8 model with uint8 output (this model):
+
+ ```
+ ndcg: {'NDCG@1': 0.55333, 'NDCG@3': 0.64613, 'NDCG@5': 0.67406, 'NDCG@10': 0.68834, 'NDCG@100': 0.71482, 'NDCG@1000': 0.72134}
+ recall: {'Recall@1': 0.52161, 'Recall@3': 0.70961, 'Recall@5': 0.77828, 'Recall@10': 0.81822, 'Recall@100': 0.94333, 'Recall@1000': 0.99333}
+ precision: {'P@1': 0.55333, 'P@3': 0.25889, 'P@5': 0.17533, 'P@10': 0.09333, 'P@100': 0.01073, 'P@1000': 0.00112}
+ ```
+
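To put the numbers above side by side, here is a small sketch that turns them into relative differences. The values are copied from the Speed and Accuracy sections; the helper name `relative_drop` and the choice of NDCG@10 as the headline metric are assumptions for illustration, not part of the benchmark itself.

```python
# NDCG@10 values copied from the three result blocks above.
ndcg_at_10 = {
    "f32 baseline": 0.69999,
    "int4/uint8, f32 output": 0.69277,
    "int4/uint8, uint8 output (this model)": 0.68834,
}

# Seconds elapsed for 10 runs, copied from the Speed section.
seconds = {
    "dynamic_int4.onnx": 45.37,
    "opt_f32.onnx": 46.07,
    "dynamic_uint8.onnx": 34.61,
}


def relative_drop(baseline: float, value: float) -> float:
    """Percent decrease of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100.0


base = ndcg_at_10["f32 baseline"]
for name, score in ndcg_at_10.items():
    print(f"{name}: NDCG@10 {score:.5f} ({relative_drop(base, score):.2f}% below baseline)")

f32_time = seconds["opt_f32.onnx"]
for name, secs in seconds.items():
    print(f"{name}: {secs:.2f}s ({f32_time / secs:.2f}x the f32 throughput)")
```

On NDCG@10 this model lands a bit under 2% below the f32 baseline, which is in line with the rough figure quoted at the top of the README.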
572
+ # Example inference/benchmark code and how to use the model with Fastembed
573
+
574
+ After installing beir-qdrant make sure to upgrade fastembed.
575
+
+ ```python
+ # pip install qdrant_client beir-qdrant
+ # pip install -U fastembed
+ from fastembed import TextEmbedding
+ from fastembed.common.model_description import PoolingType, ModelSource
+ from beir import util
+ from beir.datasets.data_loader import GenericDataLoader
+ from beir.retrieval.evaluation import EvaluateRetrieval
+ from qdrant_client import QdrantClient
+ from qdrant_client.models import Datatype
+ from beir_qdrant.retrieval.models.fastembed import DenseFastEmbedModelAdapter
+ from beir_qdrant.retrieval.search.dense import DenseQdrantSearch
+
+ # Register this repo's ONNX file as a custom FastEmbed model.
+ TextEmbedding.add_custom_model(
+     model="electroglyph/Qwen3-Embedding-0.6B-onnx-uint8",
+     pooling=PoolingType.DISABLED,
+     normalization=False,
+     sources=ModelSource(hf="electroglyph/Qwen3-Embedding-0.6B-onnx-uint8"),
+     dim=1024,
+     model_file="dynamic_int4.onnx",
+ )
+
+ # Download the BEIR SciFact dataset and load its test split.
+ dataset = "scifact"
+ url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
+ data_path = util.download_and_unzip(url, "datasets")
+ corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
+
+ # IMPORTANT: USE THIS (OR A SIMILAR) QUERY FORMAT WITH THIS MODEL:
+ for k in queries.keys():
+     queries[k] = (
+         f"Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: {queries[k]}"
+     )
+
+ # Assumes a Qdrant instance is reachable at localhost:6333.
+ qdrant_client = QdrantClient("http://localhost:6333")
+
+ # model_name must match the name registered with add_custom_model above.
+ model = DenseQdrantSearch(
+     qdrant_client,
+     model=DenseFastEmbedModelAdapter(model_name="electroglyph/Qwen3-Embedding-0.6B-onnx-uint8"),
+     collection_name="scifact-qwen3-uint8",
+     initialize=True,
+     datatype=Datatype.UINT8,
+ )
+
+ retriever = EvaluateRetrieval(model)
+ results = retriever.retrieve(corpus, queries)
+
+ ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
+ print(f"ndcg: {ndcg}\nrecall: {recall}\nprecision: {precision}")
+
+ ```
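The query format from the example above can be wrapped in a small helper so it is applied consistently. This is only a sketch: `format_query` and `DEFAULT_TASK` are illustrative names, not part of FastEmbed or this repo, and the task string is the one used in the example. As in the example, only queries get the prefix and corpus documents are left unchanged.

```python
# Task description copied from the benchmark example above.
DEFAULT_TASK = "Given a web search query, retrieve relevant passages that answer the query"


def format_query(query: str, task: str = DEFAULT_TASK) -> str:
    """Prefix a query with the instruction line expected by the model."""
    return f"Instruct: {task}\nQuery: {query}"


# Hypothetical query, used only to show the resulting string.
queries = {"q1": "what causes aurora borealis?"}
formatted = {qid: format_query(text) for qid, text in queries.items()}
print(formatted["q1"])
```

A different `task` string can be passed for other retrieval settings, which is presumably what "or a similar" format in the comment above refers to.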
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
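As a quick sanity check on the mapping above, the sketch below confirms that the added-token IDs form one contiguous block. The IDs are copied from `added_tokens.json`; `is_contiguous` is a hypothetical helper written for this check.

```python
# Token-to-ID mapping copied from added_tokens.json above.
ADDED_TOKENS = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651, "<|quad_start|>": 151650,
    "<|repo_name|>": 151663, "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}


def is_contiguous(ids) -> bool:
    """Return True if the IDs cover every integer between their min and max."""
    ordered = sorted(ids)
    return ordered == list(range(ordered[0], ordered[-1] + 1))


ids = list(ADDED_TOKENS.values())
print(len(ids), min(ids), max(ids), is_contiguous(ids))
```

The highest ID is 151668, so a vocabulary that ends with these tokens has 151669 entries, which matches the `vocab_size` in `config.json`.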
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_attn_implementation_autoset": true,
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "export_model_type": "transformer",
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "model_type": "qwen3",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151669
+ }
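The attention-related fields in this config are easier to read once the derived widths are written out. The sketch below does that arithmetic, assuming the usual grouped-query attention reading of these fields. The values are copied from `config.json`; `attention_shapes` is an illustrative helper, not a Transformers API.

```python
# Attention-related values copied from config.json above.
config = {
    "hidden_size": 1024,
    "head_dim": 128,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,
}


def attention_shapes(cfg: dict) -> dict:
    """Derive projection widths and the query-to-KV head ratio from the config."""
    return {
        # Total width of all query heads.
        "q_width": cfg["num_attention_heads"] * cfg["head_dim"],
        # Total width of all key (or value) heads.
        "kv_width": cfg["num_key_value_heads"] * cfg["head_dim"],
        # Number of query heads that share one key/value head.
        "queries_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
    }


print(attention_shapes(config))
```

Under that reading, the query width (2048) is larger than `hidden_size` (1024) because `head_dim` is set explicitly instead of being `hidden_size / num_attention_heads`. The `hidden_size` of 1024 matches the `dim=1024` passed when registering the model with FastEmbed.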
dynamic_int4.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f353ee6a1ca54dbddbc19d1992e8cedfeb012e8fd68f4d3e9db061ffb7aa70e
+ size 457148700
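The three lines above are a Git LFS pointer, not the model itself. A downloaded `dynamic_int4.onnx` can be checked against it with a few lines of standard-library Python. This is a minimal sketch: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever the download produced.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7f353ee6a1ca54dbddbc19d1992e8cedfeb012e8fd68f4d3e9db061ffb7aa70e
size 457148700
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

After downloading, `matches_pointer("dynamic_int4.onnx", pointer)` should return `True` for an intact file.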
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:def76fb086971c7867b829c23a26261e38d9d74e02139253b38aeb9df8b4b50a
+ size 11423705
tokenizer_config.json ADDED
@@ -0,0 +1,240 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151665": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151666": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151667": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151668": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = 
message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff