rs-test ekurtic committed
Commit 9f3936a · verified · 0 parent(s)

Duplicate from RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic


Co-authored-by: Eldar Kurtic <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,36 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,450 @@
1
+ ---
2
+ library_name: vllm
3
+ language:
4
+ - ar
5
+ - de
6
+ - en
7
+ - es
8
+ - fr
9
+ - hi
10
+ - id
11
+ - it
12
+ - pt
13
+ - th
14
+ - tl
15
+ - vi
16
+ base_model:
17
+ - meta-llama/Llama-4-Scout-17B-16E-Instruct
18
+ pipeline_tag: image-text-to-text
19
+ tags:
20
+ - facebook
21
+ - meta
22
+ - pytorch
23
+ - llama
24
+ - llama4
25
+ - neuralmagic
26
+ - redhat
27
+ - llmcompressor
28
+ - quantized
29
+ - FP8
30
+ license: other
31
+ license_name: llama4
32
+ ---
33
+
34
+ <h1 style="display: flex; align-items: center; gap: 10px; margin: 0;">
35
+ Llama-4-Scout-17B-16E-Instruct-FP8-dynamic
36
+ <img src="https://www.redhat.com/rhdc/managed-files/Catalog-Validated_model_0.png" alt="Model Icon" width="40" style="margin: 0; padding: 0;" />
37
+ </h1>
38
+
39
+ <a href="https://www.redhat.com/en/products/ai/validated-models" target="_blank" style="margin: 0; padding: 0;">
40
+ <img src="https://www.redhat.com/rhdc/managed-files/Validated_badge-Dark.png" alt="Validated Badge" width="250" style="margin: 0; padding: 0;" />
41
+ </a>
42
+
43
+ ## Model Overview
44
+ - **Model Architecture:** Llama4ForConditionalGeneration
45
+ - **Input:** Text / Image
46
+ - **Output:** Text
47
+ - **Model Optimizations:**
48
+ - **Activation quantization:** FP8
49
+ - **Weight quantization:** FP8
50
+ - **Release Date:** 04/15/2025
51
+ - **Version:** 1.0
52
+ - **Model Developers:** Red Hat (Neural Magic)
53
+
54
+
55
+ ### Model Optimizations
56
+
57
+ This model was obtained by quantizing the weights and activations of [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) to the FP8 data type.
+ This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements by approximately 50% and increasing matrix-multiply compute throughput by approximately 2x.
+ Weight quantization also reduces disk size requirements by approximately 50%. The [llm-compressor](https://github.com/vllm-project/llm-compressor) library was used for quantization.
60
+
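As a rough back-of-the-envelope check of these savings (a sketch only; the ~109B total-parameter count for the 16-expert Scout model is an assumption, not a measured value):

```python
# Illustrative weight-memory estimate only; the parameter count is approximate.
total_params = 109e9                 # assumed total parameters for Llama-4-Scout-17B-16E
bf16_gb = total_params * 2 / 1e9     # BF16: 2 bytes per parameter
fp8_gb = total_params * 1 / 1e9      # FP8:  1 byte per parameter
print(f"BF16 weights ~{bf16_gb:.0f} GB, FP8 weights ~{fp8_gb:.0f} GB")  # ~218 GB vs ~109 GB
```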
61
+ ## Deployment
62
+
63
+ This model can be deployed efficiently on vLLM, Red Hat Enterprise Linux AI, and OpenShift AI, as shown in the examples below.
64
+
65
+ Deploy on <strong>vLLM</strong>
66
+
67
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ model_id = "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic"
+ number_gpus = 4
+
+ sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Format the request with the model's chat template before generation
+ messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
+
+ outputs = llm.generate(prompt, sampling_params)
+
+ generated_text = outputs[0].outputs[0].text
+ print(generated_text)
+ ```
87
+
88
+ vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
89
+
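For example, after starting a server with `vllm serve RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic --tensor-parallel-size 4`, the endpoint can be queried with the OpenAI client. This is a minimal sketch rather than an official recipe; the port, API key, and sampling values are illustrative defaults:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```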
90
+
91
+ <details>
92
+ <summary>Deploy on <strong>Red Hat AI Inference Server</strong></summary>
93
+
94
+ ```bash
95
+ podman run --rm -it --device nvidia.com/gpu=all -p 8000:8000 \
96
+ --ipc=host \
97
+ --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
98
+ --env "HF_HUB_OFFLINE=0" -v ~/.cache/vllm:/home/vllm/.cache \
99
+ --name=vllm \
100
+ registry.access.redhat.com/rhaiis/rh-vllm-cuda \
101
+ vllm serve \
102
+ --tensor-parallel-size 8 \
103
+ --max-model-len 32768 \
104
+ --enforce-eager --model RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic
105
+ ```
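Once the container is up, the endpoint can be smoke-tested with a single request. This is a sketch that assumes the `-p 8000:8000` mapping above and vLLM's default behavior of serving the model under its repo id:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",
        "messages": [{"role": "user", "content": "What does FP8-dynamic quantization mean?"}],
        "max_tokens": 128
      }'
```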
106
+ </details>
107
+
108
+ <details>
109
+ <summary>Deploy on <strong>Red Hat Enterprise Linux AI</strong></summary>
110
+
111
+ ```bash
112
+ # Download model from Red Hat Registry via docker
113
+ # Note: This downloads the model to ~/.cache/instructlab/models unless --model-dir is specified.
114
+ ilab model download --repository docker://registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
115
+ ```
116
+
117
+ ```bash
118
+ # Serve model via ilab
119
+ ilab model serve --model-path ~/.cache/instructlab/models/llama-4-scout-17b-16e-instruct-fp8-dynamic
120
+
121
+ # Chat with model
122
+ ilab model chat --model ~/.cache/instructlab/models/llama-4-scout-17b-16e-instruct-fp8-dynamic
123
+ ```
124
+ See [Red Hat Enterprise Linux AI documentation](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_ai/1.4) for more details.
125
+ </details>
126
+
127
+ <details>
128
+ <summary>Deploy on <strong>Red Hat OpenShift AI</strong></summary>
129
+
130
+ ```yaml
131
+ # Setting up vllm server with ServingRuntime
132
+ # Save as: vllm-servingruntime.yaml
133
+ apiVersion: serving.kserve.io/v1alpha1
134
+ kind: ServingRuntime
135
+ metadata:
136
+ name: vllm-cuda-runtime # OPTIONAL CHANGE: set a unique name
137
+ annotations:
138
+ openshift.io/display-name: vLLM NVIDIA GPU ServingRuntime for KServe
139
+ opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
140
+ labels:
141
+ opendatahub.io/dashboard: 'true'
142
+ spec:
143
+ annotations:
144
+ prometheus.io/port: '8080'
145
+ prometheus.io/path: '/metrics'
146
+ multiModel: false
147
+ supportedModelFormats:
148
+ - autoSelect: true
149
+ name: vLLM
150
+ containers:
151
+ - name: kserve-container
152
+ image: quay.io/modh/vllm:rhoai-2.20-cuda # CHANGE if needed. If AMD: quay.io/modh/vllm:rhoai-2.20-rocm
153
+ command:
154
+ - python
155
+ - -m
156
+ - vllm.entrypoints.openai.api_server
157
+ args:
158
+ - "--port=8080"
159
+ - "--model=/mnt/models"
160
+ - "--served-model-name={{.Name}}"
161
+ env:
162
+ - name: HF_HOME
163
+ value: /tmp/hf_home
164
+ ports:
165
+ - containerPort: 8080
166
+ protocol: TCP
167
+ ```
168
+
169
+ ```yaml
170
+ # Attach model to vllm server. This is an NVIDIA template
171
+ # Save as: inferenceservice.yaml
172
+ apiVersion: serving.kserve.io/v1beta1
173
+ kind: InferenceService
174
+ metadata:
175
+ annotations:
176
+ openshift.io/display-name: Llama-4-Scout-17B-16E-Instruct-FP8-dynamic # OPTIONAL CHANGE
177
+ serving.kserve.io/deploymentMode: RawDeployment
178
+ name: Llama-4-Scout-17B-16E-Instruct-FP8-dynamic # specify model name. This value will be used to invoke the model in the payload
179
+ labels:
180
+ opendatahub.io/dashboard: 'true'
181
+ spec:
182
+ predictor:
183
+ maxReplicas: 1
184
+ minReplicas: 1
185
+ model:
186
+ modelFormat:
187
+ name: vLLM
188
+ name: ''
189
+ resources:
190
+ limits:
191
+ cpu: '2' # this is model specific
192
+ memory: 8Gi # this is model specific
193
+ nvidia.com/gpu: '1' # this is accelerator specific
194
+ requests: # same comment for this block
195
+ cpu: '1'
196
+ memory: 4Gi
197
+ nvidia.com/gpu: '1'
198
+ runtime: vllm-cuda-runtime # must match the ServingRuntime name above
199
+ storageUri: oci://registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5
200
+ tolerations:
201
+ - effect: NoSchedule
202
+ key: nvidia.com/gpu
203
+ operator: Exists
204
+ ```
205
+
206
+ ```bash
207
+ # First, make sure you are in the project where you want to deploy the model
208
+ # oc project <project-name>
209
+
210
+ # apply both resources to run model
211
+
212
+ # Apply the ServingRuntime
213
+ oc apply -f vllm-servingruntime.yaml
214
+
215
+ # Apply the InferenceService
216
+ oc apply -f inferenceservice.yaml
217
+ ```
218
+
219
+ ```bash
220
+ # Replace <inference-service-name> and <cluster-ingress-domain> below:
221
+ # - Run `oc get inferenceservice` to find your URL if unsure.
222
+
223
+ # Call the server using curl:
224
+ curl https://<inference-service-name>-predictor-default.<cluster-ingress-domain>/v1/chat/completions \
225
+ -H "Content-Type: application/json" \
226
+ -d '{
227
+ "model": "Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",
228
+ "stream": true,
229
+ "stream_options": {
230
+ "include_usage": true
231
+ },
232
+ "max_tokens": 1,
233
+ "messages": [
234
+ {
235
+ "role": "user",
236
+ "content": "How can a bee fly when its wings are so small?"
237
+ }
238
+ ]
239
+ }'
240
+
241
+ ```
242
+
243
+ See [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai/2025) for more details.
244
+ </details>
245
+
246
+ ## Creation
247
+
248
+ <details>
249
+ <summary>Creation details</summary>
250
+ This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.
251
+
252
+
253
+ ```python
254
+ #!/usr/bin/env python3
255
+ """
256
+ This script loads an LLM model and applies FP8 quantization to
257
+ weights and activations. Activations are dynamically quantized, i.e. during
258
+ actual runtime.
259
+ """
260
+
261
+ import argparse
262
+ import torch
263
+ from transformers import AutoTokenizer, AutoModelForCausalLM, Llama4ForConditionalGeneration
264
+ from llmcompressor.modifiers.quantization import QuantizationModifier
265
+ from llmcompressor import oneshot
266
+ from compressed_tensors.quantization import (
267
+ QuantizationScheme,
268
+ QuantizationArgs,
269
+ QuantizationType,
270
+ QuantizationStrategy,
271
+ )
272
+
273
+
274
+ def parse_arguments():
275
+ """Parse command line arguments."""
276
+ parser = argparse.ArgumentParser(description="Quantize a causal language model")
277
+ parser.add_argument(
278
+ "--model_path",
279
+ type=str,
280
+ required=True,
281
+ help="Path to the pre-trained model",
282
+ )
283
+ parser.add_argument(
284
+ "--quant_path",
285
+ type=str,
286
+ required=True,
287
+ help="Output path for the quantized model",
288
+ )
289
+ return parser.parse_args()
290
+
291
+
292
+ def main():
293
+ """Main function to load and quantize the model."""
294
+ args = parse_arguments()
295
+
296
+ print(f"Loading model from {args.model_path}...")
297
+ model = Llama4ForConditionalGeneration.from_pretrained(
298
+ args.model_path,
299
+ device_map="auto",
300
+ torch_dtype="auto",
301
+ trust_remote_code=True,
302
+ )
303
+
304
+ quant_scheme = QuantizationScheme(
305
+ targets=["Linear"],
306
+ weights=QuantizationArgs(
307
+ num_bits=8,
308
+ type=QuantizationType.FLOAT,
309
+ strategy=QuantizationStrategy.CHANNEL,
310
+ symmetric=True,
311
+ observer="mse",
312
+ ),
313
+ input_activations=QuantizationArgs(
314
+ num_bits=8,
315
+ type=QuantizationType.FLOAT,
316
+ strategy=QuantizationStrategy.TOKEN,
317
+ symmetric=True,
318
+ dynamic=True,
319
+ ),
320
+ output_activations=None,
321
+ )
322
+
323
+ recipe = QuantizationModifier(
324
+ targets="Linear",
325
+ config_groups={"group_0": quant_scheme},
326
+ ignore=[
327
+ 're:.*lm_head',
328
+ 're:.*self_attn',
329
+ 're:.*router',
330
+ 're:.*vision_model',
331
+ 're:.*multi_modal_projector',
332
+ ]
333
+ )
334
+
335
+ print("Applying quantization...")
336
+ oneshot(
337
+ model=model,
338
+ recipe=recipe,
339
+ trust_remote_code_model=True,
340
+ )
341
+
342
+ model.save_pretrained(args.quant_path, save_compressed=True, skip_compression_stats=True, disable_sparse_compression=True)
343
+ print(f"Quantized model saved to {args.quant_path}")
344
+
345
+
346
+ if __name__ == "__main__":
347
+ main()
348
+ ```
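After the script finishes, a quick sanity check is to confirm that the compressed-tensors quantization metadata was written to the output directory. This is a minimal sketch; replace the path with whatever was passed as `--quant_path`:

```python
import json
import os

quant_path = "path/to/quantized/model"  # placeholder for the --quant_path used above
with open(os.path.join(quant_path, "config.json")) as f:
    cfg = json.load(f)["quantization_config"]

print(cfg["quant_method"])                                     # expected: compressed-tensors
print(cfg["config_groups"]["group_0"]["weights"]["num_bits"])  # expected: 8
```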
349
+ </details>
350
+
351
+
352
+
353
+ ## Evaluation
354
+
355
+ The model was evaluated on the OpenLLM leaderboard tasks (v1 and v2), the long-context RULER benchmark, and the multimodal MMMU and ChartQA benchmarks.
+ All evaluations were obtained with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
357
+
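The commands below assume lm-evaluation-harness is installed with vLLM support; one possible (unpinned, illustrative) setup:

```bash
pip install "lm_eval[vllm]"
```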
358
+ <details>
359
+ <summary>Evaluation details</summary>
360
+
361
+ **OpenLLM v1**
362
+ ```
363
+ lm_eval \
364
+ --model vllm \
365
+ --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8,gpu_memory_utilization=0.7,enable_chunked_prefill=True,trust_remote_code=True \
366
+ --tasks openllm \
367
+ --batch_size auto
368
+ ```
369
+
370
+ **OpenLLM v2**
371
+ ```
372
+ lm_eval \
373
+ --model vllm \
374
+ --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",dtype=auto,add_bos_token=False,max_model_len=16384,tensor_parallel_size=8,gpu_memory_utilization=0.5,enable_chunked_prefill=True,trust_remote_code=True \
375
+ --tasks leaderboard \
376
+ --apply_chat_template \
377
+ --fewshot_as_multiturn \
378
+ --batch_size auto
379
+ ```
380
+
381
+ **Long Context RULER**
382
+ ```
383
+ lm_eval \
384
+ --model vllm \
385
+ --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",dtype=auto,add_bos_token=False,max_model_len=524288,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True \
386
+ --tasks ruler \
387
+ --metadata='{"max_seq_lengths":[131072]}' \
388
+ --batch_size auto
389
+ ```
390
+
391
+ **Multimodal MMMU**
392
+ ```
393
+ lm_eval \
394
+ --model vllm-vlm \
395
+ --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",dtype=auto,add_bos_token=False,max_model_len=1000000,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True,max_images=10 \
396
+ --tasks mmmu_val \
397
+ --apply_chat_template \
398
+ --batch_size auto
399
+ ```
400
+
401
+ **Multimodal ChartQA**
402
+ ```
403
+ export VLLM_MM_INPUT_CACHE_GIB=8
404
+ lm_eval \
405
+ --model vllm-vlm \
406
+ --model_args pretrained="RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic",dtype=auto,add_bos_token=False,max_model_len=1000000,tensor_parallel_size=8,gpu_memory_utilization=0.9,enable_chunked_prefill=True,trust_remote_code=True,max_images=10 \
407
+ --tasks chartqa \
408
+ --apply_chat_template \
409
+ --batch_size auto
410
+ ```
411
+
412
+ </details>
413
+
414
+ ### Accuracy
415
+
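In the table below, recovery is the quantized model's score expressed as a percentage of the unquantized baseline's score:

$$\text{Recovery} = 100 \times \frac{\text{score}_{\text{FP8-dynamic}}}{\text{score}_{\text{BF16 baseline}}}$$

For example, for ARC-Challenge: $100 \times 69.62 / 69.37 \approx 100.36$.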
416
+ | Benchmark | Recovery (%) | meta-llama/Llama-4-Scout-17B-16E-Instruct | RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic<br>(this model) |
417
+ | ---------------------------------------------- | :-----------: | :---------------------------------------: | :-----------------------------------------------------------------: |
418
+ | ARC-Challenge<br>25-shot | 100.36 | 69.37 | 69.62 |
419
+ | GSM8k<br>5-shot | 99.24 | 90.45 | 89.76 |
420
+ | HellaSwag<br>10-shot | 99.94 | 85.23 | 85.18 |
421
+ | MMLU<br>5-shot | 99.94 | 80.54 | 80.49 |
422
+ | TruthfulQA<br>0-shot | 99.17 | 61.41 | 60.90 |
423
+ | WinoGrande<br>5-shot | 98.88 | 77.90 | 77.03 |
424
+ | **OpenLLM v1<br>Average Score** | **99.59** | **77.48** | **77.16** |
425
+ | IFEval<br>0-shot<br>avg of inst and prompt acc | 100.91 | 86.90 | 87.69 |
426
+ | Big Bench Hard<br>3-shot | 99.82 | 65.13 | 65.01 |
427
+ | Math Lvl 5<br>4-shot | 98.82 | 57.78 | 57.10 |
428
+ | GPQA<br>0-shot | 100.53 | 31.88 | 32.05 |
429
+ | MuSR<br>0-shot | 102.18 | 42.20 | 43.12 |
430
+ | MMLU-Pro<br>5-shot | 99.82 | 55.70 | 55.60 |
431
+ | **OpenLLM v2<br>Average Score** | **100.28** | **56.60** | **56.76** |
432
+ | RULER<br>seqlen = 131072<br>niah_multikey_1 | 101.36 | 88.20 | 89.40 |
433
+ | RULER<br>seqlen = 131072<br>niah_multikey_2 | 100.72 | 83.60 | 84.20 |
434
+ | RULER<br>seqlen = 131072<br>niah_multikey_3 | 96.19 | 78.80 | 75.80 |
435
+ | RULER<br>seqlen = 131072<br>niah_multiquery | 100.79 | 95.40 | 96.15 |
436
+ | RULER<br>seqlen = 131072<br>niah_multivalue | 97.22 | 73.75 | 71.70 |
437
+ | RULER<br>seqlen = 131072<br>niah_single_1 | 100.00 | 100.00 | 100.00 |
438
+ | RULER<br>seqlen = 131072<br>niah_single_2 | 100.00 | 99.80 | 99.80 |
439
+ | RULER<br>seqlen = 131072<br>niah_single_3 | 100.00 | 99.80 | 99.80 |
440
+ | RULER<br>seqlen = 131072<br>ruler_cwe | 96.19 | 39.42 | 37.92 |
441
+ | RULER<br>seqlen = 131072<br>ruler_fwe | 98.86 | 92.93 | 91.87 |
442
+ | RULER<br>seqlen = 131072<br>ruler_qa_hotpot | 100.00 | 48.20 | 48.20 |
443
+ | RULER<br>seqlen = 131072<br>ruler_qa_squad | 98.81 | 53.57 | 52.93 |
444
+ | RULER<br>seqlen = 131072<br>ruler_qa_vt | 100.35 | 92.28 | 92.60 |
445
+ | **RULER<br>seqlen = 131072<br>Average Score** | **99.49** | **80.44** | **80.03** |
446
+ | MMMU<br>0-shot | 97.92 | 53.44 | 52.33 |
447
+ | ChartQA<br>0-shot<br>exact_match | 100.12 | 65.88 | 65.96 |
448
+ | ChartQA<br>0-shot<br>relaxed_accuracy | 99.69 | 88.92 | 88.64 |
449
+ | **Multimodal Average Score** | **99.38** | **69.41** | **68.98** |
450
+
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{{- bos_token }}\n{%- if custom_tools is defined %}\n {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n {%- if strftime_now is defined %}\n {%- set date_string = strftime_now(\"%d %b %Y\") %}\n {%- else %}\n {%- set date_string = \"26 Jul 2024\" %}\n {%- endif %}\n{%- endif %}\n{%- if not tools is defined %}\n {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %} \n {%- if messages[0]['content'] is string %}\n {%- set system_message = messages[0]['content']|trim %}\n {%- else %}\n {#- FIXME: The processor requires an array, always. #}\n {%- set system_message = messages[0]['content'][0]['text']|trim %}\n {%- endif %}\n {%- set messages = messages[1:] %}\n {%- set user_supplied_system_message = true %}\n{%- else %}\n {%- set system_message = \"\" %}\n {%- set user_supplied_system_message = false %}\n{%- endif %}\n\n{#- System message if the user supplied one #}\n{%- if user_supplied_system_message %}\n {{- \"<|header_start|>system<|header_end|>\\n\\n\" }}\n {%- if tools is not none %}\n {{- \"Environment: ipython\\n\" }}\n {%- endif %}\n {%- if tools is not none and not tools_in_user_message %}\n {{- \"You have access to the following functions. To call a function, please respond with JSON for a function call.\" }}\n {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' }}\n {{- \"Do not use variables.\\n\\n\" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- \"\\n\\n\" }}\n {%- endfor %}\n {%- endif %}\n {{- system_message }}\n {{- \"<|eot|>\" }}\n{%- endif %}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n {#- Extract the first user message so we can plug it in here #}\n {%- if messages | length != 0 %}\n {%- set first_user_message = messages[0]['content']|trim %}\n {%- set messages = messages[1:] %}\n {%- else %}\n {{- raise_exception(\"Cannot put tools in the first user message when there's no first user message!\") }}\n{%- endif %}\n {{- '<|header_start|>user<|header_end|>\\n\\n' -}}\n {{- \"Given the following functions, please respond with a JSON for a function call \" }}\n {{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' 
}}\n {{- \"Do not use variables.\\n\\n\" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- \"\\n\\n\" }}\n {%- endfor %}\n {{- first_user_message + \"<|eot|>\"}}\n{%- endif %}\n\n{%- for message in messages %}\n {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}\n {{- '<|header_start|>' + message['role'] + '<|header_end|>\\n\\n' }}\n {%- if message['content'] is string %}\n {{- message['content'] }}\n {%- else %}\n {%- for content in message['content'] %}\n {%- if content['type'] == 'image' %}\n {{- '<|image|>' }}\n {%- elif content['type'] == 'text' %}\n {{- content['text'] }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- \"<|eot|>\" }}\n {%- elif 'tool_calls' in message and message.tool_calls|length > 0 %}\n {{- '<|header_start|>assistant<|header_end|>\\n\\n' -}}\n {{- '<|python_start|>' }}\n {%- if message['content'] is string %}\n {{- message['content'] }}\n {%- else %}\n {%- for content in message['content'] %}\n {%- if content['type'] == 'image' %}\n {{- '<|image|>' }}\n {%- elif content['type'] == 'text' %}\n {{- content['text'] }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '<|python_end|>' }}\n {%- for tool_call in message.tool_calls %}\n {{- '{\"name\": \"' + tool_call.function.name + '\", ' }}\n {{- '\"parameters\": ' }}\n {{- tool_call.function.arguments | tojson }}\n {{- \"}\" }}\n {%- endfor %}\n {{- \"<|eot|>\" }}\n {%- elif message.role == \"tool\" or message.role == \"ipython\" %}\n {{- \"<|header_start|>ipython<|header_end|>\\n\\n\" }}\n {%- if message.content is mapping or message.content is iterable %}\n {{- message.content | tojson }}\n {%- else %}\n {{- message.content }}\n {%- endif %}\n {{- \"<|eot|>\" }}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|header_start|>assistant<|header_end|>\\n\\n' }}\n{%- endif %}\n"
3
+ }
config.json ADDED
@@ -0,0 +1,570 @@
1
+ {
2
+ "architectures": [
3
+ "Llama4ForConditionalGeneration"
4
+ ],
5
+ "boi_token_index": 200080,
6
+ "eoi_token_index": 200081,
7
+ "image_token_index": 200092,
8
+ "model_type": "llama4",
9
+ "text_config": {
10
+ "_attn_implementation_autoset": true,
11
+ "attention_bias": false,
12
+ "attention_chunk_size": 8192,
13
+ "attention_dropout": 0.0,
14
+ "bos_token_id": 200000,
15
+ "eos_token_id": [
16
+ 200001,
17
+ 200007,
18
+ 200008
19
+ ],
20
+ "for_llm_compressor": true,
21
+ "head_dim": 128,
22
+ "hidden_act": "silu",
23
+ "hidden_size": 5120,
24
+ "initializer_range": 0.02,
25
+ "interleave_moe_layer_step": 1,
26
+ "intermediate_size": 8192,
27
+ "intermediate_size_mlp": 16384,
28
+ "max_position_embeddings": 10485760,
29
+ "model_type": "llama4_text",
30
+ "no_rope_layers": [],
31
+ "num_attention_heads": 40,
32
+ "num_experts_per_tok": 1,
33
+ "num_hidden_layers": 48,
34
+ "num_key_value_heads": 8,
35
+ "num_local_experts": 16,
36
+ "output_router_logits": false,
37
+ "pad_token_id": 200018,
38
+ "rms_norm_eps": 1e-05,
39
+ "rope_scaling": {
40
+ "factor": 16.0,
41
+ "high_freq_factor": 1.0,
42
+ "low_freq_factor": 1.0,
43
+ "original_max_position_embeddings": 8192,
44
+ "rope_type": "llama3"
45
+ },
46
+ "rope_theta": 500000.0,
47
+ "router_aux_loss_coef": 0.001,
48
+ "router_jitter_noise": 0.0,
49
+ "torch_dtype": "bfloat16",
50
+ "use_cache": true,
51
+ "use_qk_norm": true,
52
+ "vocab_size": 202048
53
+ },
54
+ "torch_dtype": "bfloat16",
55
+ "transformers_version": "4.51.0.dev0",
56
+ "vision_config": {
57
+ "_attn_implementation_autoset": true,
58
+ "attention_dropout": 0.0,
59
+ "hidden_act": "gelu",
60
+ "hidden_size": 1408,
61
+ "image_size": 336,
62
+ "initializer_range": 0.02,
63
+ "intermediate_size": 5632,
64
+ "model_type": "llama4_vision_model",
65
+ "multi_modal_projector_bias": false,
66
+ "norm_eps": 1e-05,
67
+ "num_attention_heads": 16,
68
+ "num_channels": 3,
69
+ "num_hidden_layers": 34,
70
+ "patch_size": 14,
71
+ "pixel_shuffle_ratio": 0.5,
72
+ "projector_dropout": 0.0,
73
+ "projector_input_dim": 4096,
74
+ "projector_output_dim": 4096,
75
+ "rope_theta": 10000,
76
+ "vision_feature_layer": -1,
77
+ "vision_feature_select_strategy": "default",
78
+ "vision_output_dim": 4096
79
+ },
80
+ "quantization_config": {
81
+ "config_groups": {
82
+ "group_0": {
83
+ "input_activations": {
84
+ "actorder": null,
85
+ "block_structure": null,
86
+ "dynamic": true,
87
+ "group_size": null,
88
+ "num_bits": 8,
89
+ "observer": null,
90
+ "observer_kwargs": {},
91
+ "strategy": "token",
92
+ "symmetric": true,
93
+ "type": "float"
94
+ },
95
+ "output_activations": null,
96
+ "targets": [
97
+ "Linear"
98
+ ],
99
+ "weights": {
100
+ "actorder": null,
101
+ "block_structure": null,
102
+ "dynamic": false,
103
+ "group_size": null,
104
+ "num_bits": 8,
105
+ "observer": "mse",
106
+ "observer_kwargs": {},
107
+ "strategy": "channel",
108
+ "symmetric": true,
109
+ "type": "float"
110
+ }
111
+ }
112
+ },
113
+ "format": "float-quantized",
114
+ "global_compression_ratio": null,
115
+ "ignore": [
116
+ "vision_model.patch_embedding.linear",
117
+ "vision_model.model.layers.0.self_attn.q_proj",
118
+ "vision_model.model.layers.0.self_attn.k_proj",
119
+ "vision_model.model.layers.0.self_attn.v_proj",
120
+ "vision_model.model.layers.0.self_attn.o_proj",
121
+ "vision_model.model.layers.0.mlp.fc1",
122
+ "vision_model.model.layers.0.mlp.fc2",
123
+ "vision_model.model.layers.1.self_attn.q_proj",
124
+ "vision_model.model.layers.1.self_attn.k_proj",
125
+ "vision_model.model.layers.1.self_attn.v_proj",
126
+ "vision_model.model.layers.1.self_attn.o_proj",
127
+ "vision_model.model.layers.1.mlp.fc1",
128
+ "vision_model.model.layers.1.mlp.fc2",
129
+ "vision_model.model.layers.2.self_attn.q_proj",
130
+ "vision_model.model.layers.2.self_attn.k_proj",
131
+ "vision_model.model.layers.2.self_attn.v_proj",
132
+ "vision_model.model.layers.2.self_attn.o_proj",
133
+ "vision_model.model.layers.2.mlp.fc1",
134
+ "vision_model.model.layers.2.mlp.fc2",
135
+ "vision_model.model.layers.3.self_attn.q_proj",
136
+ "vision_model.model.layers.3.self_attn.k_proj",
137
+ "vision_model.model.layers.3.self_attn.v_proj",
138
+ "vision_model.model.layers.3.self_attn.o_proj",
139
+ "vision_model.model.layers.3.mlp.fc1",
140
+ "vision_model.model.layers.3.mlp.fc2",
141
+ "vision_model.model.layers.4.self_attn.q_proj",
142
+ "vision_model.model.layers.4.self_attn.k_proj",
143
+ "vision_model.model.layers.4.self_attn.v_proj",
144
+ "vision_model.model.layers.4.self_attn.o_proj",
145
+ "vision_model.model.layers.4.mlp.fc1",
146
+ "vision_model.model.layers.4.mlp.fc2",
147
+ "vision_model.model.layers.5.self_attn.q_proj",
148
+ "vision_model.model.layers.5.self_attn.k_proj",
149
+ "vision_model.model.layers.5.self_attn.v_proj",
150
+ "vision_model.model.layers.5.self_attn.o_proj",
151
+ "vision_model.model.layers.5.mlp.fc1",
152
+ "vision_model.model.layers.5.mlp.fc2",
153
+ "vision_model.model.layers.6.self_attn.q_proj",
154
+ "vision_model.model.layers.6.self_attn.k_proj",
155
+ "vision_model.model.layers.6.self_attn.v_proj",
156
+ "vision_model.model.layers.6.self_attn.o_proj",
157
+ "vision_model.model.layers.6.mlp.fc1",
158
+ "vision_model.model.layers.6.mlp.fc2",
159
+ "vision_model.model.layers.7.self_attn.q_proj",
160
+ "vision_model.model.layers.7.self_attn.k_proj",
161
+ "vision_model.model.layers.7.self_attn.v_proj",
162
+ "vision_model.model.layers.7.self_attn.o_proj",
163
+ "vision_model.model.layers.7.mlp.fc1",
164
+ "vision_model.model.layers.7.mlp.fc2",
165
+ "vision_model.model.layers.8.self_attn.q_proj",
166
+ "vision_model.model.layers.8.self_attn.k_proj",
167
+ "vision_model.model.layers.8.self_attn.v_proj",
168
+ "vision_model.model.layers.8.self_attn.o_proj",
169
+ "vision_model.model.layers.8.mlp.fc1",
170
+ "vision_model.model.layers.8.mlp.fc2",
171
+ "vision_model.model.layers.9.self_attn.q_proj",
172
+ "vision_model.model.layers.9.self_attn.k_proj",
173
+ "vision_model.model.layers.9.self_attn.v_proj",
174
+ "vision_model.model.layers.9.self_attn.o_proj",
175
+ "vision_model.model.layers.9.mlp.fc1",
176
+ "vision_model.model.layers.9.mlp.fc2",
177
+ "vision_model.model.layers.10.self_attn.q_proj",
178
+ "vision_model.model.layers.10.self_attn.k_proj",
179
+ "vision_model.model.layers.10.self_attn.v_proj",
180
+ "vision_model.model.layers.10.self_attn.o_proj",
181
+ "vision_model.model.layers.10.mlp.fc1",
182
+ "vision_model.model.layers.10.mlp.fc2",
183
+ "vision_model.model.layers.11.self_attn.q_proj",
184
+ "vision_model.model.layers.11.self_attn.k_proj",
185
+ "vision_model.model.layers.11.self_attn.v_proj",
186
+ "vision_model.model.layers.11.self_attn.o_proj",
187
+ "vision_model.model.layers.11.mlp.fc1",
188
+ "vision_model.model.layers.11.mlp.fc2",
189
+ "vision_model.model.layers.12.self_attn.q_proj",
190
+ "vision_model.model.layers.12.self_attn.k_proj",
191
+ "vision_model.model.layers.12.self_attn.v_proj",
192
+ "vision_model.model.layers.12.self_attn.o_proj",
193
+ "vision_model.model.layers.12.mlp.fc1",
194
+ "vision_model.model.layers.12.mlp.fc2",
195
+ "vision_model.model.layers.13.self_attn.q_proj",
196
+ "vision_model.model.layers.13.self_attn.k_proj",
197
+ "vision_model.model.layers.13.self_attn.v_proj",
198
+ "vision_model.model.layers.13.self_attn.o_proj",
199
+ "vision_model.model.layers.13.mlp.fc1",
200
+ "vision_model.model.layers.13.mlp.fc2",
201
+ "vision_model.model.layers.14.self_attn.q_proj",
202
+ "vision_model.model.layers.14.self_attn.k_proj",
203
+ "vision_model.model.layers.14.self_attn.v_proj",
204
+ "vision_model.model.layers.14.self_attn.o_proj",
205
+ "vision_model.model.layers.14.mlp.fc1",
206
+ "vision_model.model.layers.14.mlp.fc2",
207
+ "vision_model.model.layers.15.self_attn.q_proj",
208
+ "vision_model.model.layers.15.self_attn.k_proj",
209
+ "vision_model.model.layers.15.self_attn.v_proj",
210
+ "vision_model.model.layers.15.self_attn.o_proj",
211
+ "vision_model.model.layers.15.mlp.fc1",
212
+ "vision_model.model.layers.15.mlp.fc2",
213
+ "vision_model.model.layers.16.self_attn.q_proj",
214
+ "vision_model.model.layers.16.self_attn.k_proj",
215
+ "vision_model.model.layers.16.self_attn.v_proj",
216
+ "vision_model.model.layers.16.self_attn.o_proj",
217
+ "vision_model.model.layers.16.mlp.fc1",
218
+ "vision_model.model.layers.16.mlp.fc2",
219
+ "vision_model.model.layers.17.self_attn.q_proj",
220
+ "vision_model.model.layers.17.self_attn.k_proj",
221
+ "vision_model.model.layers.17.self_attn.v_proj",
222
+ "vision_model.model.layers.17.self_attn.o_proj",
223
+ "vision_model.model.layers.17.mlp.fc1",
224
+ "vision_model.model.layers.17.mlp.fc2",
225
+ "vision_model.model.layers.18.self_attn.q_proj",
226
+ "vision_model.model.layers.18.self_attn.k_proj",
227
+ "vision_model.model.layers.18.self_attn.v_proj",
228
+ "vision_model.model.layers.18.self_attn.o_proj",
229
+ "vision_model.model.layers.18.mlp.fc1",
230
+ "vision_model.model.layers.18.mlp.fc2",
231
+ "vision_model.model.layers.19.self_attn.q_proj",
232
+ "vision_model.model.layers.19.self_attn.k_proj",
233
+ "vision_model.model.layers.19.self_attn.v_proj",
234
+ "vision_model.model.layers.19.self_attn.o_proj",
235
+ "vision_model.model.layers.19.mlp.fc1",
236
+ "vision_model.model.layers.19.mlp.fc2",
237
+ "vision_model.model.layers.20.self_attn.q_proj",
238
+ "vision_model.model.layers.20.self_attn.k_proj",
239
+ "vision_model.model.layers.20.self_attn.v_proj",
240
+ "vision_model.model.layers.20.self_attn.o_proj",
241
+ "vision_model.model.layers.20.mlp.fc1",
242
+ "vision_model.model.layers.20.mlp.fc2",
243
+ "vision_model.model.layers.21.self_attn.q_proj",
244
+ "vision_model.model.layers.21.self_attn.k_proj",
245
+ "vision_model.model.layers.21.self_attn.v_proj",
246
+ "vision_model.model.layers.21.self_attn.o_proj",
247
+ "vision_model.model.layers.21.mlp.fc1",
248
+ "vision_model.model.layers.21.mlp.fc2",
249
+ "vision_model.model.layers.22.self_attn.q_proj",
250
+ "vision_model.model.layers.22.self_attn.k_proj",
251
+ "vision_model.model.layers.22.self_attn.v_proj",
252
+ "vision_model.model.layers.22.self_attn.o_proj",
253
+ "vision_model.model.layers.22.mlp.fc1",
254
+ "vision_model.model.layers.22.mlp.fc2",
255
+ "vision_model.model.layers.23.self_attn.q_proj",
256
+ "vision_model.model.layers.23.self_attn.k_proj",
257
+ "vision_model.model.layers.23.self_attn.v_proj",
258
+ "vision_model.model.layers.23.self_attn.o_proj",
259
+ "vision_model.model.layers.23.mlp.fc1",
260
+ "vision_model.model.layers.23.mlp.fc2",
261
+ "vision_model.model.layers.24.self_attn.q_proj",
262
+ "vision_model.model.layers.24.self_attn.k_proj",
263
+ "vision_model.model.layers.24.self_attn.v_proj",
264
+ "vision_model.model.layers.24.self_attn.o_proj",
265
+ "vision_model.model.layers.24.mlp.fc1",
266
+ "vision_model.model.layers.24.mlp.fc2",
267
+ "vision_model.model.layers.25.self_attn.q_proj",
268
+ "vision_model.model.layers.25.self_attn.k_proj",
269
+ "vision_model.model.layers.25.self_attn.v_proj",
270
+ "vision_model.model.layers.25.self_attn.o_proj",
271
+ "vision_model.model.layers.25.mlp.fc1",
272
+ "vision_model.model.layers.25.mlp.fc2",
273
+ "vision_model.model.layers.26.self_attn.q_proj",
274
+ "vision_model.model.layers.26.self_attn.k_proj",
275
+ "vision_model.model.layers.26.self_attn.v_proj",
276
+ "vision_model.model.layers.26.self_attn.o_proj",
277
+ "vision_model.model.layers.26.mlp.fc1",
278
+ "vision_model.model.layers.26.mlp.fc2",
279
+ "vision_model.model.layers.27.self_attn.q_proj",
280
+ "vision_model.model.layers.27.self_attn.k_proj",
281
+ "vision_model.model.layers.27.self_attn.v_proj",
282
+ "vision_model.model.layers.27.self_attn.o_proj",
283
+ "vision_model.model.layers.27.mlp.fc1",
284
+ "vision_model.model.layers.27.mlp.fc2",
285
+ "vision_model.model.layers.28.self_attn.q_proj",
286
+ "vision_model.model.layers.28.self_attn.k_proj",
287
+ "vision_model.model.layers.28.self_attn.v_proj",
288
+ "vision_model.model.layers.28.self_attn.o_proj",
289
+ "vision_model.model.layers.28.mlp.fc1",
290
+ "vision_model.model.layers.28.mlp.fc2",
291
+ "vision_model.model.layers.29.self_attn.q_proj",
292
+ "vision_model.model.layers.29.self_attn.k_proj",
293
+ "vision_model.model.layers.29.self_attn.v_proj",
294
+ "vision_model.model.layers.29.self_attn.o_proj",
295
+ "vision_model.model.layers.29.mlp.fc1",
296
+ "vision_model.model.layers.29.mlp.fc2",
297
+ "vision_model.model.layers.30.self_attn.q_proj",
298
+ "vision_model.model.layers.30.self_attn.k_proj",
299
+ "vision_model.model.layers.30.self_attn.v_proj",
300
+ "vision_model.model.layers.30.self_attn.o_proj",
301
+ "vision_model.model.layers.30.mlp.fc1",
302
+ "vision_model.model.layers.30.mlp.fc2",
303
+ "vision_model.model.layers.31.self_attn.q_proj",
304
+ "vision_model.model.layers.31.self_attn.k_proj",
305
+ "vision_model.model.layers.31.self_attn.v_proj",
306
+ "vision_model.model.layers.31.self_attn.o_proj",
307
+ "vision_model.model.layers.31.mlp.fc1",
308
+ "vision_model.model.layers.31.mlp.fc2",
309
+ "vision_model.model.layers.32.self_attn.q_proj",
310
+ "vision_model.model.layers.32.self_attn.k_proj",
311
+ "vision_model.model.layers.32.self_attn.v_proj",
312
+ "vision_model.model.layers.32.self_attn.o_proj",
313
+ "vision_model.model.layers.32.mlp.fc1",
314
+ "vision_model.model.layers.32.mlp.fc2",
315
+ "vision_model.model.layers.33.self_attn.q_proj",
316
+ "vision_model.model.layers.33.self_attn.k_proj",
317
+ "vision_model.model.layers.33.self_attn.v_proj",
318
+ "vision_model.model.layers.33.self_attn.o_proj",
319
+ "vision_model.model.layers.33.mlp.fc1",
320
+ "vision_model.model.layers.33.mlp.fc2",
321
+ "vision_model.vision_adapter.mlp.fc1",
322
+ "vision_model.vision_adapter.mlp.fc2",
323
+ "multi_modal_projector.linear_1",
324
+ "language_model.model.layers.0.self_attn.q_proj",
325
+ "language_model.model.layers.0.self_attn.k_proj",
326
+ "language_model.model.layers.0.self_attn.v_proj",
327
+ "language_model.model.layers.0.self_attn.o_proj",
328
+ "language_model.model.layers.0.feed_forward.router",
329
+ "language_model.model.layers.1.self_attn.q_proj",
330
+ "language_model.model.layers.1.self_attn.k_proj",
331
+ "language_model.model.layers.1.self_attn.v_proj",
332
+ "language_model.model.layers.1.self_attn.o_proj",
333
+ "language_model.model.layers.1.feed_forward.router",
334
+ "language_model.model.layers.2.self_attn.q_proj",
335
+ "language_model.model.layers.2.self_attn.k_proj",
336
+ "language_model.model.layers.2.self_attn.v_proj",
337
+ "language_model.model.layers.2.self_attn.o_proj",
338
+ "language_model.model.layers.2.feed_forward.router",
339
+ "language_model.model.layers.3.self_attn.q_proj",
340
+ "language_model.model.layers.3.self_attn.k_proj",
341
+ "language_model.model.layers.3.self_attn.v_proj",
342
+ "language_model.model.layers.3.self_attn.o_proj",
343
+ "language_model.model.layers.3.feed_forward.router",
344
+ "language_model.model.layers.4.self_attn.q_proj",
345
+ "language_model.model.layers.4.self_attn.k_proj",
346
+ "language_model.model.layers.4.self_attn.v_proj",
347
+ "language_model.model.layers.4.self_attn.o_proj",
348
+ "language_model.model.layers.4.feed_forward.router",
349
+ "language_model.model.layers.5.self_attn.q_proj",
350
+ "language_model.model.layers.5.self_attn.k_proj",
351
+ "language_model.model.layers.5.self_attn.v_proj",
352
+ "language_model.model.layers.5.self_attn.o_proj",
353
+ "language_model.model.layers.5.feed_forward.router",
354
+ "language_model.model.layers.6.self_attn.q_proj",
355
+ "language_model.model.layers.6.self_attn.k_proj",
356
+ "language_model.model.layers.6.self_attn.v_proj",
357
+ "language_model.model.layers.6.self_attn.o_proj",
358
+ "language_model.model.layers.6.feed_forward.router",
359
+ "language_model.model.layers.7.self_attn.q_proj",
360
+ "language_model.model.layers.7.self_attn.k_proj",
361
+ "language_model.model.layers.7.self_attn.v_proj",
362
+ "language_model.model.layers.7.self_attn.o_proj",
363
+ "language_model.model.layers.7.feed_forward.router",
364
+ "language_model.model.layers.8.self_attn.q_proj",
365
+ "language_model.model.layers.8.self_attn.k_proj",
366
+ "language_model.model.layers.8.self_attn.v_proj",
367
+ "language_model.model.layers.8.self_attn.o_proj",
368
+ "language_model.model.layers.8.feed_forward.router",
369
+ "language_model.model.layers.9.self_attn.q_proj",
370
+ "language_model.model.layers.9.self_attn.k_proj",
371
+ "language_model.model.layers.9.self_attn.v_proj",
372
+ "language_model.model.layers.9.self_attn.o_proj",
373
+ "language_model.model.layers.9.feed_forward.router",
374
+ "language_model.model.layers.10.self_attn.q_proj",
375
+ "language_model.model.layers.10.self_attn.k_proj",
376
+ "language_model.model.layers.10.self_attn.v_proj",
377
+ "language_model.model.layers.10.self_attn.o_proj",
378
+ "language_model.model.layers.10.feed_forward.router",
379
+ "language_model.model.layers.11.self_attn.q_proj",
380
+ "language_model.model.layers.11.self_attn.k_proj",
381
+ "language_model.model.layers.11.self_attn.v_proj",
382
+ "language_model.model.layers.11.self_attn.o_proj",
383
+ "language_model.model.layers.11.feed_forward.router",
384
+ "language_model.model.layers.12.self_attn.q_proj",
385
+ "language_model.model.layers.12.self_attn.k_proj",
386
+ "language_model.model.layers.12.self_attn.v_proj",
387
+ "language_model.model.layers.12.self_attn.o_proj",
388
+ "language_model.model.layers.12.feed_forward.router",
389
+ "language_model.model.layers.13.self_attn.q_proj",
390
+ "language_model.model.layers.13.self_attn.k_proj",
391
+ "language_model.model.layers.13.self_attn.v_proj",
392
+ "language_model.model.layers.13.self_attn.o_proj",
393
+ "language_model.model.layers.13.feed_forward.router",
394
+ "language_model.model.layers.14.self_attn.q_proj",
395
+ "language_model.model.layers.14.self_attn.k_proj",
396
+ "language_model.model.layers.14.self_attn.v_proj",
397
+ "language_model.model.layers.14.self_attn.o_proj",
398
+ "language_model.model.layers.14.feed_forward.router",
399
+ "language_model.model.layers.15.self_attn.q_proj",
400
+ "language_model.model.layers.15.self_attn.k_proj",
401
+ "language_model.model.layers.15.self_attn.v_proj",
402
+ "language_model.model.layers.15.self_attn.o_proj",
403
+ "language_model.model.layers.15.feed_forward.router",
404
+ "language_model.model.layers.16.self_attn.q_proj",
405
+ "language_model.model.layers.16.self_attn.k_proj",
406
+ "language_model.model.layers.16.self_attn.v_proj",
407
+ "language_model.model.layers.16.self_attn.o_proj",
408
+ "language_model.model.layers.16.feed_forward.router",
409
+ "language_model.model.layers.17.self_attn.q_proj",
410
+ "language_model.model.layers.17.self_attn.k_proj",
411
+ "language_model.model.layers.17.self_attn.v_proj",
412
+ "language_model.model.layers.17.self_attn.o_proj",
413
+ "language_model.model.layers.17.feed_forward.router",
414
+ "language_model.model.layers.18.self_attn.q_proj",
415
+ "language_model.model.layers.18.self_attn.k_proj",
416
+ "language_model.model.layers.18.self_attn.v_proj",
417
+ "language_model.model.layers.18.self_attn.o_proj",
418
+ "language_model.model.layers.18.feed_forward.router",
419
+ "language_model.model.layers.19.self_attn.q_proj",
420
+ "language_model.model.layers.19.self_attn.k_proj",
421
+ "language_model.model.layers.19.self_attn.v_proj",
422
+ "language_model.model.layers.19.self_attn.o_proj",
423
+ "language_model.model.layers.19.feed_forward.router",
424
+ "language_model.model.layers.20.self_attn.q_proj",
425
+ "language_model.model.layers.20.self_attn.k_proj",
426
+ "language_model.model.layers.20.self_attn.v_proj",
427
+ "language_model.model.layers.20.self_attn.o_proj",
428
+ "language_model.model.layers.20.feed_forward.router",
429
+ "language_model.model.layers.21.self_attn.q_proj",
430
+ "language_model.model.layers.21.self_attn.k_proj",
431
+ "language_model.model.layers.21.self_attn.v_proj",
432
+ "language_model.model.layers.21.self_attn.o_proj",
433
+ "language_model.model.layers.21.feed_forward.router",
434
+ "language_model.model.layers.22.self_attn.q_proj",
435
+ "language_model.model.layers.22.self_attn.k_proj",
436
+ "language_model.model.layers.22.self_attn.v_proj",
437
+ "language_model.model.layers.22.self_attn.o_proj",
438
+ "language_model.model.layers.22.feed_forward.router",
439
+ "language_model.model.layers.23.self_attn.q_proj",
440
+ "language_model.model.layers.23.self_attn.k_proj",
441
+ "language_model.model.layers.23.self_attn.v_proj",
442
+ "language_model.model.layers.23.self_attn.o_proj",
443
+ "language_model.model.layers.23.feed_forward.router",
444
+ "language_model.model.layers.24.self_attn.q_proj",
445
+ "language_model.model.layers.24.self_attn.k_proj",
446
+ "language_model.model.layers.24.self_attn.v_proj",
447
+ "language_model.model.layers.24.self_attn.o_proj",
448
+ "language_model.model.layers.24.feed_forward.router",
449
+ "language_model.model.layers.25.self_attn.q_proj",
450
+ "language_model.model.layers.25.self_attn.k_proj",
451
+ "language_model.model.layers.25.self_attn.v_proj",
452
+ "language_model.model.layers.25.self_attn.o_proj",
453
+ "language_model.model.layers.25.feed_forward.router",
454
+ "language_model.model.layers.26.self_attn.q_proj",
455
+ "language_model.model.layers.26.self_attn.k_proj",
456
+ "language_model.model.layers.26.self_attn.v_proj",
457
+ "language_model.model.layers.26.self_attn.o_proj",
458
+ "language_model.model.layers.26.feed_forward.router",
459
+ "language_model.model.layers.27.self_attn.q_proj",
460
+ "language_model.model.layers.27.self_attn.k_proj",
461
+ "language_model.model.layers.27.self_attn.v_proj",
462
+ "language_model.model.layers.27.self_attn.o_proj",
463
+ "language_model.model.layers.27.feed_forward.router",
464
+ "language_model.model.layers.28.self_attn.q_proj",
465
+ "language_model.model.layers.28.self_attn.k_proj",
466
+ "language_model.model.layers.28.self_attn.v_proj",
467
+ "language_model.model.layers.28.self_attn.o_proj",
468
+ "language_model.model.layers.28.feed_forward.router",
469
+ "language_model.model.layers.29.self_attn.q_proj",
470
+ "language_model.model.layers.29.self_attn.k_proj",
471
+ "language_model.model.layers.29.self_attn.v_proj",
472
+ "language_model.model.layers.29.self_attn.o_proj",
473
+ "language_model.model.layers.29.feed_forward.router",
474
+ "language_model.model.layers.30.self_attn.q_proj",
475
+ "language_model.model.layers.30.self_attn.k_proj",
476
+ "language_model.model.layers.30.self_attn.v_proj",
477
+ "language_model.model.layers.30.self_attn.o_proj",
478
+ "language_model.model.layers.30.feed_forward.router",
479
+ "language_model.model.layers.31.self_attn.q_proj",
480
+ "language_model.model.layers.31.self_attn.k_proj",
481
+ "language_model.model.layers.31.self_attn.v_proj",
482
+ "language_model.model.layers.31.self_attn.o_proj",
483
+ "language_model.model.layers.31.feed_forward.router",
484
+ "language_model.model.layers.32.self_attn.q_proj",
485
+ "language_model.model.layers.32.self_attn.k_proj",
486
+ "language_model.model.layers.32.self_attn.v_proj",
487
+ "language_model.model.layers.32.self_attn.o_proj",
488
+ "language_model.model.layers.32.feed_forward.router",
489
+ "language_model.model.layers.33.self_attn.q_proj",
490
+ "language_model.model.layers.33.self_attn.k_proj",
491
+ "language_model.model.layers.33.self_attn.v_proj",
492
+ "language_model.model.layers.33.self_attn.o_proj",
493
+ "language_model.model.layers.33.feed_forward.router",
494
+ "language_model.model.layers.34.self_attn.q_proj",
495
+ "language_model.model.layers.34.self_attn.k_proj",
496
+ "language_model.model.layers.34.self_attn.v_proj",
497
+ "language_model.model.layers.34.self_attn.o_proj",
498
+ "language_model.model.layers.34.feed_forward.router",
499
+ "language_model.model.layers.35.self_attn.q_proj",
500
+ "language_model.model.layers.35.self_attn.k_proj",
501
+ "language_model.model.layers.35.self_attn.v_proj",
502
+ "language_model.model.layers.35.self_attn.o_proj",
503
+ "language_model.model.layers.35.feed_forward.router",
504
+ "language_model.model.layers.36.self_attn.q_proj",
505
+ "language_model.model.layers.36.self_attn.k_proj",
506
+ "language_model.model.layers.36.self_attn.v_proj",
507
+ "language_model.model.layers.36.self_attn.o_proj",
508
+ "language_model.model.layers.36.feed_forward.router",
509
+ "language_model.model.layers.37.self_attn.q_proj",
510
+ "language_model.model.layers.37.self_attn.k_proj",
511
+ "language_model.model.layers.37.self_attn.v_proj",
512
+ "language_model.model.layers.37.self_attn.o_proj",
513
+ "language_model.model.layers.37.feed_forward.router",
514
+ "language_model.model.layers.38.self_attn.q_proj",
515
+ "language_model.model.layers.38.self_attn.k_proj",
516
+ "language_model.model.layers.38.self_attn.v_proj",
517
+ "language_model.model.layers.38.self_attn.o_proj",
518
+ "language_model.model.layers.38.feed_forward.router",
519
+ "language_model.model.layers.39.self_attn.q_proj",
520
+ "language_model.model.layers.39.self_attn.k_proj",
521
+ "language_model.model.layers.39.self_attn.v_proj",
522
+ "language_model.model.layers.39.self_attn.o_proj",
523
+ "language_model.model.layers.39.feed_forward.router",
524
+ "language_model.model.layers.40.self_attn.q_proj",
525
+ "language_model.model.layers.40.self_attn.k_proj",
526
+ "language_model.model.layers.40.self_attn.v_proj",
527
+ "language_model.model.layers.40.self_attn.o_proj",
528
+ "language_model.model.layers.40.feed_forward.router",
529
+ "language_model.model.layers.41.self_attn.q_proj",
530
+ "language_model.model.layers.41.self_attn.k_proj",
531
+ "language_model.model.layers.41.self_attn.v_proj",
532
+ "language_model.model.layers.41.self_attn.o_proj",
533
+ "language_model.model.layers.41.feed_forward.router",
534
+ "language_model.model.layers.42.self_attn.q_proj",
535
+ "language_model.model.layers.42.self_attn.k_proj",
536
+ "language_model.model.layers.42.self_attn.v_proj",
537
+ "language_model.model.layers.42.self_attn.o_proj",
538
+ "language_model.model.layers.42.feed_forward.router",
539
+ "language_model.model.layers.43.self_attn.q_proj",
540
+ "language_model.model.layers.43.self_attn.k_proj",
541
+ "language_model.model.layers.43.self_attn.v_proj",
542
+ "language_model.model.layers.43.self_attn.o_proj",
543
+ "language_model.model.layers.43.feed_forward.router",
544
+ "language_model.model.layers.44.self_attn.q_proj",
545
+ "language_model.model.layers.44.self_attn.k_proj",
546
+ "language_model.model.layers.44.self_attn.v_proj",
547
+ "language_model.model.layers.44.self_attn.o_proj",
548
+ "language_model.model.layers.44.feed_forward.router",
549
+ "language_model.model.layers.45.self_attn.q_proj",
550
+ "language_model.model.layers.45.self_attn.k_proj",
551
+ "language_model.model.layers.45.self_attn.v_proj",
552
+ "language_model.model.layers.45.self_attn.o_proj",
553
+ "language_model.model.layers.45.feed_forward.router",
554
+ "language_model.model.layers.46.self_attn.q_proj",
555
+ "language_model.model.layers.46.self_attn.k_proj",
556
+ "language_model.model.layers.46.self_attn.v_proj",
557
+ "language_model.model.layers.46.self_attn.o_proj",
558
+ "language_model.model.layers.46.feed_forward.router",
559
+ "language_model.model.layers.47.self_attn.q_proj",
560
+ "language_model.model.layers.47.self_attn.k_proj",
561
+ "language_model.model.layers.47.self_attn.v_proj",
562
+ "language_model.model.layers.47.self_attn.o_proj",
563
+ "language_model.model.layers.47.feed_forward.router",
564
+ "language_model.lm_head"
565
+ ],
566
+ "kv_cache_scheme": null,
567
+ "quant_method": "compressed-tensors",
568
+ "quantization_status": "compressed"
569
+ }
570
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "bos_token_id": 200000,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 200001,
6
+ 200007,
7
+ 200008
8
+ ],
9
+ "pad_token_id": 200018,
10
+ "temperature": 0.6,
11
+ "top_p": 0.9,
12
+ "transformers_version": "4.51.0.dev0"
13
+ }
model-00001-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:36c6c6b4a70837939c51ebbc1a3682c2dfd0695e5ba8f8509d8860088191cf51
3
+ size 4987679352
model-00002-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45564a59bceae860c561c0df3a749fc9cd53f9544f0f3eb7e20c3ea3abaeca50
3
+ size 4993243544
model-00003-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d3709c7ec359d32f8a2e5087abf949fa142aa277e44036af4532b654c412bee
3
+ size 4993249744
model-00004-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cfbd86bd62812b58da4002f26e43d8ee6f6842a30c84cdc6d209dc08a47c80a9
3
+ size 4993385008
model-00005-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a2bdbd7070ebe089dd41dcd30d0974c846923fb28b864a34625828c8bb4720e
3
+ size 4993243448
model-00006-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:086ff4c6faf089ed2dcc7e0526d0a88cec285ffbaa7022058f3f3e2028b5e00c
3
+ size 4993249968
model-00007-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5c714823d9aa502d897584058d4591782582f98d97fe11533994f2ebce6c39c2
3
+ size 4993243800
model-00008-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39da01bc2ff9b736889b0d1bb4315ca841823e0bbd1a31fb0049e58e51ac9f8b
3
+ size 4993407816
model-00009-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40f590c5cbb908cc5c8d317594859894db13bcac090a7fd46b6d2818197816cc
3
+ size 4993227400
model-00010-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:97f05df75e2df713003387ede5cc74881be14267e3b7ac14eec84353a25da127
3
+ size 4993243696
model-00011-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d1d7b546425089d7f141df49c480b4ad85c925993e9507f351d470bdf731aadd
3
+ size 4993243736
model-00012-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:48e491ca3534c4e178a6fd7c8f2560b01432a0915cae017d735b51aa01d131f6
3
+ size 4993249944
model-00013-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:01fb18922b16aa486f4b86dc435125eb4b375c189186f07179fdd6a502cb237f
3
+ size 4993407832
model-00014-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a9aa009c000001a690cb7e4106639c19862a5b2deea42f6ae4b7121decbfaa22
3
+ size 4993221256
model-00015-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d67b71b3b97463e83aadf441ffa9b6acb53408420c42c3e566dd023404174593
3
+ size 4993249848
model-00016-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4d43bdb18deaaf45a819c87f680e69c6b792449982efb8c300d1599b62cbe04
3
+ size 4993243744
model-00017-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2b1470c606ee6cf40a30b6b9639b5f03cb3f005542d4a120a9edc7bca3bf3d8f
3
+ size 4993243808
model-00018-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7a7c1418d67d020178d1eec2b95a94f1f18e8663186a4e0273a62cc9720fe637
3
+ size 4993413984
model-00019-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f2640c96edc392c6d34f7bba8a7c02dba377b31d1c4719abfed3126ae76c458d
3
+ size 4993221256
model-00020-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e527ea23f0451c33ace375998d9923b8596f91db1298be0d5b9fbd61286764cb
3
+ size 4993243704
model-00021-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c7a9fa2abdef027a6c0b606834d006715aaec059de76eabf78ea382117d312eb
3
+ size 4993249896
model-00022-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b5275c8540db1f40baf698cf480c844f534fbdc918717ddc654f49c3fe27f45
3
+ size 4993243816
model-00023-of-00023.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:77ac77a23183abb72299176e5782765908f48c37c480b3256686046ce365e64a
3
+ size 4796555224
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "crop_size": null,
3
+ "data_format": "channels_first",
4
+ "default_to_square": true,
5
+ "device": null,
6
+ "do_center_crop": null,
7
+ "do_convert_rgb": true,
8
+ "do_normalize": true,
9
+ "do_rescale": true,
10
+ "do_resize": true,
11
+ "image_mean": [
12
+ 0.5,
13
+ 0.5,
14
+ 0.5
15
+ ],
16
+ "image_processor_type": "Llama4ImageProcessorFast",
17
+ "image_std": [
18
+ 0.5,
19
+ 0.5,
20
+ 0.5
21
+ ],
22
+ "input_data_format": null,
23
+ "max_patches": 16,
24
+ "processor_class": "Llama4Processor",
25
+ "resample": 2,
26
+ "rescale_factor": 0.00392156862745098,
27
+ "resize_to_max_canvas": false,
28
+ "return_tensors": null,
29
+ "size": {
30
+ "height": 336,
31
+ "width": 336
32
+ }
33
+ }
processor_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "fake_image_token": "<|image|>",
3
+ "image_token": "<|image|>",
4
+ "patch_size": 14,
5
+ "processor_class": "Llama4Processor"
6
+ }
recipe.yaml ADDED
@@ -0,0 +1,12 @@
1
+ default_stage:
2
+ default_modifiers:
3
+ QuantizationModifier:
4
+ config_groups:
5
+ group_0:
6
+ targets: [Linear]
7
+ weights: {num_bits: 8, type: float, symmetric: true, strategy: channel, observer: mse}
8
+ input_activations: {num_bits: 8, type: float, symmetric: true, strategy: token,
9
+ dynamic: true, observer: null}
10
+ output_activations: null
11
+ ignore: ['re:.*lm_head', 're:.*self_attn', 're:.*router', 're:.*vision_model', 're:.*multi_modal_projector']
12
+ targets: [Linear]
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "bos_token": "<|begin_of_text|>",
3
+ "eos_token": "<|eot|>",
4
+ "pad_token": "<|finetune_right_pad_id|>"
5
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:172c9eb4beafc72601690da3ccfcede5c2e6806a8d5ec1fca33e22acea8023a4
3
+ size 27948578
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0bdbaf59b0762c8c807617e2d8ea51420eb1b1de266df2495be755c8e0ed6ed
3
+ size 3622230
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff