diff --git "a/logs.txt" "b/logs.txt"
new file mode 100644
--- /dev/null
+++ "b/logs.txt"
@@ -0,0 +1,498 @@
+/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat gen_config /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1 --quantization q3f16_1 --conv-template LM --output /home/junrushao/tmp/tmpoazjl9lj --context-window-size 8192
+[2024-01-08 05:07:21] INFO auto_config.py:115: Found model configuration: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/config.json
+[2024-01-08 05:07:21] INFO auto_config.py:151: Found model type: mixtral. Use `--model-type` to override.
+[2024-01-08 05:07:21] INFO mistral_model.py:54: prefill_chunk_size defaults to sliding_window_size (4096)
+[2024-01-08 05:07:21] WARNING compiler_flags.py:111: Warning: Cannot override context_window_size, because MixtralConfig does not have this field
+[2024-01-08 05:07:21] INFO gen_config.py:117: [generation_config.json] Setting bos_token_id: 1
+[2024-01-08 05:07:21] INFO gen_config.py:117: [generation_config.json] Setting eos_token_id: 2
+[2024-01-08 05:07:21] INFO gen_config.py:129: Found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/tokenizer.model. Copying to /home/junrushao/tmp/tmpoazjl9lj/tokenizer.model
+[2024-01-08 05:07:21] INFO gen_config.py:129: Found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/tokenizer.json. Copying to /home/junrushao/tmp/tmpoazjl9lj/tokenizer.json
+[2024-01-08 05:07:21] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/vocab.json
+[2024-01-08 05:07:21] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/merges.txt
+[2024-01-08 05:07:21] INFO gen_config.py:131: Not found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/added_tokens.json
+[2024-01-08 05:07:21] INFO gen_config.py:129: Found tokenizer config: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/tokenizer_config.json. Copying to /home/junrushao/tmp/tmpoazjl9lj/tokenizer_config.json
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting pad_token_id: 0
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting temperature: 0.7
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting repetition_penalty: 1.0
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting top_p: 0.95
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting mean_gen_len: 128
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting max_gen_len: 512
+[2024-01-08 05:07:21] INFO gen_config.py:70: [System default] Setting shift_fill_factor: 0.3
+[2024-01-08 05:07:21] INFO gen_config.py:159: Dumping configuration file to: /home/junrushao/tmp/tmpoazjl9lj/mlc-chat-config.json
+/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat convert_weight /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1 --quantization q3f16_1 --source-format auto --output /home/junrushao/tmp/tmpoazjl9lj
+[2024-01-08 05:07:22] INFO auto_config.py:115: Found model configuration: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/config.json
+[2024-01-08 05:07:22] INFO auto_device.py:76: Found device: cuda:0
+[2024-01-08 05:07:22] INFO auto_device.py:76: Found device: cuda:1
+[2024-01-08 05:07:22] INFO auto_device.py:76: Found device: cuda:2
+[2024-01-08 05:07:22] INFO auto_device.py:76: Found device: cuda:3
+[2024-01-08 05:07:23] INFO auto_device.py:85: Not found device: rocm:0
+[2024-01-08 05:07:23] INFO auto_device.py:85: Not found device: metal:0
+[2024-01-08 05:07:23] INFO auto_device.py:85: Not found device: vulkan:0
+[2024-01-08 05:07:24] INFO auto_device.py:85: Not found device: opencl:0
+[2024-01-08 05:07:24] INFO auto_device.py:33: Using device: cuda:0
+[2024-01-08 05:07:24] INFO auto_weight.py:70: Finding weights in: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1
+[2024-01-08 05:07:24] INFO auto_weight.py:136: Not found Huggingface PyTorch
+[2024-01-08 05:07:24] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model.safetensors.index.json
+[2024-01-08 05:07:24] INFO auto_weight.py:106: Using source weight configuration: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model.safetensors.index.json. Use `--source` to override.
+[2024-01-08 05:07:24] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
+[2024-01-08 05:07:24] INFO auto_config.py:151: Found model type: mixtral. Use `--model-type` to override.
+[2024-01-08 05:07:24] INFO mistral_model.py:54: prefill_chunk_size defaults to sliding_window_size (4096)
+Weight conversion with arguments:
+  --config          /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/config.json
+  --quantization    GroupQuantize(name='q3f16_1', kind='group-quant', group_size=40, quantize_dtype='int3', storage_dtype='uint32', model_dtype='float16', num_elem_per_storage=10, num_storage_per_group=4, max_int_value=3)
+  --model-type      mixtral
+  --device          cuda:0
+  --source          /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model.safetensors.index.json
+  --source-format   huggingface-safetensor
+  --output          /home/junrushao/tmp/tmpoazjl9lj
+[2024-01-08 05:07:59] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00019-of-00019.safetensors
+[2024-01-08 05:08:04] INFO group_quantization.py:212: Compiling quantize function for key: (32000, 4096, 'float16', 'cuda')
+[2024-01-08 05:08:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_weight", shape: (32000, 412), dtype: uint32
+[2024-01-08 05:08:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_scale", shape: (32000, 103), dtype: float16
+[2024-01-08 05:08:06] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00018-of-00019.safetensors
+[2024-01-08 05:08:37] INFO group_quantization.py:212: Compiling quantize function for key: (8, 28672, 4096, 'float16', 'cuda')
+[2024-01-08 05:08:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.moe.e1_e3.q_weight", shape: (8, 28672, 412), dtype: uint32
+[2024-01-08 05:08:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.moe.e1_e3.q_scale", shape: (8, 28672, 103), dtype: float16
+[2024-01-08 05:08:52] INFO group_quantization.py:212: Compiling quantize function for key: (8, 4096, 14336, 'float16', 'cuda')
+[2024-01-08 05:08:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.moe.e2.q_weight", shape: (8, 4096, 1436), dtype: uint32
+[2024-01-08 05:08:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.moe.e2.q_scale", shape: (8, 4096, 359), dtype: float16
+[2024-01-08 05:08:53] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:08:53] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:09:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.e1_e3.q_weight", shape: (8, 28672, 412), dtype: uint32
+[2024-01-08 05:09:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.e1_e3.q_scale", shape: (8, 28672, 103), dtype: float16
+[2024-01-08 05:09:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.e2.q_weight", shape: (8, 4096, 1436), dtype: uint32
+[2024-01-08 05:09:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.e2.q_scale", shape: (8, 4096, 359), dtype: float16
+[2024-01-08 05:09:49] INFO group_quantization.py:212: Compiling quantize function for key: (8, 4096, 'float16', 'cuda')
+[2024-01-08 05:09:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.gate.q_weight", shape: (8, 412), dtype: uint32
+[2024-01-08 05:09:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.moe.gate.q_scale", shape: (8, 103), dtype: float16
+[2024-01-08 05:09:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:09:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:09:50] INFO group_quantization.py:212: Compiling quantize function for key: (6144, 4096, 'float16', 'cuda')
+[2024-01-08 05:09:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.qkv_proj.q_weight", shape: (6144, 412), dtype: uint32
+[2024-01-08 05:09:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.qkv_proj.q_scale", shape: (6144, 103), dtype: float16
+[2024-01-08 05:09:51] INFO group_quantization.py:212: Compiling quantize function for key: (4096, 4096, 'float16', 'cuda')
+[2024-01-08 05:09:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_weight", shape: (4096, 412), dtype: uint32
+[2024-01-08 05:09:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_scale", shape: (4096, 103), dtype: float16
+[2024-01-08 05:09:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.norm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:09:51] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00018-of-00019.safetensors
+[2024-01-08 05:09:52] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00019-of-00019.safetensors
+[2024-01-08 05:09:52] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00001-of-00019.safetensors
+[2024-01-08 05:09:54] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.embed_tokens.q_weight", shape: (32000, 412), dtype: uint32
+[2024-01-08 05:09:54] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.embed_tokens.q_scale", shape: (32000, 103), dtype: float16
+[2024-01-08 05:10:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.e1_e3.q_weight", shape: (8, 28672, 412), dtype: uint32
+[2024-01-08 05:10:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.e1_e3.q_scale", shape: (8, 28672, 103), dtype: float16
+[2024-01-08 05:10:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.e2.q_weight", shape: (8, 4096, 1436), dtype: uint32
+[2024-01-08 05:10:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.e2.q_scale", shape: (8, 4096, 359), dtype: float16
+[2024-01-08 05:10:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.gate.q_weight", shape: (8, 412), dtype: uint32
+[2024-01-08 05:10:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.moe.gate.q_scale", shape: (8, 103), dtype: float16
+[2024-01-08 05:10:20] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:10:20] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (4096,), dtype: float16
+[2024-01-08 05:10:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.qkv_proj.q_weight", shape: (6144, 412), dtype: uint32
+[2024-01-08 05:10:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.qkv_proj.q_scale", shape: (6144, 103), dtype: float16
+[2024-01-08 05:10:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_weight", shape: (4096, 412), dtype: uint32
+[2024-01-08 05:10:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_scale", shape: (4096, 103), dtype: float16
+[2024-01-08 05:10:21] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00002-of-00019.safetensors
+[2024-01-08 05:10:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.moe.e1_e3.q_weight", shape: (8, 28672, 412), dtype: uint32
+[2024-01-08 05:10:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.moe.e1_e3.q_scale", shape: (8, 28672, 103), dtype: float16
+  9%|█████████████▍                                                                                                                                          | 20/227 [02:46<10:06,  2.93s/it] 10%|██████████████▋                                                                                                                                         | 22/227 [02:46<19:02,  5.57s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 10%|██████████████▋                                                                                                                                         | 22/227 [02:52<19:02,  5.57s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 10%|██████████████▋                                                                                                                                         | 22/227 [02:52<19:02,  5.57s/it] 10%|███████████████▍                                                                                                                                        | 23/227 [02:52<19:12,  5.65s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 10%|███████████████▍                                                                                                                                        | 23/227 [02:52<19:12,  5.65s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 10%|███████████████▍                                                                                                                                        | 23/227 [02:52<19:12,  5.65s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 10%|███████████████▍                                                                                                                                        | 23/227 [02:52<19:12,  5.65s/it]                                                                                                                                                                                              [2024-01-08 05:10:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 10%|███████████████▍                                                                                                                                        | 23/227 [02:52<19:12,  5.65s/it] 11%|████████████████▋                                                                                                                                       | 25/227 [02:52<12:59,  3.86s/it]                                                                                                                                                                                              [2024-01-08 05:10:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 11%|████████████████▋                                                                                                                                       | 25/227 [02:53<12:59,  3.86s/it]                                                                                                                                                                                              [2024-01-08 05:10:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.1.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 11%|████████████████▋                                                                                                                                       | 25/227 [02:53<12:59,  3.86s/it] 11%|█████████████████▍                                                                                                                                      | 26/227 [02:53<10:38,  3.18s/it]                                                                                                                                                                                              [2024-01-08 05:10:52] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.1.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 11%|█████████████████▍                                                                                                                                      | 26/227 [02:53<10:38,  3.18s/it]                                                                                                                                                                                              [2024-01-08 05:10:52] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.1.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 11%|█████████████████▍                                                                                                                                      | 26/227 [02:53<10:38,  3.18s/it]                                                                                                                                                                                              [2024-01-08 05:11:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 11%|█████████████████▍                                                                                                                                      | 26/227 [03:15<10:38,  3.18s/it]                                                                                                                                                                                              [2024-01-08 05:11:15] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 11%|█████████████████▍                                                                                                                                      | 26/227 [03:16<10:38,  3.18s/it] 13%|███████████████████▍                                                                                                                                    | 29/227 [03:16<17:08,  5.20s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 13%|███████████████████▍                                                                                                                                    | 29/227 [03:22<17:08,  5.20s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 13%|███████████████████▍                                                                                                                                    | 29/227 [03:22<17:08,  5.20s/it] 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.2.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:21] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.2.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 13%|████████████████████                                                                                                                                    | 30/227 [03:22<17:30,  5.33s/it] 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:22<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:23<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.2.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:23<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00001-of-00019.safetensors
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:23<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:22] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00003-of-00019.safetensors
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:23<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:44<08:53,  2.77s/it]                                                                                                                                                                                              [2024-01-08 05:11:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 15%|██████████████████████▊                                                                                                                                 | 34/227 [03:44<08:53,  2.77s/it] 16%|████████████████████████                                                                                                                                | 36/227 [03:44<15:33,  4.89s/it]                                                                                                                                                                                              [2024-01-08 05:11:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 16%|████████████████████████                                                                                                                                | 36/227 [03:50<15:33,  4.89s/it]                                                                                                                                                                                              [2024-01-08 05:11:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 16%|████████████████████████                                                                                                                                | 36/227 [03:50<15:33,  4.89s/it] 16%|████████████████████████▊                                                                                                                               | 37/227 [03:50<16:05,  5.08s/it]                                                                                                                                                                                              [2024-01-08 05:11:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 16%|████████████████████████▊                                                                                                                               | 37/227 [03:50<16:05,  5.08s/it]                                                                                                                                                                                              [2024-01-08 05:11:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 16%|████████████████████████▊                                                                                                                               | 37/227 [03:50<16:05,  5.08s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 16%|████████████████████████▊                                                                                                                               | 37/227 [03:51<16:05,  5.08s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 16%|████████████████████████▊                                                                                                                               | 37/227 [03:51<16:05,  5.08s/it] 17%|██████████████████████████                                                                                                                              | 39/227 [03:51<11:08,  3.56s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 17%|██████████████████████████                                                                                                                              | 39/227 [03:51<11:08,  3.56s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.3.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 17%|██████████████████████████                                                                                                                              | 39/227 [03:51<11:08,  3.56s/it] 18%|██████████████████████████▊                                                                                                                             | 40/227 [03:51<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00002-of-00019.safetensors
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [03:51<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:11:50] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00003-of-00019.safetensors
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [03:51<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:11:51] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00006-of-00019.safetensors
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [03:52<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:11:52] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00007-of-00019.safetensors
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [03:53<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:12:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [04:02<09:07,  2.93s/it]                                                                                                                                                                                              [2024-01-08 05:12:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 18%|██████████████████████████▊                                                                                                                             | 40/227 [04:02<09:07,  2.93s/it] 18%|███████████████████████████▍                                                                                                                            | 41/227 [04:03<14:45,  4.76s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 18%|███████████████████████████▍                                                                                                                            | 41/227 [04:07<14:45,  4.76s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 18%|███████████████████████████▍                                                                                                                            | 41/227 [04:07<14:45,  4.76s/it] 19%|████████████████████████████                                                                                                                            | 42/227 [04:07<14:17,  4.63s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 19%|████████████████████████████                                                                                                                            | 42/227 [04:07<14:17,  4.63s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 19%|████████████████████████████                                                                                                                            | 42/227 [04:07<14:17,  4.63s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 19%|████████████████████████████                                                                                                                            | 42/227 [04:07<14:17,  4.63s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 19%|████████████████████████████                                                                                                                            | 42/227 [04:07<14:17,  4.63s/it] 19%|█████████████████████████████▍                                                                                                                          | 44/227 [04:07<08:41,  2.85s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 19%|█████████████████████████████▍                                                                                                                          | 44/227 [04:07<08:41,  2.85s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 19%|█████████████████████████████▍                                                                                                                          | 44/227 [04:07<08:41,  2.85s/it] 20%|██████████████████████████████▏                                                                                                                         | 45/227 [04:07<06:50,  2.25s/it]                                                                                                                                                                                              [2024-01-08 05:12:06] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00007-of-00019.safetensors
+ 20%|██████████████████████████████▏                                                                                                                         | 45/227 [04:07<06:50,  2.25s/it]                                                                                                                                                                                              [2024-01-08 05:12:07] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00005-of-00019.safetensors
+ 20%|██████████████████████████████▏                                                                                                                         | 45/227 [04:08<06:50,  2.25s/it]                                                                                                                                                                                              [2024-01-08 05:12:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 20%|██████████████████████████████▏                                                                                                                         | 45/227 [04:11<06:50,  2.25s/it]                                                                                                                                                                                              [2024-01-08 05:12:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 20%|██████████████████████████████▏                                                                                                                         | 45/227 [04:12<06:50,  2.25s/it] 20%|██████████████████████████████▊                                                                                                                         | 46/227 [04:12<08:27,  2.80s/it]                                                                                                                                                                                              [2024-01-08 05:12:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 20%|██████████████████████████████▊                                                                                                                         | 46/227 [04:24<08:27,  2.80s/it]                                                                                                                                                                                              [2024-01-08 05:12:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 20%|██████████████████████████████▊                                                                                                                         | 46/227 [04:24<08:27,  2.80s/it] 21%|███████████████████████████████▍                                                                                                                        | 47/227 [04:24<15:53,  5.29s/it]                                                                                                                                                                                              [2024-01-08 05:12:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.8.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 21%|███████████████████████████████▍                                                                                                                        | 47/227 [04:24<15:53,  5.29s/it]                                                                                                                                                                                              [2024-01-08 05:12:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.8.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 21%|███████████████████████████████▍                                                                                                                        | 47/227 [04:24<15:53,  5.29s/it]                                                                                                                                                                                              [2024-01-08 05:12:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 21%|███████████████████████████████▍                                                                                                                        | 47/227 [04:39<15:53,  5.29s/it]                                                                                                                                                                                              [2024-01-08 05:12:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 21%|███████████████████████████████▍                                                                                                                        | 47/227 [04:39<15:53,  5.29s/it] 22%|█████████████████████████████████▍                                                                                                                      | 50/227 [04:39<15:08,  5.13s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 22%|█████████████████████████████████▍                                                                                                                      | 50/227 [04:45<15:08,  5.13s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 22%|█████████████████████████████████▍                                                                                                                      | 50/227 [04:45<15:08,  5.13s/it] 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.9.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.9.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 22%|██████████████████████████████████▏                                                                                                                     | 51/227 [04:45<15:17,  5.21s/it] 24%|████████████████████████████████████▊                                                                                                                   | 55/227 [04:45<07:05,  2.48s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 24%|████████████████████████████████████▊                                                                                                                   | 55/227 [04:45<07:05,  2.48s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.9.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 24%|████████████████████████████████████▊                                                                                                                   | 55/227 [04:45<07:05,  2.48s/it] 25%|█████████████████████████████████████▍                                                                                                                  | 56/227 [04:45<06:00,  2.11s/it]                                                                                                                                                                                              [2024-01-08 05:12:44] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00005-of-00019.safetensors
+ 25%|█████████████████████████████████████▍                                                                                                                  | 56/227 [04:45<06:00,  2.11s/it]                                                                                                                                                                                              [2024-01-08 05:12:45] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00006-of-00019.safetensors
+ 25%|█████████████████████████████████████▍                                                                                                                  | 56/227 [04:46<06:00,  2.11s/it]                                                                                                                                                                                              [2024-01-08 05:12:45] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00007-of-00019.safetensors
+ 25%|█████████████████████████████████████▍                                                                                                                  | 56/227 [04:46<06:00,  2.11s/it]                                                                                                                                                                                              [2024-01-08 05:12:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.10.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 25%|█████████████████████████████████████▍                                                                                                                  | 56/227 [04:47<06:00,  2.11s/it] 25%|██████████████████████████████████████▏                                                                                                                 | 57/227 [04:47<05:50,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:12:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.10.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 25%|██████████████████████████████████████▏                                                                                                                 | 57/227 [04:47<05:50,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:12:46] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00008-of-00019.safetensors
+ 25%|██████████████████████████████████████▏                                                                                                                 | 57/227 [04:47<05:50,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:12:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 25%|██████████████████████████████████████▏                                                                                                                 | 57/227 [04:59<05:50,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:12:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 25%|██████████████████████████████████████▏                                                                                                                 | 57/227 [04:59<05:50,  2.06s/it] 26%|███████████████████████████████████████▌                                                                                                                | 59/227 [04:59<09:38,  3.44s/it]                                                                                                                                                                                              [2024-01-08 05:13:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 26%|███████████████████████████████████████▌                                                                                                                | 59/227 [05:03<09:38,  3.44s/it]                                                                                                                                                                                              [2024-01-08 05:13:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 26%|███████████████████████████████████████▌                                                                                                                | 59/227 [05:03<09:38,  3.44s/it] 26%|████████████████████████████████████████▏                                                                                                               | 60/227 [05:03<10:03,  3.62s/it]                                                                                                                                                                                              [2024-01-08 05:13:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 26%|████████████████████████████████████████▏                                                                                                               | 60/227 [05:03<10:03,  3.62s/it]                                                                                                                                                                                              [2024-01-08 05:13:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 26%|████████████████████████████████████████▏                                                                                                               | 60/227 [05:03<10:03,  3.62s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 26%|████████████████████████████████████████▏                                                                                                               | 60/227 [05:04<10:03,  3.62s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 26%|████████████████████████████████████████▏                                                                                                               | 60/227 [05:04<10:03,  3.62s/it] 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:04<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:04<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:04<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.11.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:04<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:03] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.11.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:04<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:17<06:25,  2.34s/it]                                                                                                                                                                                              [2024-01-08 05:13:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 27%|█████████████████████████████████████████▌                                                                                                              | 62/227 [05:17<06:25,  2.34s/it] 29%|████████████████████████████████████████████▏                                                                                                           | 66/227 [05:17<07:43,  2.88s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 29%|████████████████████████████████████████████▏                                                                                                           | 66/227 [05:23<07:43,  2.88s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 29%|████████████████████████████████████████████▏                                                                                                           | 66/227 [05:23<07:43,  2.88s/it] 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.12.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.12.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 30%|████████████████████████████████████████████▊                                                                                                           | 67/227 [05:23<08:49,  3.31s/it] 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:23<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:23<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:23<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:22] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00007-of-00019.safetensors
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:23<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:23] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00009-of-00019.safetensors
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:24<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:39<04:38,  1.78s/it]                                                                                                                                                                                              [2024-01-08 05:13:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 31%|███████████████████████████████████████████████▌                                                                                                        | 71/227 [05:39<04:38,  1.78s/it] 32%|████████████████████████████████████████████████▉                                                                                                       | 73/227 [05:39<08:30,  3.32s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 32%|████████████████████████████████████████████████▉                                                                                                       | 73/227 [05:45<08:30,  3.32s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 32%|████████████████████████████████████████████████▉                                                                                                       | 73/227 [05:45<08:30,  3.32s/it] 33%|█████████████████████████████████████████████████▌                                                                                                      | 74/227 [05:45<09:35,  3.76s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 33%|█████████████████████████████████████████████████▌                                                                                                      | 74/227 [05:45<09:35,  3.76s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 33%|█████████████████████████████████████████████████▌                                                                                                      | 74/227 [05:45<09:35,  3.76s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 33%|█████████████████████████████████████████████████▌                                                                                                      | 74/227 [05:45<09:35,  3.76s/it]                                                                                                                                                                                              [2024-01-08 05:13:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 33%|█████████████████████████████████████████████████▌                                                                                                      | 74/227 [05:45<09:35,  3.76s/it] 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:45<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:45<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:45<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.13.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:45<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.13.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:45<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:59<06:42,  2.66s/it]                                                                                                                                                                                              [2024-01-08 05:13:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 33%|██████████████████████████████████████████████████▉                                                                                                     | 76/227 [05:59<06:42,  2.66s/it] 35%|█████████████████████████████████████████████████████▌                                                                                                  | 80/227 [05:59<07:21,  3.00s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 35%|█████████████████████████████████████████████████████▌                                                                                                  | 80/227 [06:05<07:21,  3.00s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 35%|█████████████████████████████████████████████████████▌                                                                                                  | 80/227 [06:05<07:21,  3.00s/it] 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.14.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.14.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it]                                                                                                                                                                                              [2024-01-08 05:14:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 36%|██████████████████████████████████████████████████████▏                                                                                                 | 81/227 [06:05<08:24,  3.46s/it] 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:05<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:06<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.14.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:06<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:05] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00008-of-00019.safetensors
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:06<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:05] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00010-of-00019.safetensors
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:06<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:20<04:35,  1.94s/it]                                                                                                                                                                                              [2024-01-08 05:14:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 37%|████████████████████████████████████████████████████████▉                                                                                               | 85/227 [06:20<04:35,  1.94s/it] 38%|██████████████████████████████████████████████████████████▎                                                                                             | 87/227 [06:20<07:33,  3.24s/it]                                                                                                                                                                                              [2024-01-08 05:14:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 38%|██████████████████████████████████████████████████████████▎                                                                                             | 87/227 [06:27<07:33,  3.24s/it]                                                                                                                                                                                              [2024-01-08 05:14:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 38%|██████████████████████████████████████████████████████████▎                                                                                             | 87/227 [06:27<07:33,  3.24s/it] 39%|██████████████████████████████████████████████████████████▉                                                                                             | 88/227 [06:27<08:49,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:14:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 39%|██████████████████████████████████████████████████████████▉                                                                                             | 88/227 [06:27<08:49,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:14:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 39%|██████████████████████████████████████████████████████████▉                                                                                             | 88/227 [06:27<08:49,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 39%|██████████████████████████████████████████████████████████▉                                                                                             | 88/227 [06:27<08:49,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 39%|██████████████████████████████████████████████████████████▉                                                                                             | 88/227 [06:27<08:49,  3.81s/it] 40%|████████████████████████████████████████████████████████████▎                                                                                           | 90/227 [06:27<06:14,  2.73s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 40%|████████████████████████████████████████████████████████████▎                                                                                           | 90/227 [06:28<06:14,  2.73s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.15.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 40%|████████████████████████████████████████████████████████████▎                                                                                           | 90/227 [06:28<06:14,  2.73s/it] 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:28<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.15.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:28<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.15.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:28<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00009-of-00019.safetensors
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:28<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:27] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00011-of-00019.safetensors
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:28<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:43<05:09,  2.27s/it]                                                                                                                                                                                              [2024-01-08 05:14:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 40%|████████████████████████████████████████████████████████████▉                                                                                           | 91/227 [06:43<05:09,  2.27s/it] 41%|██████████████████████████████████████████████████████████████▉                                                                                         | 94/227 [06:43<07:48,  3.53s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 41%|██████████████████████████████████████████████████████████████▉                                                                                         | 94/227 [06:48<07:48,  3.53s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 41%|██████████████████████████████████████████████████████████████▉                                                                                         | 94/227 [06:48<07:48,  3.53s/it] 42%|███████████████████████████████████████████████████████████████▌                                                                                        | 95/227 [06:48<08:05,  3.68s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 42%|███████████████████████████████████████████████████████████████▌                                                                                        | 95/227 [06:48<08:05,  3.68s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 42%|███████████████████████████████████████████████████████████████▌                                                                                        | 95/227 [06:48<08:05,  3.68s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 42%|███████████████████████████████████████████████████████████████▌                                                                                        | 95/227 [06:48<08:05,  3.68s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 42%|███████████████████████████████████████████████████████████████▌                                                                                        | 95/227 [06:48<08:05,  3.68s/it] 43%|████████████████████████████████████████████████████████████████▉                                                                                       | 97/227 [06:48<05:29,  2.53s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 43%|████████████████████████████████████████████████████████████████▉                                                                                       | 97/227 [06:48<05:29,  2.53s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.16.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 43%|████████████████████████████████████████████████████████████████▉                                                                                       | 97/227 [06:48<05:29,  2.53s/it] 43%|█████████████████████████████████████████████████████████████████▌                                                                                      | 98/227 [06:48<04:27,  2.08s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.16.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 43%|█████████████████████████████████████████████████████████████████▌                                                                                      | 98/227 [06:48<04:27,  2.08s/it]                                                                                                                                                                                              [2024-01-08 05:14:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.16.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 43%|█████████████████████████████████████████████████████████████████▌                                                                                      | 98/227 [06:48<04:27,  2.08s/it]                                                                                                                                                                                              [2024-01-08 05:15:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 43%|█████████████████████████████████████████████████████████████████▌                                                                                      | 98/227 [07:02<04:27,  2.08s/it]                                                                                                                                                                                              [2024-01-08 05:15:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 43%|█████████████████████████████████████████████████████████████████▌                                                                                      | 98/227 [07:03<04:27,  2.08s/it] 44%|███████████████████████████████████████████████████████████████████▏                                                                                   | 101/227 [07:03<06:59,  3.33s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 44%|███████████████████████████████████████████████████████████████████▏                                                                                   | 101/227 [07:09<06:59,  3.33s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 44%|███████████████████████████████████████████████████████████████████▏                                                                                   | 101/227 [07:09<06:59,  3.33s/it] 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.17.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.17.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 45%|███████████████████████████████████████████████████████████████████▊                                                                                   | 102/227 [07:09<07:56,  3.81s/it] 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:09<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:09<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.17.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:09<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:08] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00010-of-00019.safetensors
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:09<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:09] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00012-of-00019.safetensors
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:10<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:23<03:53,  1.93s/it]                                                                                                                                                                                              [2024-01-08 05:15:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 47%|██████████████████████████████████████████████████████████████████████▌                                                                                | 106/227 [07:23<03:53,  1.93s/it] 48%|███████████████████████████████████████████████████████████████████████▊                                                                               | 108/227 [07:23<06:35,  3.32s/it]                                                                                                                                                                                              [2024-01-08 05:15:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 48%|███████████████████████████████████████████████████████████████████████▊                                                                               | 108/227 [07:29<06:35,  3.32s/it]                                                                                                                                                                                              [2024-01-08 05:15:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 48%|███████████████████████████████████████████████████████████████████████▊                                                                               | 108/227 [07:29<06:35,  3.32s/it] 48%|████████████████████████████████████████████████████████████████████████▌                                                                              | 109/227 [07:29<07:19,  3.73s/it]                                                                                                                                                                                              [2024-01-08 05:15:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 48%|████████████████████████████████████████████████████████████████████████▌                                                                              | 109/227 [07:29<07:19,  3.73s/it]                                                                                                                                                                                              [2024-01-08 05:15:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 48%|████████████████████████████████████████████████████████████████████████▌                                                                              | 109/227 [07:29<07:19,  3.73s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 48%|████████████████████████████████████████████████████████████████████████▌                                                                              | 109/227 [07:30<07:19,  3.73s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 48%|████████████████████████████████████████████████████████████████████████▌                                                                              | 109/227 [07:30<07:19,  3.73s/it] 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:30<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:30<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.18.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:30<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.18.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:30<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.18.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:30<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:42<05:05,  2.63s/it]                                                                                                                                                                                              [2024-01-08 05:15:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 49%|█████████████████████████████████████████████████████████████████████████▊                                                                             | 111/227 [07:43<05:05,  2.63s/it] 51%|████████████████████████████████████████████████████████████████████████████▍                                                                          | 115/227 [07:43<05:26,  2.91s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 51%|████████████████████████████████████████████████████████████████████████████▍                                                                          | 115/227 [07:48<05:26,  2.91s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 51%|████████████████████████████████████████████████████████████████████████████▍                                                                          | 115/227 [07:48<05:26,  2.91s/it] 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.19.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.19.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 51%|█████████████████████████████████████████████████████████████████████████████▏                                                                         | 116/227 [07:48<05:57,  3.22s/it] 53%|███████████████████████████████████████████████████████████████████████████████▊                                                                       | 120/227 [07:48<03:12,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 53%|███████████████████████████████████████████████████████████████████████████████▊                                                                       | 120/227 [07:48<03:12,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.19.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 53%|███████████████████████████████████████████████████████████████████████████████▊                                                                       | 120/227 [07:48<03:12,  1.80s/it] 53%|████████████████████████████████████████████████████████████████████████████████▍                                                                      | 121/227 [07:48<02:45,  1.56s/it]                                                                                                                                                                                              [2024-01-08 05:15:47] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00011-of-00019.safetensors
+ 53%|████████████████████████████████████████████████████████████████████████████████▍                                                                      | 121/227 [07:48<02:45,  1.56s/it]                                                                                                                                                                                              [2024-01-08 05:15:48] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00013-of-00019.safetensors
+ 53%|████████████████████████████████████████████████████████████████████████████████▍                                                                      | 121/227 [07:49<02:45,  1.56s/it]                                                                                                                                                                                              [2024-01-08 05:15:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 53%|████████████████████████████████████████████████████████████████████████████████▍                                                                      | 121/227 [08:00<02:45,  1.56s/it]                                                                                                                                                                                              [2024-01-08 05:15:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 53%|████████████████████████████████████████████████████████████████████████████████▍                                                                      | 121/227 [08:00<02:45,  1.56s/it] 54%|█████████████████████████████████████████████████████████████████████████████████▏                                                                     | 122/227 [08:00<05:50,  3.34s/it]                                                                                                                                                                                              [2024-01-08 05:16:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 54%|█████████████████████████████████████████████████████████████████████████████████▏                                                                     | 122/227 [08:05<05:50,  3.34s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 54%|█████████████████████████████████████████████████████████████████████████████████▏                                                                     | 122/227 [08:06<05:50,  3.34s/it] 54%|█████████████████████████████████████████████████████████████████████████████████▊                                                                     | 123/227 [08:06<06:23,  3.69s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 54%|█████████████████████████████████████████████████████████████████████████████████▊                                                                     | 123/227 [08:06<06:23,  3.69s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 54%|█████████████████████████████████████████████████████████████████████████████████▊                                                                     | 123/227 [08:06<06:23,  3.69s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 54%|█████████████████████████████████████████████████████████████████████████████████▊                                                                     | 123/227 [08:06<06:23,  3.69s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 54%|█████████████████████████████████████████████████████████████████████████████████▊                                                                     | 123/227 [08:06<06:23,  3.69s/it] 55%|███████████████████████████████████████████████████████████████████████████████████▏                                                                   | 125/227 [08:06<04:08,  2.43s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 55%|███████████████████████████████████████████████████████████████████████████████████▏                                                                   | 125/227 [08:06<04:08,  2.43s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.20.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 55%|███████████████████████████████████████████████████████████████████████████████████▏                                                                   | 125/227 [08:06<04:08,  2.43s/it] 56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                   | 126/227 [08:06<03:18,  1.97s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.20.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                   | 126/227 [08:06<03:18,  1.97s/it]                                                                                                                                                                                              [2024-01-08 05:16:05] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.20.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                   | 126/227 [08:06<03:18,  1.97s/it]                                                                                                                                                                                              [2024-01-08 05:16:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                   | 126/227 [08:22<03:18,  1.97s/it]                                                                                                                                                                                              [2024-01-08 05:16:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                   | 126/227 [08:22<03:18,  1.97s/it] 57%|█████████████████████████████████████████████████████████████████████████████████████▊                                                                 | 129/227 [08:22<05:44,  3.51s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 57%|█████████████████████████████████████████████████████████████████████████████████████▊                                                                 | 129/227 [08:27<05:44,  3.51s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 57%|█████████████████████████████████████████████████████████████████████████████████████▊                                                                 | 129/227 [08:28<05:44,  3.51s/it] 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.21.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.21.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 57%|██████████████████████████████████████████████████████████████████████████████████████▍                                                                | 130/227 [08:28<06:19,  3.92s/it] 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:28<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:28<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.21.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:28<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:27] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00012-of-00019.safetensors
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:28<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:28] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00014-of-00019.safetensors
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:28<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:41<03:02,  1.96s/it]                                                                                                                                                                                              [2024-01-08 05:16:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 59%|█████████████████████████████████████████████████████████████████████████████████████████▏                                                             | 134/227 [08:41<03:02,  1.96s/it] 60%|██████████████████████████████████████████████████████████████████████████████████████████▍                                                            | 136/227 [08:41<04:55,  3.25s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 60%|██████████████████████████████████████████████████████████████████████████████████████████▍                                                            | 136/227 [08:47<04:55,  3.25s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 60%|██████████████████████████████████████████████████████████████████████████████████████████▍                                                            | 136/227 [08:47<04:55,  3.25s/it] 60%|███████████████████████████████████████████████████████████████████████████████████████████▏                                                           | 137/227 [08:47<05:21,  3.57s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 60%|███████████████████████████████████████████████████████████████████████████████████████████▏                                                           | 137/227 [08:47<05:21,  3.57s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 60%|███████████████████████████████████████████████████████████████████████████████████████████▏                                                           | 137/227 [08:47<05:21,  3.57s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 60%|███████████████████████████████████████████████████████████████████████████████████████████▏                                                           | 137/227 [08:47<05:21,  3.57s/it]                                                                                                                                                                                              [2024-01-08 05:16:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 60%|███████████████████████████████████████████████████████████████████████████████████████████▏                                                           | 137/227 [08:47<05:21,  3.57s/it] 61%|████████████████████████████████████████████████████████████████████████████████████████████▍                                                          | 139/227 [08:47<03:42,  2.53s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 61%|████████████████████████████████████████████████████████████████████████████████████████████▍                                                          | 139/227 [08:47<03:42,  2.53s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.22.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 61%|████████████████████████████████████████████████████████████████████████████████████████████▍                                                          | 139/227 [08:47<03:42,  2.53s/it] 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [08:47<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.22.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [08:47<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.22.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [08:47<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00013-of-00019.safetensors
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [08:47<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:16:47] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00015-of-00019.safetensors
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [08:48<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [09:02<03:04,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 62%|█████████████████████████████████████████████████████████████████████████████████████████████▏                                                         | 140/227 [09:03<03:04,  2.12s/it] 63%|███████████████████████████████████████████████████████████████████████████████████████████████                                                        | 143/227 [09:03<04:45,  3.40s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████                                                        | 143/227 [09:08<04:45,  3.40s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████                                                        | 143/227 [09:08<04:45,  3.40s/it] 63%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                       | 144/227 [09:08<05:13,  3.77s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                       | 144/227 [09:08<05:13,  3.77s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                       | 144/227 [09:08<05:13,  3.77s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                       | 144/227 [09:08<05:13,  3.77s/it]                                                                                                                                                                                              [2024-01-08 05:17:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 63%|███████████████████████████████████████████████████████████████████████████████████████████████▊                                                       | 144/227 [09:08<05:13,  3.77s/it] 64%|█████████████████████████████████████████████████████████████████████████████████████████████████                                                      | 146/227 [09:08<03:28,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:17:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 64%|█████████████████████████████████████████████████████████████████████████████████████████████████                                                      | 146/227 [09:09<03:28,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:17:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.23.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 64%|█████████████████████████████████████████████████████████████████████████████████████████████████                                                      | 146/227 [09:09<03:28,  2.57s/it] 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 147/227 [09:09<02:49,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:08] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.23.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 147/227 [09:09<02:49,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:08] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.23.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 147/227 [09:09<02:49,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 147/227 [09:20<02:49,  2.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 65%|█████████████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 147/227 [09:20<02:49,  2.12s/it] 66%|███████████████████████████████████████████████████████████████████████████████████████████████████▊                                                   | 150/227 [09:20<03:41,  2.88s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 66%|███████████████████████████████████████████████████████████████████████████████████████████████████▊                                                   | 150/227 [09:26<03:41,  2.88s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 66%|███████████████████████████████████████████████████████████████████████████████████████████████████▊                                                   | 150/227 [09:26<03:41,  2.88s/it] 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.24.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.24.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 67%|████████████████████████████████████████████████████████████████████████████████████████████████████▍                                                  | 151/227 [09:26<04:19,  3.42s/it] 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:26<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:26<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.24.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:26<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:25] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00014-of-00019.safetensors
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:26<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:26] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00016-of-00019.safetensors
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:27<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:40<02:05,  1.74s/it]                                                                                                                                                                                              [2024-01-08 05:17:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 68%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                | 155/227 [09:40<02:05,  1.74s/it] 69%|████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                              | 157/227 [09:40<03:38,  3.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 69%|████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                              | 157/227 [09:46<03:38,  3.12s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 69%|████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                              | 157/227 [09:46<03:38,  3.12s/it] 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 158/227 [09:46<04:00,  3.49s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 158/227 [09:46<04:00,  3.49s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 158/227 [09:46<04:00,  3.49s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 158/227 [09:46<04:00,  3.49s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 70%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 158/227 [09:46<04:00,  3.49s/it] 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:46<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:46<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.25.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:46<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.25.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:46<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.25.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:46<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:59<02:43,  2.44s/it]                                                                                                                                                                                              [2024-01-08 05:17:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 70%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                            | 160/227 [09:59<02:43,  2.44s/it] 72%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████                                          | 164/227 [09:59<02:59,  2.84s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 72%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████                                          | 164/227 [10:04<02:59,  2.84s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 72%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████                                          | 164/227 [10:04<02:59,  2.84s/it] 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:04<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:04<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:04<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.26.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:04<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:03] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.26.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:04<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:05<03:17,  3.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                         | 165/227 [10:05<03:17,  3.19s/it] 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:05<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:05<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.26.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:05<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00015-of-00019.safetensors
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:05<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:04] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00017-of-00019.safetensors
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:05<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:13] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:14<01:44,  1.80s/it]                                                                                                                                                                                              [2024-01-08 05:18:13] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 74%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                      | 169/227 [10:14<01:44,  1.80s/it] 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                     | 171/227 [10:14<02:20,  2.50s/it]                                                                                                                                                                                              [2024-01-08 05:18:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                     | 171/227 [10:17<02:20,  2.50s/it]                                                                                                                                                                                              [2024-01-08 05:18:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                     | 171/227 [10:17<02:20,  2.50s/it] 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                    | 172/227 [10:17<02:23,  2.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                    | 172/227 [10:17<02:23,  2.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                    | 172/227 [10:17<02:23,  2.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                    | 172/227 [10:18<02:23,  2.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                    | 172/227 [10:18<02:23,  2.61s/it] 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.27.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.27.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.27.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00016-of-00019.safetensors
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:17] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00018-of-00019.safetensors
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:18<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:26<01:38,  1.86s/it]                                                                                                                                                                                              [2024-01-08 05:18:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                   | 174/227 [10:27<01:38,  1.86s/it] 78%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                | 178/227 [10:27<01:40,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 78%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                | 178/227 [10:30<01:40,  2.06s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 78%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                | 178/227 [10:30<01:40,  2.06s/it] 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 179/227 [10:30<01:46,  2.22s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 179/227 [10:30<01:46,  2.22s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 179/227 [10:30<01:46,  2.22s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 179/227 [10:30<01:46,  2.22s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 179/227 [10:30<01:46,  2.22s/it] 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:30<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:30<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.28.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:30<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.28.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:30<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.28.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:30<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:41<01:13,  1.61s/it]                                                                                                                                                                                              [2024-01-08 05:18:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                              | 181/227 [10:42<01:13,  1.61s/it] 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                            | 185/227 [10:42<01:31,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                            | 185/227 [10:47<01:31,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 81%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                            | 185/227 [10:47<01:31,  2.19s/it] 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.29.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.29.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 82%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                           | 186/227 [10:47<01:45,  2.57s/it] 84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                        | 190/227 [10:47<00:54,  1.47s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                        | 190/227 [10:47<00:54,  1.47s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.29.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 84%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                        | 190/227 [10:47<00:54,  1.47s/it] 84%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                        | 191/227 [10:47<00:46,  1.30s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 84%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                        | 191/227 [10:47<00:46,  1.30s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 84%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                        | 191/227 [10:47<00:46,  1.30s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 84%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                        | 191/227 [10:47<00:46,  1.30s/it]                                                                                                                                                                                              [2024-01-08 05:18:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 84%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                        | 191/227 [10:47<00:46,  1.30s/it] 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:47<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:47<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.30.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:47<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:47] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00018-of-00019.safetensors
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:48<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:47] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00017-of-00019.safetensors
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:48<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:48] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00003-of-00019.safetensors
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:49<00:33,  1.03it/s]                                                                                                                                                                                              [2024-01-08 05:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.3.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                      | 193/227 [10:49<00:33,  1.03it/s] 86%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                     | 195/227 [10:49<00:30,  1.05it/s]                                                                                                                                                                                              [2024-01-08 05:18:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.3.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 86%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                     | 195/227 [10:49<00:30,  1.05it/s]                                                                                                                                                                                              [2024-01-08 05:19:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 86%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                     | 195/227 [11:05<00:30,  1.05it/s]                                                                                                                                                                                              [2024-01-08 05:19:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 86%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                     | 195/227 [11:05<00:30,  1.05it/s] 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 197/227 [11:05<01:30,  3.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 197/227 [11:09<01:30,  3.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 197/227 [11:10<01:30,  3.02s/it] 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.4.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.4.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 87%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                   | 198/227 [11:10<01:33,  3.22s/it] 89%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                | 202/227 [11:10<00:41,  1.65s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 89%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                | 202/227 [11:10<00:41,  1.65s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.4.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 89%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                | 202/227 [11:10<00:41,  1.65s/it] 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                | 203/227 [11:10<00:34,  1.42s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                | 203/227 [11:10<00:34,  1.42s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                | 203/227 [11:10<00:34,  1.42s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                | 203/227 [11:10<00:34,  1.42s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                | 203/227 [11:10<00:34,  1.42s/it] 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:10<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:10<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:10<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:09] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00003-of-00019.safetensors
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:10<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:10] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00004-of-00019.safetensors
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:11<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:21<00:22,  1.02s/it]                                                                                                                                                                                              [2024-01-08 05:19:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 90%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎              | 205/227 [11:21<00:22,  1.02s/it] 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋             | 207/227 [11:21<00:47,  2.36s/it]                                                                                                                                                                                              [2024-01-08 05:19:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋             | 207/227 [11:27<00:47,  2.36s/it]                                                                                                                                                                                              [2024-01-08 05:19:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.5.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 91%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋             | 207/227 [11:27<00:47,  2.36s/it] 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:27<00:57,  3.03s/it]                                                                                                                                                                                              [2024-01-08 05:19:26] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.5.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:27<00:57,  3.03s/it]                                                                                                                                                                                              [2024-01-08 05:19:26] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.5.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:27<00:57,  3.03s/it]                                                                                                                                                                                              [2024-01-08 05:19:26] INFO huggingface_loader.py:169: Loading HF parameters from: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00005-of-00019.safetensors
+ 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:27<00:57,  3.03s/it]                                                                                                                                                                                              [2024-01-08 05:19:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:37<00:57,  3.03s/it]                                                                                                                                                                                              [2024-01-08 05:19:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎            | 208/227 [11:37<00:57,  3.03s/it] 93%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎          | 211/227 [11:37<00:50,  3.13s/it]                                                                                                                                                                                              [2024-01-08 05:19:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 93%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎          | 211/227 [11:40<00:50,  3.13s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 93%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎          | 211/227 [11:41<00:50,  3.13s/it] 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 212/227 [11:41<00:48,  3.21s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 212/227 [11:41<00:48,  3.21s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 212/227 [11:41<00:48,  3.21s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 212/227 [11:41<00:48,  3.21s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 93%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 212/227 [11:41<00:48,  3.21s/it] 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:41<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:41<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:41<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.6.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:41<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:40] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.6.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:41<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.e1_e3.q_weight[0m", shape: (8, 28672, 412), dtype: uint32
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:48<00:28,  2.19s/it]                                                                                                                                                                                              [2024-01-08 05:19:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.e1_e3.q_scale[0m", shape: (8, 28672, 103), dtype: float16
+ 94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎        | 214/227 [11:49<00:28,  2.19s/it] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████      | 218/227 [11:49<00:18,  2.10s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.e2.q_weight[0m", shape: (8, 4096, 1436), dtype: uint32
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████      | 218/227 [11:52<00:18,  2.10s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.e2.q_scale[0m", shape: (8, 4096, 359), dtype: float16
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████      | 218/227 [11:52<00:18,  2.10s/it] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:52<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:52<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:52<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.7.input_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:52<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.7.post_attention_layernorm.weight[0m", shape: (4096,), dtype: float16
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:52<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:53<00:18,  2.32s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋     | 219/227 [11:53<00:18,  2.32s/it] 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.gate.q_weight[0m", shape: (8, 412), dtype: uint32
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.moe.gate.q_scale[0m", shape: (8, 103), dtype: float16
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.self_attn.qkv_proj.q_weight[0m", shape: (6144, 412), dtype: uint32
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.self_attn.qkv_proj.q_scale[0m", shape: (6144, 103), dtype: float16
+ 98%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 223/227 [11:53<00:05,  1.28s/it]100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 226/227 [11:53<00:00,  1.12it/s]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.self_attn.o_proj.q_weight[0m", shape: (4096, 412), dtype: uint32
+100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 226/227 [11:53<00:00,  1.12it/s]                                                                                                                                                                                              [2024-01-08 05:19:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.self_attn.o_proj.q_scale[0m", shape: (4096, 103), dtype: float16
+100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 226/227 [11:53<00:00,  1.12it/s]100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 227/227 [11:53<00:00,  3.14s/it]
+[2024-01-08 05:19:52] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00004-of-00019.safetensors
+[2024-01-08 05:19:53] INFO huggingface_loader.py:179: Unloading HF weight file: /opt/scratch/assets/Mixtral-8x7B-Instruct-v0.1/model-00005-of-00019.safetensors
+[2024-01-08 05:19:53] INFO stats.py:71: [92mTime usage[0m: HF loading: 29.189 sec; Pre-quantization mapping: 654.965 sec; Quantization: 4.403 sec
+[2024-01-08 05:19:53] INFO stats.py:85: [92mRAM usage[0m: Peak RAM: 18.563 GB. Total bytes loaded from disk: 210.951 GB
+[2024-01-08 05:19:53] INFO convert_weight.py:119: [92mParameter size[0m after quantization: 19.662 GB
+[2024-01-08 05:19:53] INFO convert_weight.py:124: [92mTotal parameters[0m: 46,702,792,704
+[2024-01-08 05:19:53] INFO convert_weight.py:125: [92mBits per parameter[0m: 3.616
+Start storing to cache /home/junrushao/tmp/tmpoazjl9lj
+[0001/0389] saving lm_head.q_weight                                   [0002/0389] saving lm_head.q_scale                                   [0003/0389] saving model.layers.30.moe.e1_e3.q_weight                                                     [0004/0389] saving model.layers.30.moe.e1_e3.q_scale                                                     [0005/0389] saving model.layers.30.moe.e2.q_weight                                                     [0006/0389] saving model.layers.30.moe.e2.q_scale                                                     [0007/0389] saving model.layers.30.input_layernorm.weight                                                         [0008/0389] saving model.layers.30.post_attention_layernorm.weight                                                                  [0009/0389] saving model.layers.31.moe.e1_e3.q_weight                                                                  [0010/0389] saving model.layers.31.moe.e1_e3.q_scale                                                                  [0011/0389] saving model.layers.31.moe.e2.q_weight                                                                  [0012/0389] saving model.layers.31.moe.e2.q_scale                                                                  [0013/0389] saving model.layers.31.moe.gate.q_weight                                                                  [0014/0389] saving model.layers.31.moe.gate.q_scale                                                                  [0015/0389] saving model.layers.31.input_layernorm.weight                                                                  [0016/0389] saving model.layers.31.post_attention_layernorm.weight                                                                  [0017/0389] saving model.layers.31.self_attn.qkv_proj.q_weight                                                                  [0018/0389] saving model.layers.31.self_attn.qkv_proj.q_scale              
                                                    [0019/0389] saving model.layers.31.self_attn.o_proj.q_weight                                                                  [0020/0389] saving model.layers.31.self_attn.o_proj.q_scale                                                                  [0021/0389] saving model.norm.weight                                                                  [0022/0389] saving model.embed_tokens.q_weight                                                                  [0023/0389] saving model.embed_tokens.q_scale                                                                  [0024/0389] saving model.layers.0.moe.e1_e3.q_weight                                                                  [0025/0389] saving model.layers.0.moe.e1_e3.q_scale                                                                  [0026/0389] saving model.layers.0.moe.e2.q_weight                                                                  [0027/0389] saving model.layers.0.moe.e2.q_scale                                                                  [0028/0389] saving model.layers.0.moe.gate.q_weight                                                                  [0029/0389] saving model.layers.0.moe.gate.q_scale                                                                  [0030/0389] saving model.layers.0.input_layernorm.weight                                                                  [0031/0389] saving model.layers.0.post_attention_layernorm.weight                                                                  [0032/0389] saving model.layers.0.self_attn.qkv_proj.q_weight                                                                  [0033/0389] saving model.layers.0.self_attn.qkv_proj.q_scale                                                                  [0034/0389] saving model.layers.0.self_attn.o_proj.q_weight                                                                  
[0035/0389] saving model.layers.0.self_attn.o_proj.q_scale                                                                  [0036/0389] saving model.layers.1.moe.e1_e3.q_weight                                                                  [0037/0389] saving model.layers.1.moe.e1_e3.q_scale                                                                  [0038/0389] saving model.layers.1.moe.e2.q_weight                                                                  [0039/0389] saving model.layers.1.moe.e2.q_scale                                                                  [0040/0389] saving model.layers.1.moe.gate.q_weight                                                                  [0041/0389] saving model.layers.1.moe.gate.q_scale                                                                  [0042/0389] saving model.layers.1.self_attn.qkv_proj.q_weight                                                                  [0043/0389] saving model.layers.1.self_attn.qkv_proj.q_scale                                                                  [0044/0389] saving model.layers.1.self_attn.o_proj.q_weight                                                                  [0045/0389] saving model.layers.1.self_attn.o_proj.q_scale                                                                  [0046/0389] saving model.layers.1.input_layernorm.weight                                                                  [0047/0389] saving model.layers.1.post_attention_layernorm.weight                                                                  [0048/0389] saving model.layers.2.moe.e1_e3.q_weight                                                                  [0049/0389] saving model.layers.2.moe.e1_e3.q_scale                                                                  [0050/0389] saving model.layers.2.moe.e2.q_weight                                                                  [0051/0389] saving 
model.layers.2.moe.e2.q_scale                                                                  [0052/0389] saving model.layers.2.moe.gate.q_weight                                                                  [0053/0389] saving model.layers.2.moe.gate.q_scale                                                                  [0054/0389] saving model.layers.2.input_layernorm.weight                                                                  [0055/0389] saving model.layers.2.post_attention_layernorm.weight                                                                  [0056/0389] saving model.layers.2.self_attn.qkv_proj.q_weight                                                                  [0057/0389] saving model.layers.2.self_attn.qkv_proj.q_scale                                                                  [0058/0389] saving model.layers.2.self_attn.o_proj.q_weight                                                                  [0059/0389] saving model.layers.2.self_attn.o_proj.q_scale                                                                  [0060/0389] saving model.layers.3.moe.e1_e3.q_weight                                                                  [0061/0389] saving model.layers.3.moe.e1_e3.q_scale                                                                  [0062/0389] saving model.layers.3.moe.e2.q_weight                                                                  [0063/0389] saving model.layers.3.moe.e2.q_scale                                                                  [0064/0389] saving model.layers.3.moe.gate.q_weight                                                                  [0065/0389] saving model.layers.3.moe.gate.q_scale                                                                  [0066/0389] saving model.layers.3.self_attn.qkv_proj.q_weight                                                                  [0067/0389] saving model.layers.3.self_attn.qkv_proj.q_scale
                                                                  [0068/0389] saving model.layers.3.self_attn.o_proj.q_weight                                                                  [0069/0389] saving model.layers.3.self_attn.o_proj.q_scale                                                                  [0070/0389] saving model.layers.10.moe.e1_e3.q_weight                                                                  [0071/0389] saving model.layers.10.moe.e1_e3.q_scale                                                                  [0072/0389] saving model.layers.10.moe.e2.q_weight                                                                  [0073/0389] saving model.layers.10.moe.e2.q_scale                                                                  [0074/0389] saving model.layers.10.moe.gate.q_weight                                                                  [0075/0389] saving model.layers.10.moe.gate.q_scale                                                                  [0076/0389] saving model.layers.10.self_attn.qkv_proj.q_weight                                                                  [0077/0389] saving model.layers.10.self_attn.qkv_proj.q_scale                                                                  [0078/0389] saving model.layers.10.self_attn.o_proj.q_weight                                                                  [0079/0389] saving model.layers.10.self_attn.o_proj.q_scale                                                                  [0080/0389] saving model.layers.8.moe.e2.q_weight                                                                  [0081/0389] saving model.layers.8.moe.e2.q_scale                                                                  [0082/0389] saving model.layers.8.moe.e1_e3.q_weight                                                                  [0083/0389] saving model.layers.8.moe.e1_e3.q_scale                                              
                    [0084/0389] saving model.layers.8.input_layernorm.weight                                                                  [0085/0389] saving model.layers.8.post_attention_layernorm.weight                                                                  [0086/0389] saving model.layers.9.moe.e1_e3.q_weight                                                                  [0087/0389] saving model.layers.9.moe.e1_e3.q_scale                                                                  [0088/0389] saving model.layers.9.moe.e2.q_weight                                                                  [0089/0389] saving model.layers.9.moe.e2.q_scale                                                                  [0090/0389] saving model.layers.9.moe.gate.q_weight                                                                  [0091/0389] saving model.layers.9.moe.gate.q_scale                                                                  [0092/0389] saving model.layers.9.input_layernorm.weight                                                                  [0093/0389] saving model.layers.9.post_attention_layernorm.weight                                                                  [0094/0389] saving model.layers.9.self_attn.qkv_proj.q_weight                                                                  [0095/0389] saving model.layers.9.self_attn.qkv_proj.q_scale                                                                  [0096/0389] saving model.layers.9.self_attn.o_proj.q_weight                                                                  [0097/0389] saving model.layers.9.self_attn.o_proj.q_scale                                                                  [0098/0389] saving model.layers.10.input_layernorm.weight                                                                  [0099/0389] saving model.layers.10.post_attention_layernorm.weight                                                      
            [0100/0389] saving model.layers.11.moe.e1_e3.q_weight                                                                  [0101/0389] saving model.layers.11.moe.e1_e3.q_scale                                                                  [0102/0389] saving model.layers.11.moe.e2.q_weight                                                                  [0103/0389] saving model.layers.11.moe.e2.q_scale                                                                  [0104/0389] saving model.layers.11.moe.gate.q_weight                                                                  [0105/0389] saving model.layers.11.moe.gate.q_scale                                                                  [0106/0389] saving model.layers.11.self_attn.qkv_proj.q_weight                                                                  [0107/0389] saving model.layers.11.self_attn.qkv_proj.q_scale                                                                  [0108/0389] saving model.layers.11.self_attn.o_proj.q_weight                                                                  [0109/0389] saving model.layers.11.self_attn.o_proj.q_scale                                                                  [0110/0389] saving model.layers.11.input_layernorm.weight                                                                  [0111/0389] saving model.layers.11.post_attention_layernorm.weight                                                                  [0112/0389] saving model.layers.12.moe.e1_e3.q_weight                                                                  [0113/0389] saving model.layers.12.moe.e1_e3.q_scale                                                                  [0114/0389] saving model.layers.12.moe.e2.q_weight                                                                  [0115/0389] saving model.layers.12.moe.e2.q_scale                                                                  [0116/0389] saving 
model.layers.12.moe.gate.q_weight                                                                  [0117/0389] saving model.layers.12.moe.gate.q_scale                                                                  [0118/0389] saving model.layers.12.input_layernorm.weight                                                                  [0119/0389] saving model.layers.12.post_attention_layernorm.weight                                                                  [0120/0389] saving model.layers.12.self_attn.qkv_proj.q_weight                                                                  [0121/0389] saving model.layers.12.self_attn.qkv_proj.q_scale                                                                  [0122/0389] saving model.layers.12.self_attn.o_proj.q_weight                                                                  [0123/0389] saving model.layers.12.self_attn.o_proj.q_scale                                                                  [0124/0389] saving model.layers.13.moe.e1_e3.q_weight                                                                  [0125/0389] saving model.layers.13.moe.e1_e3.q_scale                                                                  [0126/0389] saving model.layers.13.moe.e2.q_weight                                                                  [0127/0389] saving model.layers.13.moe.e2.q_scale                                                                  [0128/0389] saving model.layers.13.moe.gate.q_weight                                                                  [0129/0389] saving model.layers.13.moe.gate.q_scale                                                                  [0130/0389] saving model.layers.13.self_attn.qkv_proj.q_weight                                                                  [0131/0389] saving model.layers.13.self_attn.qkv_proj.q_scale                                                                  [0132/0389] saving 
model.layers.13.self_attn.o_proj.q_weight                                                                  [0133/0389] saving model.layers.13.self_attn.o_proj.q_scale                                                                  [0134/0389] saving model.layers.13.input_layernorm.weight                                                                  [0135/0389] saving model.layers.13.post_attention_layernorm.weight                                                                  [0136/0389] saving model.layers.14.moe.e1_e3.q_weight                                                                  [0137/0389] saving model.layers.14.moe.e1_e3.q_scale                                                                  [0138/0389] saving model.layers.14.moe.e2.q_weight                                                                  [0139/0389] saving model.layers.14.moe.e2.q_scale                                                                  [0140/0389] saving model.layers.14.moe.gate.q_weight                                                                  [0141/0389] saving model.layers.14.moe.gate.q_scale                                                                  [0142/0389] saving model.layers.14.input_layernorm.weight                                                                  [0143/0389] saving model.layers.14.post_attention_layernorm.weight                                                                  [0144/0389] saving model.layers.14.self_attn.qkv_proj.q_weight                                                                  [0145/0389] saving model.layers.14.self_attn.qkv_proj.q_scale                                                                  [0146/0389] saving model.layers.14.self_attn.o_proj.q_weight                                                                  [0147/0389] saving model.layers.14.self_attn.o_proj.q_scale                                                                  [0148/0389] 
saving model.layers.15.moe.e1_e3.q_weight                                                                  [0149/0389] saving model.layers.15.moe.e1_e3.q_scale                                                                  [0150/0389] saving model.layers.15.moe.e2.q_weight                                                                  [0151/0389] saving model.layers.15.moe.e2.q_scale                                                                  [0152/0389] saving model.layers.15.moe.gate.q_weight                                                                  [0153/0389] saving model.layers.15.moe.gate.q_scale                                                                  [0154/0389] saving model.layers.15.self_attn.qkv_proj.q_weight                                                                  [0155/0389] saving model.layers.15.self_attn.qkv_proj.q_scale                                                                  [0156/0389] saving model.layers.15.self_attn.o_proj.q_weight                                                                  [0157/0389] saving model.layers.15.self_attn.o_proj.q_scale                                                                  [0158/0389] saving model.layers.15.input_layernorm.weight                                                                  [0159/0389] saving model.layers.15.post_attention_layernorm.weight                                                                  [0160/0389] saving model.layers.16.moe.e1_e3.q_weight                                                                  [0161/0389] saving model.layers.16.moe.e1_e3.q_scale                                                                  [0162/0389] saving model.layers.16.moe.e2.q_weight                                                                  [0163/0389] saving model.layers.16.moe.e2.q_scale                                                                  [0164/0389] saving 
model.layers.16.moe.gate.q_weight                                                                  [0165/0389] saving model.layers.16.moe.gate.q_scale                                                                  [0166/0389] saving model.layers.16.self_attn.qkv_proj.q_weight                                                                  [0167/0389] saving model.layers.16.self_attn.qkv_proj.q_scale                                                                  [0168/0389] saving model.layers.16.self_attn.o_proj.q_weight                                                                  [0169/0389] saving model.layers.16.self_attn.o_proj.q_scale                                                                  [0170/0389] saving model.layers.16.input_layernorm.weight                                                                  [0171/0389] saving model.layers.16.post_attention_layernorm.weight                                                                  [0172/0389] saving model.layers.17.moe.e1_e3.q_weight                                                                  [0173/0389] saving model.layers.17.moe.e1_e3.q_scale                                                                  [0174/0389] saving model.layers.17.moe.e2.q_weight                                                                  [0175/0389] saving model.layers.17.moe.e2.q_scale                                                                  [0176/0389] saving model.layers.17.moe.gate.q_weight                                                                  [0177/0389] saving model.layers.17.moe.gate.q_scale                                                                  [0178/0389] saving model.layers.17.input_layernorm.weight                                                                  [0179/0389] saving model.layers.17.post_attention_layernorm.weight                                                                  [0180/0389] saving 
model.layers.17.self_attn.qkv_proj.q_weight                                                                  [0181/0389] saving model.layers.17.self_attn.qkv_proj.q_scale                                                                  [0182/0389] saving model.layers.17.self_attn.o_proj.q_weight                                                                  [0183/0389] saving model.layers.17.self_attn.o_proj.q_scale                                                                  [0184/0389] saving model.layers.18.moe.e1_e3.q_weight                                                                  [0185/0389] saving model.layers.18.moe.e1_e3.q_scale                                                                  [0186/0389] saving model.layers.18.moe.e2.q_weight                                                                  [0187/0389] saving model.layers.18.moe.e2.q_scale                                                                  [0188/0389] saving model.layers.18.moe.gate.q_weight                                                                  [0189/0389] saving model.layers.18.moe.gate.q_scale                                                                  [0190/0389] saving model.layers.18.self_attn.qkv_proj.q_weight                                                                  [0191/0389] saving model.layers.18.self_attn.qkv_proj.q_scale                                                                  [0192/0389] saving model.layers.18.self_attn.o_proj.q_weight                                                                  [0193/0389] saving model.layers.18.self_attn.o_proj.q_scale                                                                  [0194/0389] saving model.layers.18.input_layernorm.weight                                                                  [0195/0389] saving model.layers.18.post_attention_layernorm.weight                                                                  [0196/0389] 
saving model.layers.19.moe.e1_e3.q_weight                                                                  [0197/0389] saving model.layers.19.moe.e1_e3.q_scale                                                                  [0198/0389] saving model.layers.19.moe.e2.q_weight                                                                  [0199/0389] saving model.layers.19.moe.e2.q_scale                                                                  [0200/0389] saving model.layers.19.moe.gate.q_weight                                                                  [0201/0389] saving model.layers.19.moe.gate.q_scale                                                                  [0202/0389] saving model.layers.19.input_layernorm.weight                                                                  [0203/0389] saving model.layers.19.post_attention_layernorm.weight                                                                  [0204/0389] saving model.layers.19.self_attn.qkv_proj.q_weight                                                                  [0205/0389] saving model.layers.19.self_attn.qkv_proj.q_scale                                                                  [0206/0389] saving model.layers.19.self_attn.o_proj.q_weight                                                                  [0207/0389] saving model.layers.19.self_attn.o_proj.q_scale                                                                  [0208/0389] saving model.layers.20.moe.e1_e3.q_weight                                                                  [0209/0389] saving model.layers.20.moe.e1_e3.q_scale                                                                  [0210/0389] saving model.layers.20.moe.e2.q_weight                                                                  [0211/0389] saving model.layers.20.moe.e2.q_scale                                                                  [0212/0389] saving 
model.layers.20.moe.gate.q_weight                                                                  [0213/0389] saving model.layers.20.moe.gate.q_scale                                                                  [0214/0389] saving model.layers.20.self_attn.qkv_proj.q_weight                                                                  [0215/0389] saving model.layers.20.self_attn.qkv_proj.q_scale                                                                  [0216/0389] saving model.layers.20.self_attn.o_proj.q_weight                                                                  [0217/0389] saving model.layers.20.self_attn.o_proj.q_scale                                                                  [0218/0389] saving model.layers.20.input_layernorm.weight                                                                  [0219/0389] saving model.layers.20.post_attention_layernorm.weight                                                                  [0220/0389] saving model.layers.21.moe.e1_e3.q_weight                                                                  [0221/0389] saving model.layers.21.moe.e1_e3.q_scale                                                                  [0222/0389] saving model.layers.21.moe.e2.q_weight                                                                  [0223/0389] saving model.layers.21.moe.e2.q_scale                                                                  [0224/0389] saving model.layers.21.moe.gate.q_weight                                                                  [0225/0389] saving model.layers.21.moe.gate.q_scale                                                                  [0226/0389] saving model.layers.21.input_layernorm.weight                                                                  [0227/0389] saving model.layers.21.post_attention_layernorm.weight                                                                  [0228/0389] saving 
model.layers.21.self_attn.qkv_proj.q_weight                                                                  [0229/0389] saving model.layers.21.self_attn.qkv_proj.q_scale                                                                  [0230/0389] saving model.layers.21.self_attn.o_proj.q_weight                                                                  [0231/0389] saving model.layers.21.self_attn.o_proj.q_scale                                                                  [0232/0389] saving model.layers.22.moe.e1_e3.q_weight                                                                  [0233/0389] saving model.layers.22.moe.e1_e3.q_scale                                                                  [0234/0389] saving model.layers.22.moe.e2.q_weight                                                                  [0235/0389] saving model.layers.22.moe.e2.q_scale                                                                  [0236/0389] saving model.layers.22.moe.gate.q_weight                                                                  [0237/0389] saving model.layers.22.moe.gate.q_scale                                                                  [0238/0389] saving model.layers.22.self_attn.qkv_proj.q_weight                                                                  [0239/0389] saving model.layers.22.self_attn.qkv_proj.q_scale                                                                  [0240/0389] saving model.layers.22.self_attn.o_proj.q_weight                                                                  [0241/0389] saving model.layers.22.self_attn.o_proj.q_scale                                                                  [0242/0389] saving model.layers.22.input_layernorm.weight                                                                  [0243/0389] saving model.layers.22.post_attention_layernorm.weight                                                                  [0244/0389] 
saving model.layers.23.moe.e1_e3.q_weight                                                                  [0245/0389] saving model.layers.23.moe.e1_e3.q_scale                                                                  [0246/0389] saving model.layers.23.moe.e2.q_weight                                                                  [0247/0389] saving model.layers.23.moe.e2.q_scale                                                                  [0248/0389] saving model.layers.23.moe.gate.q_weight                                                                  [0249/0389] saving model.layers.23.moe.gate.q_scale                                                                  [0250/0389] saving model.layers.23.self_attn.qkv_proj.q_weight                                                                  [0251/0389] saving model.layers.23.self_attn.qkv_proj.q_scale                                                                  [0252/0389] saving model.layers.23.self_attn.o_proj.q_weight                                                                  [0253/0389] saving model.layers.23.self_attn.o_proj.q_scale                                                                  [0254/0389] saving model.layers.23.input_layernorm.weight                                                                  [0255/0389] saving model.layers.23.post_attention_layernorm.weight                                                                  [0256/0389] saving model.layers.24.moe.e1_e3.q_weight                                                                  [0257/0389] saving model.layers.24.moe.e1_e3.q_scale                                                                  [0258/0389] saving model.layers.24.moe.e2.q_weight                                                                  [0259/0389] saving model.layers.24.moe.e2.q_scale                                                                  [0260/0389] saving 
model.layers.24.moe.gate.q_weight                                                                  [0261/0389] saving model.layers.24.moe.gate.q_scale                                                                  [0262/0389] saving model.layers.24.input_layernorm.weight                                                                  [0263/0389] saving model.layers.24.post_attention_layernorm.weight                                                                  [0264/0389] saving model.layers.24.self_attn.qkv_proj.q_weight                                                                  [0265/0389] saving model.layers.24.self_attn.qkv_proj.q_scale                                                                  [0266/0389] saving model.layers.24.self_attn.o_proj.q_weight                                                                  [0267/0389] saving model.layers.24.self_attn.o_proj.q_scale                                                                  [0268/0389] saving model.layers.25.moe.e1_e3.q_weight                                                                  [0269/0389] saving model.layers.25.moe.e1_e3.q_scale                                                                  [0270/0389] saving model.layers.25.moe.e2.q_weight                                                                  [0271/0389] saving model.layers.25.moe.e2.q_scale                                                                  [0272/0389] saving model.layers.25.moe.gate.q_weight                                                                  [0273/0389] saving model.layers.25.moe.gate.q_scale                                                                  [0274/0389] saving model.layers.25.self_attn.qkv_proj.q_weight                                                                  [0275/0389] saving model.layers.25.self_attn.qkv_proj.q_scale                                                                  [0276/0389] saving 
model.layers.25.self_attn.o_proj.q_weight                                                                  [0277/0389] saving model.layers.25.self_attn.o_proj.q_scale                                                                  [0278/0389] saving model.layers.25.input_layernorm.weight                                                                  [0279/0389] saving model.layers.25.post_attention_layernorm.weight                                                                  [0280/0389] saving model.layers.26.moe.e1_e3.q_weight                                                                  [0281/0389] saving model.layers.26.moe.e1_e3.q_scale                                                                  [0282/0389] saving model.layers.26.moe.e2.q_weight                                                                  [0283/0389] saving model.layers.26.moe.e2.q_scale                                                                  [0284/0389] saving model.layers.26.moe.gate.q_weight                                                                  [0285/0389] saving model.layers.26.moe.gate.q_scale                                                                  [0286/0389] saving model.layers.26.input_layernorm.weight                                                                  [0287/0389] saving model.layers.26.post_attention_layernorm.weight                                                                  [0288/0389] saving model.layers.26.self_attn.qkv_proj.q_weight                                                                  [0289/0389] saving model.layers.26.self_attn.qkv_proj.q_scale                                                                  [0290/0389] saving model.layers.26.self_attn.o_proj.q_weight                                                                  [0291/0389] saving model.layers.26.self_attn.o_proj.q_scale                                                                  [0292/0389] 
saving model.layers.27.moe.e1_e3.q_weight                                                                  [0293/0389] saving model.layers.27.moe.e1_e3.q_scale                                                                  [0294/0389] saving model.layers.27.moe.e2.q_weight                                                                  [0295/0389] saving model.layers.27.moe.e2.q_scale                                                                  [0296/0389] saving model.layers.27.moe.gate.q_weight                                                                  [0297/0389] saving model.layers.27.moe.gate.q_scale                                                                  [0298/0389] saving model.layers.27.self_attn.qkv_proj.q_weight                                                                  [0299/0389] saving model.layers.27.self_attn.qkv_proj.q_scale                                                                  [0300/0389] saving model.layers.27.self_attn.o_proj.q_weight                                                                  [0301/0389] saving model.layers.27.self_attn.o_proj.q_scale                                                                  [0302/0389] saving model.layers.27.input_layernorm.weight                                                                  [0303/0389] saving model.layers.27.post_attention_layernorm.weight                                                                  [0304/0389] saving model.layers.28.moe.e1_e3.q_weight                                                                  [0305/0389] saving model.layers.28.moe.e1_e3.q_scale                                                                  [0306/0389] saving model.layers.28.moe.e2.q_weight                                                                  [0307/0389] saving model.layers.28.moe.e2.q_scale                                                                  [0308/0389] saving 
model.layers.28.moe.gate.q_weight                                                                  [0309/0389] saving model.layers.28.moe.gate.q_scale                                                                  [0310/0389] saving model.layers.28.self_attn.qkv_proj.q_weight                                                                  [0311/0389] saving model.layers.28.self_attn.qkv_proj.q_scale                                                                  [0312/0389] saving model.layers.28.self_attn.o_proj.q_weight                                                                  [0313/0389] saving model.layers.28.self_attn.o_proj.q_scale                                                                  [0314/0389] saving model.layers.28.input_layernorm.weight                                                                  [0315/0389] saving model.layers.28.post_attention_layernorm.weight                                                                  [0316/0389] saving model.layers.29.moe.e1_e3.q_weight                                                                  [0317/0389] saving model.layers.29.moe.e1_e3.q_scale                                                                  [0318/0389] saving model.layers.29.moe.e2.q_weight                                                                  [0319/0389] saving model.layers.29.moe.e2.q_scale                                                                  [0320/0389] saving model.layers.29.moe.gate.q_weight                                                                  [0321/0389] saving model.layers.29.moe.gate.q_scale                                                                  [0322/0389] saving model.layers.29.input_layernorm.weight                                                                  [0323/0389] saving model.layers.29.post_attention_layernorm.weight                                                                  [0324/0389] saving 
model.layers.29.self_attn.qkv_proj.q_weight                                                                  [0325/0389] saving model.layers.29.self_attn.qkv_proj.q_scale                                                                  [0326/0389] saving model.layers.29.self_attn.o_proj.q_weight                                                                  [0327/0389] saving model.layers.29.self_attn.o_proj.q_scale                                                                  [0328/0389] saving model.layers.30.moe.gate.q_weight                                                                  [0329/0389] saving model.layers.30.moe.gate.q_scale                                                                  [0330/0389] saving model.layers.30.self_attn.qkv_proj.q_weight[2024-01-08 05:21:33] INFO convert_weight.py:141: Saved to directory: [1m/home/junrushao/tmp/tmpoazjl9lj[0m
+                                                                  [0331/0389] saving model.layers.30.self_attn.qkv_proj.q_scale                                                                  [0332/0389] saving model.layers.30.self_attn.o_proj.q_weight                                                                  [0333/0389] saving model.layers.30.self_attn.o_proj.q_scale                                                                  [0334/0389] saving model.layers.3.input_layernorm.weight                                                                  [0335/0389] saving model.layers.3.post_attention_layernorm.weight                                                                  [0336/0389] saving model.layers.4.moe.e1_e3.q_weight                                                                  [0337/0389] saving model.layers.4.moe.e1_e3.q_scale                                                                  [0338/0389] saving model.layers.4.moe.e2.q_weight                                                                  [0339/0389] saving model.layers.4.moe.e2.q_scale                                                                  [0340/0389] saving model.layers.4.moe.gate.q_weight                                                                  [0341/0389] saving model.layers.4.moe.gate.q_scale                                                                  [0342/0389] saving model.layers.4.input_layernorm.weight                                                                  [0343/0389] saving model.layers.4.post_attention_layernorm.weight                                                                  [0344/0389] saving model.layers.4.self_attn.qkv_proj.q_weight                                                                  [0345/0389] saving model.layers.4.self_attn.qkv_proj.q_scale                                                                  [0346/0389] saving model.layers.4.self_attn.o_proj.q_weight       
                                                           [0347/0389] saving model.layers.4.self_attn.o_proj.q_scale                                                                  [0348/0389] saving model.layers.5.moe.gate.q_weight                                                                  [0349/0389] saving model.layers.5.moe.gate.q_scale                                                                  [0350/0389] saving model.layers.5.self_attn.qkv_proj.q_weight                                                                  [0351/0389] saving model.layers.5.self_attn.qkv_proj.q_scale                                                                  [0352/0389] saving model.layers.5.self_attn.o_proj.q_weight                                                                  [0353/0389] saving model.layers.5.self_attn.o_proj.q_scale                                                                  [0354/0389] saving model.layers.5.moe.e1_e3.q_weight                                                                  [0355/0389] saving model.layers.5.moe.e1_e3.q_scale                                                                  [0356/0389] saving model.layers.5.moe.e2.q_weight                                                                  [0357/0389] saving model.layers.5.moe.e2.q_scale                                                                  [0358/0389] saving model.layers.5.input_layernorm.weight                                                                  [0359/0389] saving model.layers.5.post_attention_layernorm.weight                                                                  [0360/0389] saving model.layers.6.moe.e1_e3.q_weight                                                                  [0361/0389] saving model.layers.6.moe.e1_e3.q_scale                                                                  [0362/0389] saving model.layers.6.moe.e2.q_weight                                                 
                 [0363/0389] saving model.layers.6.moe.e2.q_scale                                                                  [0364/0389] saving model.layers.6.moe.gate.q_weight                                                                  [0365/0389] saving model.layers.6.moe.gate.q_scale                                                                  [0366/0389] saving model.layers.6.self_attn.qkv_proj.q_weight                                                                  [0367/0389] saving model.layers.6.self_attn.qkv_proj.q_scale                                                                  [0368/0389] saving model.layers.6.self_attn.o_proj.q_weight                                                                  [0369/0389] saving model.layers.6.self_attn.o_proj.q_scale                                                                  [0370/0389] saving model.layers.6.input_layernorm.weight                                                                  [0371/0389] saving model.layers.6.post_attention_layernorm.weight                                                                  [0372/0389] saving model.layers.7.moe.e1_e3.q_weight                                                                  [0373/0389] saving model.layers.7.moe.e1_e3.q_scale                                                                  [0374/0389] saving model.layers.7.moe.e2.q_weight                                                                  [0375/0389] saving model.layers.7.moe.e2.q_scale                                                                  [0376/0389] saving model.layers.7.moe.gate.q_weight                                                                  [0377/0389] saving model.layers.7.moe.gate.q_scale                                                                  [0378/0389] saving model.layers.7.input_layernorm.weight                                                                  [0379/0389] saving 
model.layers.7.post_attention_layernorm.weight                                                                  [0380/0389] saving model.layers.7.self_attn.qkv_proj.q_weight                                                                  [0381/0389] saving model.layers.7.self_attn.qkv_proj.q_scale                                                                  [0382/0389] saving model.layers.7.self_attn.o_proj.q_weight                                                                  [0383/0389] saving model.layers.7.self_attn.o_proj.q_scale                                                                  [0384/0389] saving model.layers.8.moe.gate.q_weight                                                                  [0385/0389] saving model.layers.8.moe.gate.q_scale                                                                  [0386/0389] saving model.layers.8.self_attn.qkv_proj.q_weight                                                                  [0387/0389] saving model.layers.8.self_attn.qkv_proj.q_scale                                                                  [0388/0389] saving model.layers.8.self_attn.o_proj.q_weight                                                                  [0389/0389] saving model.layers.8.self_attn.o_proj.q_scale
+All finished, 147 total shards committed, record saved to /home/junrushao/tmp/tmpoazjl9lj/ndarray-cache.json