riczhou commited on
Commit
b90484a
·
verified ·
1 Parent(s): e092667

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "version": "0.1.0",
3
+ "model_type": "qwen3",
4
+ "quantization": "q0f32",
5
+ "model_config": {
6
+ "hidden_act": "silu",
7
+ "hidden_size": 1024,
8
+ "intermediate_size": 3072,
9
+ "attention_bias": false,
10
+ "num_attention_heads": 16,
11
+ "num_hidden_layers": 28,
12
+ "num_key_value_heads": 8,
13
+ "rms_norm_eps": 1e-06,
14
+ "rope_theta": 1000000,
15
+ "vocab_size": 151936,
16
+ "tie_word_embeddings": true,
17
+ "context_window_size": 40960,
18
+ "prefill_chunk_size": 2048,
19
+ "tensor_parallel_shards": 1,
20
+ "head_dim": 128,
21
+ "dtype": "float32",
22
+ "max_batch_size": 128,
23
+ "weight_block_size": null
24
+ },
25
+ "vocab_size": 151936,
26
+ "context_window_size": 40960,
27
+ "sliding_window_size": -1,
28
+ "prefill_chunk_size": 2048,
29
+ "attention_sink_size": -1,
30
+ "tensor_parallel_shards": 1,
31
+ "pipeline_parallel_stages": 1,
32
+ "temperature": 0.6,
33
+ "presence_penalty": 0.0,
34
+ "frequency_penalty": 0.0,
35
+ "repetition_penalty": 1.0,
36
+ "top_p": 0.95,
37
+ "tokenizer_files": [
38
+ "tokenizer.json",
39
+ "vocab.json",
40
+ "merges.txt",
41
+ "tokenizer_config.json"
42
+ ],
43
+ "tokenizer_info": {
44
+ "token_postproc_method": "byte_level",
45
+ "prepend_space_in_encode": false,
46
+ "strip_space_in_decode": false
47
+ },
48
+ "conv_template": {
49
+ "name": "qwen2",
50
+ "system_template": "<|im_start|>system\n{system_message}<|im_end|>\n",
51
+ "system_message": "You are a helpful assistant.",
52
+ "system_prefix_token_ids": null,
53
+ "add_role_after_system_message": true,
54
+ "roles": {
55
+ "user": "<|im_start|>user",
56
+ "assistant": "<|im_start|>assistant"
57
+ },
58
+ "role_templates": {
59
+ "user": "{user_message}",
60
+ "assistant": "{assistant_message}",
61
+ "tool": "{tool_message}"
62
+ },
63
+ "messages": [],
64
+ "seps": [
65
+ "<|im_end|>\n"
66
+ ],
67
+ "role_content_sep": "\n",
68
+ "role_empty_sep": "\n",
69
+ "stop_str": [
70
+ "<|endoftext|>",
71
+ "<|im_end|>"
72
+ ],
73
+ "stop_token_ids": [
74
+ 151643,
75
+ 151645
76
+ ],
77
+ "function_string": "",
78
+ "use_function_calling": false
79
+ },
80
+ "pad_token_id": 151643,
81
+ "bos_token_id": 151643,
82
+ "eos_token_id": [
83
+ 151645,
84
+ 151643
85
+ ]
86
+ }
ndarray-cache-b16.json ADDED
@@ -0,0 +1,2614 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 226,
4
+ "ParamBytes": 2384199680.0,
5
+ "BitsPerParam": 25.37623158078298
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 311164928,
12
+ "records": [
13
+ {
14
+ "name": "model.embed_tokens.weight",
15
+ "shape": [
16
+ 151936,
17
+ 1024
18
+ ],
19
+ "dtype": "bfloat16",
20
+ "format": "raw",
21
+ "nbytes": 311164928,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "4f615e4e204fe160829261879c8c4864"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 31463936,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.input_layernorm.weight",
34
+ "shape": [
35
+ 1024
36
+ ],
37
+ "dtype": "bfloat16",
38
+ "format": "raw",
39
+ "nbytes": 2048,
40
+ "byteOffset": 0
41
+ },
42
+ {
43
+ "name": "model.layers.0.mlp.down_proj.weight",
44
+ "shape": [
45
+ 1024,
46
+ 3072
47
+ ],
48
+ "dtype": "bfloat16",
49
+ "format": "raw",
50
+ "nbytes": 6291456,
51
+ "byteOffset": 2048
52
+ },
53
+ {
54
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
55
+ "shape": [
56
+ 6144,
57
+ 1024
58
+ ],
59
+ "dtype": "bfloat16",
60
+ "format": "raw",
61
+ "nbytes": 12582912,
62
+ "byteOffset": 6293504
63
+ },
64
+ {
65
+ "name": "model.layers.0.post_attention_layernorm.weight",
66
+ "shape": [
67
+ 1024
68
+ ],
69
+ "dtype": "bfloat16",
70
+ "format": "raw",
71
+ "nbytes": 2048,
72
+ "byteOffset": 18876416
73
+ },
74
+ {
75
+ "name": "model.layers.0.self_attn.k_norm.weight",
76
+ "shape": [
77
+ 128
78
+ ],
79
+ "dtype": "bfloat16",
80
+ "format": "raw",
81
+ "nbytes": 256,
82
+ "byteOffset": 18878464
83
+ },
84
+ {
85
+ "name": "model.layers.0.self_attn.c_attn.weight",
86
+ "shape": [
87
+ 4096,
88
+ 1024
89
+ ],
90
+ "dtype": "bfloat16",
91
+ "format": "raw",
92
+ "nbytes": 8388608,
93
+ "byteOffset": 18878720
94
+ },
95
+ {
96
+ "name": "model.layers.0.self_attn.o_proj.weight",
97
+ "shape": [
98
+ 1024,
99
+ 2048
100
+ ],
101
+ "dtype": "bfloat16",
102
+ "format": "raw",
103
+ "nbytes": 4194304,
104
+ "byteOffset": 27267328
105
+ },
106
+ {
107
+ "name": "model.layers.0.self_attn.q_norm.weight",
108
+ "shape": [
109
+ 128
110
+ ],
111
+ "dtype": "bfloat16",
112
+ "format": "raw",
113
+ "nbytes": 256,
114
+ "byteOffset": 31461632
115
+ },
116
+ {
117
+ "name": "model.layers.1.input_layernorm.weight",
118
+ "shape": [
119
+ 1024
120
+ ],
121
+ "dtype": "bfloat16",
122
+ "format": "raw",
123
+ "nbytes": 2048,
124
+ "byteOffset": 31461888
125
+ }
126
+ ],
127
+ "md5sum": "9e8c92fad1c6d7956d2f76d18b34c0c0"
128
+ },
129
+ {
130
+ "dataPath": "params_shard_2.bin",
131
+ "format": "raw-shard",
132
+ "nbytes": 31461888,
133
+ "records": [
134
+ {
135
+ "name": "model.layers.1.mlp.down_proj.weight",
136
+ "shape": [
137
+ 1024,
138
+ 3072
139
+ ],
140
+ "dtype": "bfloat16",
141
+ "format": "raw",
142
+ "nbytes": 6291456,
143
+ "byteOffset": 0
144
+ },
145
+ {
146
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
147
+ "shape": [
148
+ 6144,
149
+ 1024
150
+ ],
151
+ "dtype": "bfloat16",
152
+ "format": "raw",
153
+ "nbytes": 12582912,
154
+ "byteOffset": 6291456
155
+ },
156
+ {
157
+ "name": "model.layers.1.post_attention_layernorm.weight",
158
+ "shape": [
159
+ 1024
160
+ ],
161
+ "dtype": "bfloat16",
162
+ "format": "raw",
163
+ "nbytes": 2048,
164
+ "byteOffset": 18874368
165
+ },
166
+ {
167
+ "name": "model.layers.1.self_attn.k_norm.weight",
168
+ "shape": [
169
+ 128
170
+ ],
171
+ "dtype": "bfloat16",
172
+ "format": "raw",
173
+ "nbytes": 256,
174
+ "byteOffset": 18876416
175
+ },
176
+ {
177
+ "name": "model.layers.1.self_attn.c_attn.weight",
178
+ "shape": [
179
+ 4096,
180
+ 1024
181
+ ],
182
+ "dtype": "bfloat16",
183
+ "format": "raw",
184
+ "nbytes": 8388608,
185
+ "byteOffset": 18876672
186
+ },
187
+ {
188
+ "name": "model.layers.1.self_attn.o_proj.weight",
189
+ "shape": [
190
+ 1024,
191
+ 2048
192
+ ],
193
+ "dtype": "bfloat16",
194
+ "format": "raw",
195
+ "nbytes": 4194304,
196
+ "byteOffset": 27265280
197
+ },
198
+ {
199
+ "name": "model.layers.1.self_attn.q_norm.weight",
200
+ "shape": [
201
+ 128
202
+ ],
203
+ "dtype": "bfloat16",
204
+ "format": "raw",
205
+ "nbytes": 256,
206
+ "byteOffset": 31459584
207
+ },
208
+ {
209
+ "name": "model.layers.10.input_layernorm.weight",
210
+ "shape": [
211
+ 1024
212
+ ],
213
+ "dtype": "bfloat16",
214
+ "format": "raw",
215
+ "nbytes": 2048,
216
+ "byteOffset": 31459840
217
+ }
218
+ ],
219
+ "md5sum": "c74facda8699fab5c1fb909a0038eac3"
220
+ },
221
+ {
222
+ "dataPath": "params_shard_3.bin",
223
+ "format": "raw-shard",
224
+ "nbytes": 31461888,
225
+ "records": [
226
+ {
227
+ "name": "model.layers.10.mlp.down_proj.weight",
228
+ "shape": [
229
+ 1024,
230
+ 3072
231
+ ],
232
+ "dtype": "bfloat16",
233
+ "format": "raw",
234
+ "nbytes": 6291456,
235
+ "byteOffset": 0
236
+ },
237
+ {
238
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
239
+ "shape": [
240
+ 6144,
241
+ 1024
242
+ ],
243
+ "dtype": "bfloat16",
244
+ "format": "raw",
245
+ "nbytes": 12582912,
246
+ "byteOffset": 6291456
247
+ },
248
+ {
249
+ "name": "model.layers.10.post_attention_layernorm.weight",
250
+ "shape": [
251
+ 1024
252
+ ],
253
+ "dtype": "bfloat16",
254
+ "format": "raw",
255
+ "nbytes": 2048,
256
+ "byteOffset": 18874368
257
+ },
258
+ {
259
+ "name": "model.layers.10.self_attn.k_norm.weight",
260
+ "shape": [
261
+ 128
262
+ ],
263
+ "dtype": "bfloat16",
264
+ "format": "raw",
265
+ "nbytes": 256,
266
+ "byteOffset": 18876416
267
+ },
268
+ {
269
+ "name": "model.layers.10.self_attn.c_attn.weight",
270
+ "shape": [
271
+ 4096,
272
+ 1024
273
+ ],
274
+ "dtype": "bfloat16",
275
+ "format": "raw",
276
+ "nbytes": 8388608,
277
+ "byteOffset": 18876672
278
+ },
279
+ {
280
+ "name": "model.layers.10.self_attn.o_proj.weight",
281
+ "shape": [
282
+ 1024,
283
+ 2048
284
+ ],
285
+ "dtype": "bfloat16",
286
+ "format": "raw",
287
+ "nbytes": 4194304,
288
+ "byteOffset": 27265280
289
+ },
290
+ {
291
+ "name": "model.layers.10.self_attn.q_norm.weight",
292
+ "shape": [
293
+ 128
294
+ ],
295
+ "dtype": "bfloat16",
296
+ "format": "raw",
297
+ "nbytes": 256,
298
+ "byteOffset": 31459584
299
+ },
300
+ {
301
+ "name": "model.layers.11.input_layernorm.weight",
302
+ "shape": [
303
+ 1024
304
+ ],
305
+ "dtype": "bfloat16",
306
+ "format": "raw",
307
+ "nbytes": 2048,
308
+ "byteOffset": 31459840
309
+ }
310
+ ],
311
+ "md5sum": "79511a618a11a839e255c0d6b8117701"
312
+ },
313
+ {
314
+ "dataPath": "params_shard_4.bin",
315
+ "format": "raw-shard",
316
+ "nbytes": 31461888,
317
+ "records": [
318
+ {
319
+ "name": "model.layers.11.mlp.down_proj.weight",
320
+ "shape": [
321
+ 1024,
322
+ 3072
323
+ ],
324
+ "dtype": "bfloat16",
325
+ "format": "raw",
326
+ "nbytes": 6291456,
327
+ "byteOffset": 0
328
+ },
329
+ {
330
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
331
+ "shape": [
332
+ 6144,
333
+ 1024
334
+ ],
335
+ "dtype": "bfloat16",
336
+ "format": "raw",
337
+ "nbytes": 12582912,
338
+ "byteOffset": 6291456
339
+ },
340
+ {
341
+ "name": "model.layers.11.post_attention_layernorm.weight",
342
+ "shape": [
343
+ 1024
344
+ ],
345
+ "dtype": "bfloat16",
346
+ "format": "raw",
347
+ "nbytes": 2048,
348
+ "byteOffset": 18874368
349
+ },
350
+ {
351
+ "name": "model.layers.11.self_attn.k_norm.weight",
352
+ "shape": [
353
+ 128
354
+ ],
355
+ "dtype": "bfloat16",
356
+ "format": "raw",
357
+ "nbytes": 256,
358
+ "byteOffset": 18876416
359
+ },
360
+ {
361
+ "name": "model.layers.11.self_attn.c_attn.weight",
362
+ "shape": [
363
+ 4096,
364
+ 1024
365
+ ],
366
+ "dtype": "bfloat16",
367
+ "format": "raw",
368
+ "nbytes": 8388608,
369
+ "byteOffset": 18876672
370
+ },
371
+ {
372
+ "name": "model.layers.11.self_attn.o_proj.weight",
373
+ "shape": [
374
+ 1024,
375
+ 2048
376
+ ],
377
+ "dtype": "bfloat16",
378
+ "format": "raw",
379
+ "nbytes": 4194304,
380
+ "byteOffset": 27265280
381
+ },
382
+ {
383
+ "name": "model.layers.11.self_attn.q_norm.weight",
384
+ "shape": [
385
+ 128
386
+ ],
387
+ "dtype": "bfloat16",
388
+ "format": "raw",
389
+ "nbytes": 256,
390
+ "byteOffset": 31459584
391
+ },
392
+ {
393
+ "name": "model.layers.12.input_layernorm.weight",
394
+ "shape": [
395
+ 1024
396
+ ],
397
+ "dtype": "bfloat16",
398
+ "format": "raw",
399
+ "nbytes": 2048,
400
+ "byteOffset": 31459840
401
+ }
402
+ ],
403
+ "md5sum": "340f19886367858e06bb5b5367d20305"
404
+ },
405
+ {
406
+ "dataPath": "params_shard_5.bin",
407
+ "format": "raw-shard",
408
+ "nbytes": 31461888,
409
+ "records": [
410
+ {
411
+ "name": "model.layers.12.mlp.down_proj.weight",
412
+ "shape": [
413
+ 1024,
414
+ 3072
415
+ ],
416
+ "dtype": "bfloat16",
417
+ "format": "raw",
418
+ "nbytes": 6291456,
419
+ "byteOffset": 0
420
+ },
421
+ {
422
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
423
+ "shape": [
424
+ 6144,
425
+ 1024
426
+ ],
427
+ "dtype": "bfloat16",
428
+ "format": "raw",
429
+ "nbytes": 12582912,
430
+ "byteOffset": 6291456
431
+ },
432
+ {
433
+ "name": "model.layers.12.post_attention_layernorm.weight",
434
+ "shape": [
435
+ 1024
436
+ ],
437
+ "dtype": "bfloat16",
438
+ "format": "raw",
439
+ "nbytes": 2048,
440
+ "byteOffset": 18874368
441
+ },
442
+ {
443
+ "name": "model.layers.12.self_attn.k_norm.weight",
444
+ "shape": [
445
+ 128
446
+ ],
447
+ "dtype": "bfloat16",
448
+ "format": "raw",
449
+ "nbytes": 256,
450
+ "byteOffset": 18876416
451
+ },
452
+ {
453
+ "name": "model.layers.12.self_attn.c_attn.weight",
454
+ "shape": [
455
+ 4096,
456
+ 1024
457
+ ],
458
+ "dtype": "bfloat16",
459
+ "format": "raw",
460
+ "nbytes": 8388608,
461
+ "byteOffset": 18876672
462
+ },
463
+ {
464
+ "name": "model.layers.12.self_attn.o_proj.weight",
465
+ "shape": [
466
+ 1024,
467
+ 2048
468
+ ],
469
+ "dtype": "bfloat16",
470
+ "format": "raw",
471
+ "nbytes": 4194304,
472
+ "byteOffset": 27265280
473
+ },
474
+ {
475
+ "name": "model.layers.12.self_attn.q_norm.weight",
476
+ "shape": [
477
+ 128
478
+ ],
479
+ "dtype": "bfloat16",
480
+ "format": "raw",
481
+ "nbytes": 256,
482
+ "byteOffset": 31459584
483
+ },
484
+ {
485
+ "name": "model.layers.13.input_layernorm.weight",
486
+ "shape": [
487
+ 1024
488
+ ],
489
+ "dtype": "bfloat16",
490
+ "format": "raw",
491
+ "nbytes": 2048,
492
+ "byteOffset": 31459840
493
+ }
494
+ ],
495
+ "md5sum": "051d820dffb4d079fe6324e4fed50c4f"
496
+ },
497
+ {
498
+ "dataPath": "params_shard_6.bin",
499
+ "format": "raw-shard",
500
+ "nbytes": 31461888,
501
+ "records": [
502
+ {
503
+ "name": "model.layers.13.mlp.down_proj.weight",
504
+ "shape": [
505
+ 1024,
506
+ 3072
507
+ ],
508
+ "dtype": "bfloat16",
509
+ "format": "raw",
510
+ "nbytes": 6291456,
511
+ "byteOffset": 0
512
+ },
513
+ {
514
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
515
+ "shape": [
516
+ 6144,
517
+ 1024
518
+ ],
519
+ "dtype": "bfloat16",
520
+ "format": "raw",
521
+ "nbytes": 12582912,
522
+ "byteOffset": 6291456
523
+ },
524
+ {
525
+ "name": "model.layers.13.post_attention_layernorm.weight",
526
+ "shape": [
527
+ 1024
528
+ ],
529
+ "dtype": "bfloat16",
530
+ "format": "raw",
531
+ "nbytes": 2048,
532
+ "byteOffset": 18874368
533
+ },
534
+ {
535
+ "name": "model.layers.13.self_attn.k_norm.weight",
536
+ "shape": [
537
+ 128
538
+ ],
539
+ "dtype": "bfloat16",
540
+ "format": "raw",
541
+ "nbytes": 256,
542
+ "byteOffset": 18876416
543
+ },
544
+ {
545
+ "name": "model.layers.13.self_attn.c_attn.weight",
546
+ "shape": [
547
+ 4096,
548
+ 1024
549
+ ],
550
+ "dtype": "bfloat16",
551
+ "format": "raw",
552
+ "nbytes": 8388608,
553
+ "byteOffset": 18876672
554
+ },
555
+ {
556
+ "name": "model.layers.13.self_attn.o_proj.weight",
557
+ "shape": [
558
+ 1024,
559
+ 2048
560
+ ],
561
+ "dtype": "bfloat16",
562
+ "format": "raw",
563
+ "nbytes": 4194304,
564
+ "byteOffset": 27265280
565
+ },
566
+ {
567
+ "name": "model.layers.13.self_attn.q_norm.weight",
568
+ "shape": [
569
+ 128
570
+ ],
571
+ "dtype": "bfloat16",
572
+ "format": "raw",
573
+ "nbytes": 256,
574
+ "byteOffset": 31459584
575
+ },
576
+ {
577
+ "name": "model.layers.14.input_layernorm.weight",
578
+ "shape": [
579
+ 1024
580
+ ],
581
+ "dtype": "bfloat16",
582
+ "format": "raw",
583
+ "nbytes": 2048,
584
+ "byteOffset": 31459840
585
+ }
586
+ ],
587
+ "md5sum": "4aa584075dc763c62cfe3d5e58f17506"
588
+ },
589
+ {
590
+ "dataPath": "params_shard_7.bin",
591
+ "format": "raw-shard",
592
+ "nbytes": 31461888,
593
+ "records": [
594
+ {
595
+ "name": "model.layers.14.mlp.down_proj.weight",
596
+ "shape": [
597
+ 1024,
598
+ 3072
599
+ ],
600
+ "dtype": "bfloat16",
601
+ "format": "raw",
602
+ "nbytes": 6291456,
603
+ "byteOffset": 0
604
+ },
605
+ {
606
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
607
+ "shape": [
608
+ 6144,
609
+ 1024
610
+ ],
611
+ "dtype": "bfloat16",
612
+ "format": "raw",
613
+ "nbytes": 12582912,
614
+ "byteOffset": 6291456
615
+ },
616
+ {
617
+ "name": "model.layers.14.post_attention_layernorm.weight",
618
+ "shape": [
619
+ 1024
620
+ ],
621
+ "dtype": "bfloat16",
622
+ "format": "raw",
623
+ "nbytes": 2048,
624
+ "byteOffset": 18874368
625
+ },
626
+ {
627
+ "name": "model.layers.14.self_attn.k_norm.weight",
628
+ "shape": [
629
+ 128
630
+ ],
631
+ "dtype": "bfloat16",
632
+ "format": "raw",
633
+ "nbytes": 256,
634
+ "byteOffset": 18876416
635
+ },
636
+ {
637
+ "name": "model.layers.14.self_attn.c_attn.weight",
638
+ "shape": [
639
+ 4096,
640
+ 1024
641
+ ],
642
+ "dtype": "bfloat16",
643
+ "format": "raw",
644
+ "nbytes": 8388608,
645
+ "byteOffset": 18876672
646
+ },
647
+ {
648
+ "name": "model.layers.14.self_attn.o_proj.weight",
649
+ "shape": [
650
+ 1024,
651
+ 2048
652
+ ],
653
+ "dtype": "bfloat16",
654
+ "format": "raw",
655
+ "nbytes": 4194304,
656
+ "byteOffset": 27265280
657
+ },
658
+ {
659
+ "name": "model.layers.14.self_attn.q_norm.weight",
660
+ "shape": [
661
+ 128
662
+ ],
663
+ "dtype": "bfloat16",
664
+ "format": "raw",
665
+ "nbytes": 256,
666
+ "byteOffset": 31459584
667
+ },
668
+ {
669
+ "name": "model.layers.15.input_layernorm.weight",
670
+ "shape": [
671
+ 1024
672
+ ],
673
+ "dtype": "bfloat16",
674
+ "format": "raw",
675
+ "nbytes": 2048,
676
+ "byteOffset": 31459840
677
+ }
678
+ ],
679
+ "md5sum": "f3846e544c925d1caab9675209fb9b72"
680
+ },
681
+ {
682
+ "dataPath": "params_shard_8.bin",
683
+ "format": "raw-shard",
684
+ "nbytes": 31461888,
685
+ "records": [
686
+ {
687
+ "name": "model.layers.15.mlp.down_proj.weight",
688
+ "shape": [
689
+ 1024,
690
+ 3072
691
+ ],
692
+ "dtype": "bfloat16",
693
+ "format": "raw",
694
+ "nbytes": 6291456,
695
+ "byteOffset": 0
696
+ },
697
+ {
698
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
699
+ "shape": [
700
+ 6144,
701
+ 1024
702
+ ],
703
+ "dtype": "bfloat16",
704
+ "format": "raw",
705
+ "nbytes": 12582912,
706
+ "byteOffset": 6291456
707
+ },
708
+ {
709
+ "name": "model.layers.15.post_attention_layernorm.weight",
710
+ "shape": [
711
+ 1024
712
+ ],
713
+ "dtype": "bfloat16",
714
+ "format": "raw",
715
+ "nbytes": 2048,
716
+ "byteOffset": 18874368
717
+ },
718
+ {
719
+ "name": "model.layers.15.self_attn.k_norm.weight",
720
+ "shape": [
721
+ 128
722
+ ],
723
+ "dtype": "bfloat16",
724
+ "format": "raw",
725
+ "nbytes": 256,
726
+ "byteOffset": 18876416
727
+ },
728
+ {
729
+ "name": "model.layers.15.self_attn.c_attn.weight",
730
+ "shape": [
731
+ 4096,
732
+ 1024
733
+ ],
734
+ "dtype": "bfloat16",
735
+ "format": "raw",
736
+ "nbytes": 8388608,
737
+ "byteOffset": 18876672
738
+ },
739
+ {
740
+ "name": "model.layers.15.self_attn.o_proj.weight",
741
+ "shape": [
742
+ 1024,
743
+ 2048
744
+ ],
745
+ "dtype": "bfloat16",
746
+ "format": "raw",
747
+ "nbytes": 4194304,
748
+ "byteOffset": 27265280
749
+ },
750
+ {
751
+ "name": "model.layers.15.self_attn.q_norm.weight",
752
+ "shape": [
753
+ 128
754
+ ],
755
+ "dtype": "bfloat16",
756
+ "format": "raw",
757
+ "nbytes": 256,
758
+ "byteOffset": 31459584
759
+ },
760
+ {
761
+ "name": "model.layers.16.input_layernorm.weight",
762
+ "shape": [
763
+ 1024
764
+ ],
765
+ "dtype": "bfloat16",
766
+ "format": "raw",
767
+ "nbytes": 2048,
768
+ "byteOffset": 31459840
769
+ }
770
+ ],
771
+ "md5sum": "4782b81576a4555a1f25d98dbcc3397e"
772
+ },
773
+ {
774
+ "dataPath": "params_shard_9.bin",
775
+ "format": "raw-shard",
776
+ "nbytes": 31461888,
777
+ "records": [
778
+ {
779
+ "name": "model.layers.16.mlp.down_proj.weight",
780
+ "shape": [
781
+ 1024,
782
+ 3072
783
+ ],
784
+ "dtype": "bfloat16",
785
+ "format": "raw",
786
+ "nbytes": 6291456,
787
+ "byteOffset": 0
788
+ },
789
+ {
790
+ "name": "model.layers.16.mlp.gate_up_proj.weight",
791
+ "shape": [
792
+ 6144,
793
+ 1024
794
+ ],
795
+ "dtype": "bfloat16",
796
+ "format": "raw",
797
+ "nbytes": 12582912,
798
+ "byteOffset": 6291456
799
+ },
800
+ {
801
+ "name": "model.layers.16.post_attention_layernorm.weight",
802
+ "shape": [
803
+ 1024
804
+ ],
805
+ "dtype": "bfloat16",
806
+ "format": "raw",
807
+ "nbytes": 2048,
808
+ "byteOffset": 18874368
809
+ },
810
+ {
811
+ "name": "model.layers.16.self_attn.k_norm.weight",
812
+ "shape": [
813
+ 128
814
+ ],
815
+ "dtype": "bfloat16",
816
+ "format": "raw",
817
+ "nbytes": 256,
818
+ "byteOffset": 18876416
819
+ },
820
+ {
821
+ "name": "model.layers.16.self_attn.c_attn.weight",
822
+ "shape": [
823
+ 4096,
824
+ 1024
825
+ ],
826
+ "dtype": "bfloat16",
827
+ "format": "raw",
828
+ "nbytes": 8388608,
829
+ "byteOffset": 18876672
830
+ },
831
+ {
832
+ "name": "model.layers.16.self_attn.o_proj.weight",
833
+ "shape": [
834
+ 1024,
835
+ 2048
836
+ ],
837
+ "dtype": "bfloat16",
838
+ "format": "raw",
839
+ "nbytes": 4194304,
840
+ "byteOffset": 27265280
841
+ },
842
+ {
843
+ "name": "model.layers.16.self_attn.q_norm.weight",
844
+ "shape": [
845
+ 128
846
+ ],
847
+ "dtype": "bfloat16",
848
+ "format": "raw",
849
+ "nbytes": 256,
850
+ "byteOffset": 31459584
851
+ },
852
+ {
853
+ "name": "model.layers.17.input_layernorm.weight",
854
+ "shape": [
855
+ 1024
856
+ ],
857
+ "dtype": "bfloat16",
858
+ "format": "raw",
859
+ "nbytes": 2048,
860
+ "byteOffset": 31459840
861
+ }
862
+ ],
863
+ "md5sum": "50a506dc9852c1c7be39db6bbbb71bc4"
864
+ },
865
+ {
866
+ "dataPath": "params_shard_10.bin",
867
+ "format": "raw-shard",
868
+ "nbytes": 31461888,
869
+ "records": [
870
+ {
871
+ "name": "model.layers.17.mlp.down_proj.weight",
872
+ "shape": [
873
+ 1024,
874
+ 3072
875
+ ],
876
+ "dtype": "bfloat16",
877
+ "format": "raw",
878
+ "nbytes": 6291456,
879
+ "byteOffset": 0
880
+ },
881
+ {
882
+ "name": "model.layers.17.mlp.gate_up_proj.weight",
883
+ "shape": [
884
+ 6144,
885
+ 1024
886
+ ],
887
+ "dtype": "bfloat16",
888
+ "format": "raw",
889
+ "nbytes": 12582912,
890
+ "byteOffset": 6291456
891
+ },
892
+ {
893
+ "name": "model.layers.17.post_attention_layernorm.weight",
894
+ "shape": [
895
+ 1024
896
+ ],
897
+ "dtype": "bfloat16",
898
+ "format": "raw",
899
+ "nbytes": 2048,
900
+ "byteOffset": 18874368
901
+ },
902
+ {
903
+ "name": "model.layers.17.self_attn.k_norm.weight",
904
+ "shape": [
905
+ 128
906
+ ],
907
+ "dtype": "bfloat16",
908
+ "format": "raw",
909
+ "nbytes": 256,
910
+ "byteOffset": 18876416
911
+ },
912
+ {
913
+ "name": "model.layers.17.self_attn.c_attn.weight",
914
+ "shape": [
915
+ 4096,
916
+ 1024
917
+ ],
918
+ "dtype": "bfloat16",
919
+ "format": "raw",
920
+ "nbytes": 8388608,
921
+ "byteOffset": 18876672
922
+ },
923
+ {
924
+ "name": "model.layers.17.self_attn.o_proj.weight",
925
+ "shape": [
926
+ 1024,
927
+ 2048
928
+ ],
929
+ "dtype": "bfloat16",
930
+ "format": "raw",
931
+ "nbytes": 4194304,
932
+ "byteOffset": 27265280
933
+ },
934
+ {
935
+ "name": "model.layers.17.self_attn.q_norm.weight",
936
+ "shape": [
937
+ 128
938
+ ],
939
+ "dtype": "bfloat16",
940
+ "format": "raw",
941
+ "nbytes": 256,
942
+ "byteOffset": 31459584
943
+ },
944
+ {
945
+ "name": "model.layers.18.input_layernorm.weight",
946
+ "shape": [
947
+ 1024
948
+ ],
949
+ "dtype": "bfloat16",
950
+ "format": "raw",
951
+ "nbytes": 2048,
952
+ "byteOffset": 31459840
953
+ }
954
+ ],
955
+ "md5sum": "28226cd94e7b0b9cb620911de11ec28c"
956
+ },
957
+ {
958
+ "dataPath": "params_shard_11.bin",
959
+ "format": "raw-shard",
960
+ "nbytes": 31461888,
961
+ "records": [
962
+ {
963
+ "name": "model.layers.18.mlp.down_proj.weight",
964
+ "shape": [
965
+ 1024,
966
+ 3072
967
+ ],
968
+ "dtype": "bfloat16",
969
+ "format": "raw",
970
+ "nbytes": 6291456,
971
+ "byteOffset": 0
972
+ },
973
+ {
974
+ "name": "model.layers.18.mlp.gate_up_proj.weight",
975
+ "shape": [
976
+ 6144,
977
+ 1024
978
+ ],
979
+ "dtype": "bfloat16",
980
+ "format": "raw",
981
+ "nbytes": 12582912,
982
+ "byteOffset": 6291456
983
+ },
984
+ {
985
+ "name": "model.layers.18.post_attention_layernorm.weight",
986
+ "shape": [
987
+ 1024
988
+ ],
989
+ "dtype": "bfloat16",
990
+ "format": "raw",
991
+ "nbytes": 2048,
992
+ "byteOffset": 18874368
993
+ },
994
+ {
995
+ "name": "model.layers.18.self_attn.k_norm.weight",
996
+ "shape": [
997
+ 128
998
+ ],
999
+ "dtype": "bfloat16",
1000
+ "format": "raw",
1001
+ "nbytes": 256,
1002
+ "byteOffset": 18876416
1003
+ },
1004
+ {
1005
+ "name": "model.layers.18.self_attn.c_attn.weight",
1006
+ "shape": [
1007
+ 4096,
1008
+ 1024
1009
+ ],
1010
+ "dtype": "bfloat16",
1011
+ "format": "raw",
1012
+ "nbytes": 8388608,
1013
+ "byteOffset": 18876672
1014
+ },
1015
+ {
1016
+ "name": "model.layers.18.self_attn.o_proj.weight",
1017
+ "shape": [
1018
+ 1024,
1019
+ 2048
1020
+ ],
1021
+ "dtype": "bfloat16",
1022
+ "format": "raw",
1023
+ "nbytes": 4194304,
1024
+ "byteOffset": 27265280
1025
+ },
1026
+ {
1027
+ "name": "model.layers.18.self_attn.q_norm.weight",
1028
+ "shape": [
1029
+ 128
1030
+ ],
1031
+ "dtype": "bfloat16",
1032
+ "format": "raw",
1033
+ "nbytes": 256,
1034
+ "byteOffset": 31459584
1035
+ },
1036
+ {
1037
+ "name": "model.layers.19.input_layernorm.weight",
1038
+ "shape": [
1039
+ 1024
1040
+ ],
1041
+ "dtype": "bfloat16",
1042
+ "format": "raw",
1043
+ "nbytes": 2048,
1044
+ "byteOffset": 31459840
1045
+ }
1046
+ ],
1047
+ "md5sum": "961f8e1a308774dc0fd86312259d7c82"
1048
+ },
1049
+ {
1050
+ "dataPath": "params_shard_12.bin",
1051
+ "format": "raw-shard",
1052
+ "nbytes": 31461888,
1053
+ "records": [
1054
+ {
1055
+ "name": "model.layers.19.mlp.down_proj.weight",
1056
+ "shape": [
1057
+ 1024,
1058
+ 3072
1059
+ ],
1060
+ "dtype": "bfloat16",
1061
+ "format": "raw",
1062
+ "nbytes": 6291456,
1063
+ "byteOffset": 0
1064
+ },
1065
+ {
1066
+ "name": "model.layers.19.mlp.gate_up_proj.weight",
1067
+ "shape": [
1068
+ 6144,
1069
+ 1024
1070
+ ],
1071
+ "dtype": "bfloat16",
1072
+ "format": "raw",
1073
+ "nbytes": 12582912,
1074
+ "byteOffset": 6291456
1075
+ },
1076
+ {
1077
+ "name": "model.layers.19.post_attention_layernorm.weight",
1078
+ "shape": [
1079
+ 1024
1080
+ ],
1081
+ "dtype": "bfloat16",
1082
+ "format": "raw",
1083
+ "nbytes": 2048,
1084
+ "byteOffset": 18874368
1085
+ },
1086
+ {
1087
+ "name": "model.layers.19.self_attn.k_norm.weight",
1088
+ "shape": [
1089
+ 128
1090
+ ],
1091
+ "dtype": "bfloat16",
1092
+ "format": "raw",
1093
+ "nbytes": 256,
1094
+ "byteOffset": 18876416
1095
+ },
1096
+ {
1097
+ "name": "model.layers.19.self_attn.c_attn.weight",
1098
+ "shape": [
1099
+ 4096,
1100
+ 1024
1101
+ ],
1102
+ "dtype": "bfloat16",
1103
+ "format": "raw",
1104
+ "nbytes": 8388608,
1105
+ "byteOffset": 18876672
1106
+ },
1107
+ {
1108
+ "name": "model.layers.19.self_attn.o_proj.weight",
1109
+ "shape": [
1110
+ 1024,
1111
+ 2048
1112
+ ],
1113
+ "dtype": "bfloat16",
1114
+ "format": "raw",
1115
+ "nbytes": 4194304,
1116
+ "byteOffset": 27265280
1117
+ },
1118
+ {
1119
+ "name": "model.layers.19.self_attn.q_norm.weight",
1120
+ "shape": [
1121
+ 128
1122
+ ],
1123
+ "dtype": "bfloat16",
1124
+ "format": "raw",
1125
+ "nbytes": 256,
1126
+ "byteOffset": 31459584
1127
+ },
1128
+ {
1129
+ "name": "model.layers.2.input_layernorm.weight",
1130
+ "shape": [
1131
+ 1024
1132
+ ],
1133
+ "dtype": "bfloat16",
1134
+ "format": "raw",
1135
+ "nbytes": 2048,
1136
+ "byteOffset": 31459840
1137
+ }
1138
+ ],
1139
+ "md5sum": "f17374b039b0e7a53080d4e8c61f1f16"
1140
+ },
1141
+ {
1142
+ "dataPath": "params_shard_13.bin",
1143
+ "format": "raw-shard",
1144
+ "nbytes": 31461888,
1145
+ "records": [
1146
+ {
1147
+ "name": "model.layers.2.mlp.down_proj.weight",
1148
+ "shape": [
1149
+ 1024,
1150
+ 3072
1151
+ ],
1152
+ "dtype": "bfloat16",
1153
+ "format": "raw",
1154
+ "nbytes": 6291456,
1155
+ "byteOffset": 0
1156
+ },
1157
+ {
1158
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
1159
+ "shape": [
1160
+ 6144,
1161
+ 1024
1162
+ ],
1163
+ "dtype": "bfloat16",
1164
+ "format": "raw",
1165
+ "nbytes": 12582912,
1166
+ "byteOffset": 6291456
1167
+ },
1168
+ {
1169
+ "name": "model.layers.2.post_attention_layernorm.weight",
1170
+ "shape": [
1171
+ 1024
1172
+ ],
1173
+ "dtype": "bfloat16",
1174
+ "format": "raw",
1175
+ "nbytes": 2048,
1176
+ "byteOffset": 18874368
1177
+ },
1178
+ {
1179
+ "name": "model.layers.2.self_attn.k_norm.weight",
1180
+ "shape": [
1181
+ 128
1182
+ ],
1183
+ "dtype": "bfloat16",
1184
+ "format": "raw",
1185
+ "nbytes": 256,
1186
+ "byteOffset": 18876416
1187
+ },
1188
+ {
1189
+ "name": "model.layers.2.self_attn.c_attn.weight",
1190
+ "shape": [
1191
+ 4096,
1192
+ 1024
1193
+ ],
1194
+ "dtype": "bfloat16",
1195
+ "format": "raw",
1196
+ "nbytes": 8388608,
1197
+ "byteOffset": 18876672
1198
+ },
1199
+ {
1200
+ "name": "model.layers.2.self_attn.o_proj.weight",
1201
+ "shape": [
1202
+ 1024,
1203
+ 2048
1204
+ ],
1205
+ "dtype": "bfloat16",
1206
+ "format": "raw",
1207
+ "nbytes": 4194304,
1208
+ "byteOffset": 27265280
1209
+ },
1210
+ {
1211
+ "name": "model.layers.2.self_attn.q_norm.weight",
1212
+ "shape": [
1213
+ 128
1214
+ ],
1215
+ "dtype": "bfloat16",
1216
+ "format": "raw",
1217
+ "nbytes": 256,
1218
+ "byteOffset": 31459584
1219
+ },
1220
+ {
1221
+ "name": "model.layers.20.input_layernorm.weight",
1222
+ "shape": [
1223
+ 1024
1224
+ ],
1225
+ "dtype": "bfloat16",
1226
+ "format": "raw",
1227
+ "nbytes": 2048,
1228
+ "byteOffset": 31459840
1229
+ }
1230
+ ],
1231
+ "md5sum": "7951ac8ebc3690aa9df718dfb599eb7b"
1232
+ },
1233
+ {
1234
+ "dataPath": "params_shard_14.bin",
1235
+ "format": "raw-shard",
1236
+ "nbytes": 31461888,
1237
+ "records": [
1238
+ {
1239
+ "name": "model.layers.20.mlp.down_proj.weight",
1240
+ "shape": [
1241
+ 1024,
1242
+ 3072
1243
+ ],
1244
+ "dtype": "bfloat16",
1245
+ "format": "raw",
1246
+ "nbytes": 6291456,
1247
+ "byteOffset": 0
1248
+ },
1249
+ {
1250
+ "name": "model.layers.20.mlp.gate_up_proj.weight",
1251
+ "shape": [
1252
+ 6144,
1253
+ 1024
1254
+ ],
1255
+ "dtype": "bfloat16",
1256
+ "format": "raw",
1257
+ "nbytes": 12582912,
1258
+ "byteOffset": 6291456
1259
+ },
1260
+ {
1261
+ "name": "model.layers.20.post_attention_layernorm.weight",
1262
+ "shape": [
1263
+ 1024
1264
+ ],
1265
+ "dtype": "bfloat16",
1266
+ "format": "raw",
1267
+ "nbytes": 2048,
1268
+ "byteOffset": 18874368
1269
+ },
1270
+ {
1271
+ "name": "model.layers.20.self_attn.k_norm.weight",
1272
+ "shape": [
1273
+ 128
1274
+ ],
1275
+ "dtype": "bfloat16",
1276
+ "format": "raw",
1277
+ "nbytes": 256,
1278
+ "byteOffset": 18876416
1279
+ },
1280
+ {
1281
+ "name": "model.layers.20.self_attn.c_attn.weight",
1282
+ "shape": [
1283
+ 4096,
1284
+ 1024
1285
+ ],
1286
+ "dtype": "bfloat16",
1287
+ "format": "raw",
1288
+ "nbytes": 8388608,
1289
+ "byteOffset": 18876672
1290
+ },
1291
+ {
1292
+ "name": "model.layers.20.self_attn.o_proj.weight",
1293
+ "shape": [
1294
+ 1024,
1295
+ 2048
1296
+ ],
1297
+ "dtype": "bfloat16",
1298
+ "format": "raw",
1299
+ "nbytes": 4194304,
1300
+ "byteOffset": 27265280
1301
+ },
1302
+ {
1303
+ "name": "model.layers.20.self_attn.q_norm.weight",
1304
+ "shape": [
1305
+ 128
1306
+ ],
1307
+ "dtype": "bfloat16",
1308
+ "format": "raw",
1309
+ "nbytes": 256,
1310
+ "byteOffset": 31459584
1311
+ },
1312
+ {
1313
+ "name": "model.layers.21.input_layernorm.weight",
1314
+ "shape": [
1315
+ 1024
1316
+ ],
1317
+ "dtype": "bfloat16",
1318
+ "format": "raw",
1319
+ "nbytes": 2048,
1320
+ "byteOffset": 31459840
1321
+ }
1322
+ ],
1323
+ "md5sum": "a17e66b0197f5f796edc26b7c0b60a79"
1324
+ },
1325
+ {
1326
+ "dataPath": "params_shard_15.bin",
1327
+ "format": "raw-shard",
1328
+ "nbytes": 31461888,
1329
+ "records": [
1330
+ {
1331
+ "name": "model.layers.21.mlp.down_proj.weight",
1332
+ "shape": [
1333
+ 1024,
1334
+ 3072
1335
+ ],
1336
+ "dtype": "bfloat16",
1337
+ "format": "raw",
1338
+ "nbytes": 6291456,
1339
+ "byteOffset": 0
1340
+ },
1341
+ {
1342
+ "name": "model.layers.21.mlp.gate_up_proj.weight",
1343
+ "shape": [
1344
+ 6144,
1345
+ 1024
1346
+ ],
1347
+ "dtype": "bfloat16",
1348
+ "format": "raw",
1349
+ "nbytes": 12582912,
1350
+ "byteOffset": 6291456
1351
+ },
1352
+ {
1353
+ "name": "model.layers.21.post_attention_layernorm.weight",
1354
+ "shape": [
1355
+ 1024
1356
+ ],
1357
+ "dtype": "bfloat16",
1358
+ "format": "raw",
1359
+ "nbytes": 2048,
1360
+ "byteOffset": 18874368
1361
+ },
1362
+ {
1363
+ "name": "model.layers.21.self_attn.k_norm.weight",
1364
+ "shape": [
1365
+ 128
1366
+ ],
1367
+ "dtype": "bfloat16",
1368
+ "format": "raw",
1369
+ "nbytes": 256,
1370
+ "byteOffset": 18876416
1371
+ },
1372
+ {
1373
+ "name": "model.layers.21.self_attn.c_attn.weight",
1374
+ "shape": [
1375
+ 4096,
1376
+ 1024
1377
+ ],
1378
+ "dtype": "bfloat16",
1379
+ "format": "raw",
1380
+ "nbytes": 8388608,
1381
+ "byteOffset": 18876672
1382
+ },
1383
+ {
1384
+ "name": "model.layers.21.self_attn.o_proj.weight",
1385
+ "shape": [
1386
+ 1024,
1387
+ 2048
1388
+ ],
1389
+ "dtype": "bfloat16",
1390
+ "format": "raw",
1391
+ "nbytes": 4194304,
1392
+ "byteOffset": 27265280
1393
+ },
1394
+ {
1395
+ "name": "model.layers.21.self_attn.q_norm.weight",
1396
+ "shape": [
1397
+ 128
1398
+ ],
1399
+ "dtype": "bfloat16",
1400
+ "format": "raw",
1401
+ "nbytes": 256,
1402
+ "byteOffset": 31459584
1403
+ },
1404
+ {
1405
+ "name": "model.layers.22.input_layernorm.weight",
1406
+ "shape": [
1407
+ 1024
1408
+ ],
1409
+ "dtype": "bfloat16",
1410
+ "format": "raw",
1411
+ "nbytes": 2048,
1412
+ "byteOffset": 31459840
1413
+ }
1414
+ ],
1415
+ "md5sum": "29dc75f0a0de71a20a19dc04d6ec2fe8"
1416
+ },
1417
+ {
1418
+ "dataPath": "params_shard_16.bin",
1419
+ "format": "raw-shard",
1420
+ "nbytes": 31461888,
1421
+ "records": [
1422
+ {
1423
+ "name": "model.layers.22.mlp.down_proj.weight",
1424
+ "shape": [
1425
+ 1024,
1426
+ 3072
1427
+ ],
1428
+ "dtype": "bfloat16",
1429
+ "format": "raw",
1430
+ "nbytes": 6291456,
1431
+ "byteOffset": 0
1432
+ },
1433
+ {
1434
+ "name": "model.layers.22.mlp.gate_up_proj.weight",
1435
+ "shape": [
1436
+ 6144,
1437
+ 1024
1438
+ ],
1439
+ "dtype": "bfloat16",
1440
+ "format": "raw",
1441
+ "nbytes": 12582912,
1442
+ "byteOffset": 6291456
1443
+ },
1444
+ {
1445
+ "name": "model.layers.22.post_attention_layernorm.weight",
1446
+ "shape": [
1447
+ 1024
1448
+ ],
1449
+ "dtype": "bfloat16",
1450
+ "format": "raw",
1451
+ "nbytes": 2048,
1452
+ "byteOffset": 18874368
1453
+ },
1454
+ {
1455
+ "name": "model.layers.22.self_attn.k_norm.weight",
1456
+ "shape": [
1457
+ 128
1458
+ ],
1459
+ "dtype": "bfloat16",
1460
+ "format": "raw",
1461
+ "nbytes": 256,
1462
+ "byteOffset": 18876416
1463
+ },
1464
+ {
1465
+ "name": "model.layers.22.self_attn.c_attn.weight",
1466
+ "shape": [
1467
+ 4096,
1468
+ 1024
1469
+ ],
1470
+ "dtype": "bfloat16",
1471
+ "format": "raw",
1472
+ "nbytes": 8388608,
1473
+ "byteOffset": 18876672
1474
+ },
1475
+ {
1476
+ "name": "model.layers.22.self_attn.o_proj.weight",
1477
+ "shape": [
1478
+ 1024,
1479
+ 2048
1480
+ ],
1481
+ "dtype": "bfloat16",
1482
+ "format": "raw",
1483
+ "nbytes": 4194304,
1484
+ "byteOffset": 27265280
1485
+ },
1486
+ {
1487
+ "name": "model.layers.22.self_attn.q_norm.weight",
1488
+ "shape": [
1489
+ 128
1490
+ ],
1491
+ "dtype": "bfloat16",
1492
+ "format": "raw",
1493
+ "nbytes": 256,
1494
+ "byteOffset": 31459584
1495
+ },
1496
+ {
1497
+ "name": "model.layers.23.input_layernorm.weight",
1498
+ "shape": [
1499
+ 1024
1500
+ ],
1501
+ "dtype": "bfloat16",
1502
+ "format": "raw",
1503
+ "nbytes": 2048,
1504
+ "byteOffset": 31459840
1505
+ }
1506
+ ],
1507
+ "md5sum": "77370b671251a8b0f10de8721da680eb"
1508
+ },
1509
+ {
1510
+ "dataPath": "params_shard_17.bin",
1511
+ "format": "raw-shard",
1512
+ "nbytes": 31461888,
1513
+ "records": [
1514
+ {
1515
+ "name": "model.layers.23.mlp.down_proj.weight",
1516
+ "shape": [
1517
+ 1024,
1518
+ 3072
1519
+ ],
1520
+ "dtype": "bfloat16",
1521
+ "format": "raw",
1522
+ "nbytes": 6291456,
1523
+ "byteOffset": 0
1524
+ },
1525
+ {
1526
+ "name": "model.layers.23.mlp.gate_up_proj.weight",
1527
+ "shape": [
1528
+ 6144,
1529
+ 1024
1530
+ ],
1531
+ "dtype": "bfloat16",
1532
+ "format": "raw",
1533
+ "nbytes": 12582912,
1534
+ "byteOffset": 6291456
1535
+ },
1536
+ {
1537
+ "name": "model.layers.23.post_attention_layernorm.weight",
1538
+ "shape": [
1539
+ 1024
1540
+ ],
1541
+ "dtype": "bfloat16",
1542
+ "format": "raw",
1543
+ "nbytes": 2048,
1544
+ "byteOffset": 18874368
1545
+ },
1546
+ {
1547
+ "name": "model.layers.23.self_attn.k_norm.weight",
1548
+ "shape": [
1549
+ 128
1550
+ ],
1551
+ "dtype": "bfloat16",
1552
+ "format": "raw",
1553
+ "nbytes": 256,
1554
+ "byteOffset": 18876416
1555
+ },
1556
+ {
1557
+ "name": "model.layers.23.self_attn.c_attn.weight",
1558
+ "shape": [
1559
+ 4096,
1560
+ 1024
1561
+ ],
1562
+ "dtype": "bfloat16",
1563
+ "format": "raw",
1564
+ "nbytes": 8388608,
1565
+ "byteOffset": 18876672
1566
+ },
1567
+ {
1568
+ "name": "model.layers.23.self_attn.o_proj.weight",
1569
+ "shape": [
1570
+ 1024,
1571
+ 2048
1572
+ ],
1573
+ "dtype": "bfloat16",
1574
+ "format": "raw",
1575
+ "nbytes": 4194304,
1576
+ "byteOffset": 27265280
1577
+ },
1578
+ {
1579
+ "name": "model.layers.23.self_attn.q_norm.weight",
1580
+ "shape": [
1581
+ 128
1582
+ ],
1583
+ "dtype": "bfloat16",
1584
+ "format": "raw",
1585
+ "nbytes": 256,
1586
+ "byteOffset": 31459584
1587
+ },
1588
+ {
1589
+ "name": "model.layers.24.input_layernorm.weight",
1590
+ "shape": [
1591
+ 1024
1592
+ ],
1593
+ "dtype": "bfloat16",
1594
+ "format": "raw",
1595
+ "nbytes": 2048,
1596
+ "byteOffset": 31459840
1597
+ }
1598
+ ],
1599
+ "md5sum": "e1b3dd47fb933da1144e2d4ee03a6447"
1600
+ },
1601
+ {
1602
+ "dataPath": "params_shard_18.bin",
1603
+ "format": "raw-shard",
1604
+ "nbytes": 31461888,
1605
+ "records": [
1606
+ {
1607
+ "name": "model.layers.24.mlp.down_proj.weight",
1608
+ "shape": [
1609
+ 1024,
1610
+ 3072
1611
+ ],
1612
+ "dtype": "bfloat16",
1613
+ "format": "raw",
1614
+ "nbytes": 6291456,
1615
+ "byteOffset": 0
1616
+ },
1617
+ {
1618
+ "name": "model.layers.24.mlp.gate_up_proj.weight",
1619
+ "shape": [
1620
+ 6144,
1621
+ 1024
1622
+ ],
1623
+ "dtype": "bfloat16",
1624
+ "format": "raw",
1625
+ "nbytes": 12582912,
1626
+ "byteOffset": 6291456
1627
+ },
1628
+ {
1629
+ "name": "model.layers.24.post_attention_layernorm.weight",
1630
+ "shape": [
1631
+ 1024
1632
+ ],
1633
+ "dtype": "bfloat16",
1634
+ "format": "raw",
1635
+ "nbytes": 2048,
1636
+ "byteOffset": 18874368
1637
+ },
1638
+ {
1639
+ "name": "model.layers.24.self_attn.k_norm.weight",
1640
+ "shape": [
1641
+ 128
1642
+ ],
1643
+ "dtype": "bfloat16",
1644
+ "format": "raw",
1645
+ "nbytes": 256,
1646
+ "byteOffset": 18876416
1647
+ },
1648
+ {
1649
+ "name": "model.layers.24.self_attn.c_attn.weight",
1650
+ "shape": [
1651
+ 4096,
1652
+ 1024
1653
+ ],
1654
+ "dtype": "bfloat16",
1655
+ "format": "raw",
1656
+ "nbytes": 8388608,
1657
+ "byteOffset": 18876672
1658
+ },
1659
+ {
1660
+ "name": "model.layers.24.self_attn.o_proj.weight",
1661
+ "shape": [
1662
+ 1024,
1663
+ 2048
1664
+ ],
1665
+ "dtype": "bfloat16",
1666
+ "format": "raw",
1667
+ "nbytes": 4194304,
1668
+ "byteOffset": 27265280
1669
+ },
1670
+ {
1671
+ "name": "model.layers.24.self_attn.q_norm.weight",
1672
+ "shape": [
1673
+ 128
1674
+ ],
1675
+ "dtype": "bfloat16",
1676
+ "format": "raw",
1677
+ "nbytes": 256,
1678
+ "byteOffset": 31459584
1679
+ },
1680
+ {
1681
+ "name": "model.layers.25.input_layernorm.weight",
1682
+ "shape": [
1683
+ 1024
1684
+ ],
1685
+ "dtype": "bfloat16",
1686
+ "format": "raw",
1687
+ "nbytes": 2048,
1688
+ "byteOffset": 31459840
1689
+ }
1690
+ ],
1691
+ "md5sum": "1d9f848a8eba2d142c9aa2d536969ecf"
1692
+ },
1693
+ {
1694
+ "dataPath": "params_shard_19.bin",
1695
+ "format": "raw-shard",
1696
+ "nbytes": 31461888,
1697
+ "records": [
1698
+ {
1699
+ "name": "model.layers.25.mlp.down_proj.weight",
1700
+ "shape": [
1701
+ 1024,
1702
+ 3072
1703
+ ],
1704
+ "dtype": "bfloat16",
1705
+ "format": "raw",
1706
+ "nbytes": 6291456,
1707
+ "byteOffset": 0
1708
+ },
1709
+ {
1710
+ "name": "model.layers.25.mlp.gate_up_proj.weight",
1711
+ "shape": [
1712
+ 6144,
1713
+ 1024
1714
+ ],
1715
+ "dtype": "bfloat16",
1716
+ "format": "raw",
1717
+ "nbytes": 12582912,
1718
+ "byteOffset": 6291456
1719
+ },
1720
+ {
1721
+ "name": "model.layers.25.post_attention_layernorm.weight",
1722
+ "shape": [
1723
+ 1024
1724
+ ],
1725
+ "dtype": "bfloat16",
1726
+ "format": "raw",
1727
+ "nbytes": 2048,
1728
+ "byteOffset": 18874368
1729
+ },
1730
+ {
1731
+ "name": "model.layers.25.self_attn.k_norm.weight",
1732
+ "shape": [
1733
+ 128
1734
+ ],
1735
+ "dtype": "bfloat16",
1736
+ "format": "raw",
1737
+ "nbytes": 256,
1738
+ "byteOffset": 18876416
1739
+ },
1740
+ {
1741
+ "name": "model.layers.25.self_attn.c_attn.weight",
1742
+ "shape": [
1743
+ 4096,
1744
+ 1024
1745
+ ],
1746
+ "dtype": "bfloat16",
1747
+ "format": "raw",
1748
+ "nbytes": 8388608,
1749
+ "byteOffset": 18876672
1750
+ },
1751
+ {
1752
+ "name": "model.layers.25.self_attn.o_proj.weight",
1753
+ "shape": [
1754
+ 1024,
1755
+ 2048
1756
+ ],
1757
+ "dtype": "bfloat16",
1758
+ "format": "raw",
1759
+ "nbytes": 4194304,
1760
+ "byteOffset": 27265280
1761
+ },
1762
+ {
1763
+ "name": "model.layers.25.self_attn.q_norm.weight",
1764
+ "shape": [
1765
+ 128
1766
+ ],
1767
+ "dtype": "bfloat16",
1768
+ "format": "raw",
1769
+ "nbytes": 256,
1770
+ "byteOffset": 31459584
1771
+ },
1772
+ {
1773
+ "name": "model.layers.26.input_layernorm.weight",
1774
+ "shape": [
1775
+ 1024
1776
+ ],
1777
+ "dtype": "bfloat16",
1778
+ "format": "raw",
1779
+ "nbytes": 2048,
1780
+ "byteOffset": 31459840
1781
+ }
1782
+ ],
1783
+ "md5sum": "e6125aaacc98c10d3e7702aa6bdb39d8"
1784
+ },
1785
+ {
1786
+ "dataPath": "params_shard_20.bin",
1787
+ "format": "raw-shard",
1788
+ "nbytes": 31461888,
1789
+ "records": [
1790
+ {
1791
+ "name": "model.layers.26.mlp.down_proj.weight",
1792
+ "shape": [
1793
+ 1024,
1794
+ 3072
1795
+ ],
1796
+ "dtype": "bfloat16",
1797
+ "format": "raw",
1798
+ "nbytes": 6291456,
1799
+ "byteOffset": 0
1800
+ },
1801
+ {
1802
+ "name": "model.layers.26.mlp.gate_up_proj.weight",
1803
+ "shape": [
1804
+ 6144,
1805
+ 1024
1806
+ ],
1807
+ "dtype": "bfloat16",
1808
+ "format": "raw",
1809
+ "nbytes": 12582912,
1810
+ "byteOffset": 6291456
1811
+ },
1812
+ {
1813
+ "name": "model.layers.26.post_attention_layernorm.weight",
1814
+ "shape": [
1815
+ 1024
1816
+ ],
1817
+ "dtype": "bfloat16",
1818
+ "format": "raw",
1819
+ "nbytes": 2048,
1820
+ "byteOffset": 18874368
1821
+ },
1822
+ {
1823
+ "name": "model.layers.26.self_attn.k_norm.weight",
1824
+ "shape": [
1825
+ 128
1826
+ ],
1827
+ "dtype": "bfloat16",
1828
+ "format": "raw",
1829
+ "nbytes": 256,
1830
+ "byteOffset": 18876416
1831
+ },
1832
+ {
1833
+ "name": "model.layers.26.self_attn.c_attn.weight",
1834
+ "shape": [
1835
+ 4096,
1836
+ 1024
1837
+ ],
1838
+ "dtype": "bfloat16",
1839
+ "format": "raw",
1840
+ "nbytes": 8388608,
1841
+ "byteOffset": 18876672
1842
+ },
1843
+ {
1844
+ "name": "model.layers.26.self_attn.o_proj.weight",
1845
+ "shape": [
1846
+ 1024,
1847
+ 2048
1848
+ ],
1849
+ "dtype": "bfloat16",
1850
+ "format": "raw",
1851
+ "nbytes": 4194304,
1852
+ "byteOffset": 27265280
1853
+ },
1854
+ {
1855
+ "name": "model.layers.26.self_attn.q_norm.weight",
1856
+ "shape": [
1857
+ 128
1858
+ ],
1859
+ "dtype": "bfloat16",
1860
+ "format": "raw",
1861
+ "nbytes": 256,
1862
+ "byteOffset": 31459584
1863
+ },
1864
+ {
1865
+ "name": "model.layers.27.input_layernorm.weight",
1866
+ "shape": [
1867
+ 1024
1868
+ ],
1869
+ "dtype": "bfloat16",
1870
+ "format": "raw",
1871
+ "nbytes": 2048,
1872
+ "byteOffset": 31459840
1873
+ }
1874
+ ],
1875
+ "md5sum": "9d135333707ef2f4acb7dd7cf9326f25"
1876
+ },
1877
+ {
1878
+ "dataPath": "params_shard_21.bin",
1879
+ "format": "raw-shard",
1880
+ "nbytes": 31461888,
1881
+ "records": [
1882
+ {
1883
+ "name": "model.layers.27.mlp.down_proj.weight",
1884
+ "shape": [
1885
+ 1024,
1886
+ 3072
1887
+ ],
1888
+ "dtype": "bfloat16",
1889
+ "format": "raw",
1890
+ "nbytes": 6291456,
1891
+ "byteOffset": 0
1892
+ },
1893
+ {
1894
+ "name": "model.layers.27.mlp.gate_up_proj.weight",
1895
+ "shape": [
1896
+ 6144,
1897
+ 1024
1898
+ ],
1899
+ "dtype": "bfloat16",
1900
+ "format": "raw",
1901
+ "nbytes": 12582912,
1902
+ "byteOffset": 6291456
1903
+ },
1904
+ {
1905
+ "name": "model.layers.27.post_attention_layernorm.weight",
1906
+ "shape": [
1907
+ 1024
1908
+ ],
1909
+ "dtype": "bfloat16",
1910
+ "format": "raw",
1911
+ "nbytes": 2048,
1912
+ "byteOffset": 18874368
1913
+ },
1914
+ {
1915
+ "name": "model.layers.27.self_attn.k_norm.weight",
1916
+ "shape": [
1917
+ 128
1918
+ ],
1919
+ "dtype": "bfloat16",
1920
+ "format": "raw",
1921
+ "nbytes": 256,
1922
+ "byteOffset": 18876416
1923
+ },
1924
+ {
1925
+ "name": "model.layers.27.self_attn.c_attn.weight",
1926
+ "shape": [
1927
+ 4096,
1928
+ 1024
1929
+ ],
1930
+ "dtype": "bfloat16",
1931
+ "format": "raw",
1932
+ "nbytes": 8388608,
1933
+ "byteOffset": 18876672
1934
+ },
1935
+ {
1936
+ "name": "model.layers.27.self_attn.o_proj.weight",
1937
+ "shape": [
1938
+ 1024,
1939
+ 2048
1940
+ ],
1941
+ "dtype": "bfloat16",
1942
+ "format": "raw",
1943
+ "nbytes": 4194304,
1944
+ "byteOffset": 27265280
1945
+ },
1946
+ {
1947
+ "name": "model.layers.27.self_attn.q_norm.weight",
1948
+ "shape": [
1949
+ 128
1950
+ ],
1951
+ "dtype": "bfloat16",
1952
+ "format": "raw",
1953
+ "nbytes": 256,
1954
+ "byteOffset": 31459584
1955
+ },
1956
+ {
1957
+ "name": "model.layers.3.input_layernorm.weight",
1958
+ "shape": [
1959
+ 1024
1960
+ ],
1961
+ "dtype": "bfloat16",
1962
+ "format": "raw",
1963
+ "nbytes": 2048,
1964
+ "byteOffset": 31459840
1965
+ }
1966
+ ],
1967
+ "md5sum": "e07ec46dc1988b87491a60c6e922c02a"
1968
+ },
1969
+ {
1970
+ "dataPath": "params_shard_22.bin",
1971
+ "format": "raw-shard",
1972
+ "nbytes": 31461888,
1973
+ "records": [
1974
+ {
1975
+ "name": "model.layers.3.mlp.down_proj.weight",
1976
+ "shape": [
1977
+ 1024,
1978
+ 3072
1979
+ ],
1980
+ "dtype": "bfloat16",
1981
+ "format": "raw",
1982
+ "nbytes": 6291456,
1983
+ "byteOffset": 0
1984
+ },
1985
+ {
1986
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
1987
+ "shape": [
1988
+ 6144,
1989
+ 1024
1990
+ ],
1991
+ "dtype": "bfloat16",
1992
+ "format": "raw",
1993
+ "nbytes": 12582912,
1994
+ "byteOffset": 6291456
1995
+ },
1996
+ {
1997
+ "name": "model.layers.3.post_attention_layernorm.weight",
1998
+ "shape": [
1999
+ 1024
2000
+ ],
2001
+ "dtype": "bfloat16",
2002
+ "format": "raw",
2003
+ "nbytes": 2048,
2004
+ "byteOffset": 18874368
2005
+ },
2006
+ {
2007
+ "name": "model.layers.3.self_attn.k_norm.weight",
2008
+ "shape": [
2009
+ 128
2010
+ ],
2011
+ "dtype": "bfloat16",
2012
+ "format": "raw",
2013
+ "nbytes": 256,
2014
+ "byteOffset": 18876416
2015
+ },
2016
+ {
2017
+ "name": "model.layers.3.self_attn.c_attn.weight",
2018
+ "shape": [
2019
+ 4096,
2020
+ 1024
2021
+ ],
2022
+ "dtype": "bfloat16",
2023
+ "format": "raw",
2024
+ "nbytes": 8388608,
2025
+ "byteOffset": 18876672
2026
+ },
2027
+ {
2028
+ "name": "model.layers.3.self_attn.o_proj.weight",
2029
+ "shape": [
2030
+ 1024,
2031
+ 2048
2032
+ ],
2033
+ "dtype": "bfloat16",
2034
+ "format": "raw",
2035
+ "nbytes": 4194304,
2036
+ "byteOffset": 27265280
2037
+ },
2038
+ {
2039
+ "name": "model.layers.3.self_attn.q_norm.weight",
2040
+ "shape": [
2041
+ 128
2042
+ ],
2043
+ "dtype": "bfloat16",
2044
+ "format": "raw",
2045
+ "nbytes": 256,
2046
+ "byteOffset": 31459584
2047
+ },
2048
+ {
2049
+ "name": "model.layers.4.input_layernorm.weight",
2050
+ "shape": [
2051
+ 1024
2052
+ ],
2053
+ "dtype": "bfloat16",
2054
+ "format": "raw",
2055
+ "nbytes": 2048,
2056
+ "byteOffset": 31459840
2057
+ }
2058
+ ],
2059
+ "md5sum": "f0b4915b859259222e9787ee1e7b354d"
2060
+ },
2061
+ {
2062
+ "dataPath": "params_shard_23.bin",
2063
+ "format": "raw-shard",
2064
+ "nbytes": 31461888,
2065
+ "records": [
2066
+ {
2067
+ "name": "model.layers.4.mlp.down_proj.weight",
2068
+ "shape": [
2069
+ 1024,
2070
+ 3072
2071
+ ],
2072
+ "dtype": "bfloat16",
2073
+ "format": "raw",
2074
+ "nbytes": 6291456,
2075
+ "byteOffset": 0
2076
+ },
2077
+ {
2078
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
2079
+ "shape": [
2080
+ 6144,
2081
+ 1024
2082
+ ],
2083
+ "dtype": "bfloat16",
2084
+ "format": "raw",
2085
+ "nbytes": 12582912,
2086
+ "byteOffset": 6291456
2087
+ },
2088
+ {
2089
+ "name": "model.layers.4.post_attention_layernorm.weight",
2090
+ "shape": [
2091
+ 1024
2092
+ ],
2093
+ "dtype": "bfloat16",
2094
+ "format": "raw",
2095
+ "nbytes": 2048,
2096
+ "byteOffset": 18874368
2097
+ },
2098
+ {
2099
+ "name": "model.layers.4.self_attn.k_norm.weight",
2100
+ "shape": [
2101
+ 128
2102
+ ],
2103
+ "dtype": "bfloat16",
2104
+ "format": "raw",
2105
+ "nbytes": 256,
2106
+ "byteOffset": 18876416
2107
+ },
2108
+ {
2109
+ "name": "model.layers.4.self_attn.c_attn.weight",
2110
+ "shape": [
2111
+ 4096,
2112
+ 1024
2113
+ ],
2114
+ "dtype": "bfloat16",
2115
+ "format": "raw",
2116
+ "nbytes": 8388608,
2117
+ "byteOffset": 18876672
2118
+ },
2119
+ {
2120
+ "name": "model.layers.4.self_attn.o_proj.weight",
2121
+ "shape": [
2122
+ 1024,
2123
+ 2048
2124
+ ],
2125
+ "dtype": "bfloat16",
2126
+ "format": "raw",
2127
+ "nbytes": 4194304,
2128
+ "byteOffset": 27265280
2129
+ },
2130
+ {
2131
+ "name": "model.layers.4.self_attn.q_norm.weight",
2132
+ "shape": [
2133
+ 128
2134
+ ],
2135
+ "dtype": "bfloat16",
2136
+ "format": "raw",
2137
+ "nbytes": 256,
2138
+ "byteOffset": 31459584
2139
+ },
2140
+ {
2141
+ "name": "model.layers.5.input_layernorm.weight",
2142
+ "shape": [
2143
+ 1024
2144
+ ],
2145
+ "dtype": "bfloat16",
2146
+ "format": "raw",
2147
+ "nbytes": 2048,
2148
+ "byteOffset": 31459840
2149
+ }
2150
+ ],
2151
+ "md5sum": "890c68f7289867143b40975439f97c02"
2152
+ },
2153
+ {
2154
+ "dataPath": "params_shard_24.bin",
2155
+ "format": "raw-shard",
2156
+ "nbytes": 31461888,
2157
+ "records": [
2158
+ {
2159
+ "name": "model.layers.5.mlp.down_proj.weight",
2160
+ "shape": [
2161
+ 1024,
2162
+ 3072
2163
+ ],
2164
+ "dtype": "bfloat16",
2165
+ "format": "raw",
2166
+ "nbytes": 6291456,
2167
+ "byteOffset": 0
2168
+ },
2169
+ {
2170
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
2171
+ "shape": [
2172
+ 6144,
2173
+ 1024
2174
+ ],
2175
+ "dtype": "bfloat16",
2176
+ "format": "raw",
2177
+ "nbytes": 12582912,
2178
+ "byteOffset": 6291456
2179
+ },
2180
+ {
2181
+ "name": "model.layers.5.post_attention_layernorm.weight",
2182
+ "shape": [
2183
+ 1024
2184
+ ],
2185
+ "dtype": "bfloat16",
2186
+ "format": "raw",
2187
+ "nbytes": 2048,
2188
+ "byteOffset": 18874368
2189
+ },
2190
+ {
2191
+ "name": "model.layers.5.self_attn.k_norm.weight",
2192
+ "shape": [
2193
+ 128
2194
+ ],
2195
+ "dtype": "bfloat16",
2196
+ "format": "raw",
2197
+ "nbytes": 256,
2198
+ "byteOffset": 18876416
2199
+ },
2200
+ {
2201
+ "name": "model.layers.5.self_attn.c_attn.weight",
2202
+ "shape": [
2203
+ 4096,
2204
+ 1024
2205
+ ],
2206
+ "dtype": "bfloat16",
2207
+ "format": "raw",
2208
+ "nbytes": 8388608,
2209
+ "byteOffset": 18876672
2210
+ },
2211
+ {
2212
+ "name": "model.layers.5.self_attn.o_proj.weight",
2213
+ "shape": [
2214
+ 1024,
2215
+ 2048
2216
+ ],
2217
+ "dtype": "bfloat16",
2218
+ "format": "raw",
2219
+ "nbytes": 4194304,
2220
+ "byteOffset": 27265280
2221
+ },
2222
+ {
2223
+ "name": "model.layers.5.self_attn.q_norm.weight",
2224
+ "shape": [
2225
+ 128
2226
+ ],
2227
+ "dtype": "bfloat16",
2228
+ "format": "raw",
2229
+ "nbytes": 256,
2230
+ "byteOffset": 31459584
2231
+ },
2232
+ {
2233
+ "name": "model.layers.6.input_layernorm.weight",
2234
+ "shape": [
2235
+ 1024
2236
+ ],
2237
+ "dtype": "bfloat16",
2238
+ "format": "raw",
2239
+ "nbytes": 2048,
2240
+ "byteOffset": 31459840
2241
+ }
2242
+ ],
2243
+ "md5sum": "4a2e6abf9a0ac27fed499bb02f61047b"
2244
+ },
2245
+ {
2246
+ "dataPath": "params_shard_25.bin",
2247
+ "format": "raw-shard",
2248
+ "nbytes": 31461888,
2249
+ "records": [
2250
+ {
2251
+ "name": "model.layers.6.mlp.down_proj.weight",
2252
+ "shape": [
2253
+ 1024,
2254
+ 3072
2255
+ ],
2256
+ "dtype": "bfloat16",
2257
+ "format": "raw",
2258
+ "nbytes": 6291456,
2259
+ "byteOffset": 0
2260
+ },
2261
+ {
2262
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
2263
+ "shape": [
2264
+ 6144,
2265
+ 1024
2266
+ ],
2267
+ "dtype": "bfloat16",
2268
+ "format": "raw",
2269
+ "nbytes": 12582912,
2270
+ "byteOffset": 6291456
2271
+ },
2272
+ {
2273
+ "name": "model.layers.6.post_attention_layernorm.weight",
2274
+ "shape": [
2275
+ 1024
2276
+ ],
2277
+ "dtype": "bfloat16",
2278
+ "format": "raw",
2279
+ "nbytes": 2048,
2280
+ "byteOffset": 18874368
2281
+ },
2282
+ {
2283
+ "name": "model.layers.6.self_attn.k_norm.weight",
2284
+ "shape": [
2285
+ 128
2286
+ ],
2287
+ "dtype": "bfloat16",
2288
+ "format": "raw",
2289
+ "nbytes": 256,
2290
+ "byteOffset": 18876416
2291
+ },
2292
+ {
2293
+ "name": "model.layers.6.self_attn.c_attn.weight",
2294
+ "shape": [
2295
+ 4096,
2296
+ 1024
2297
+ ],
2298
+ "dtype": "bfloat16",
2299
+ "format": "raw",
2300
+ "nbytes": 8388608,
2301
+ "byteOffset": 18876672
2302
+ },
2303
+ {
2304
+ "name": "model.layers.6.self_attn.o_proj.weight",
2305
+ "shape": [
2306
+ 1024,
2307
+ 2048
2308
+ ],
2309
+ "dtype": "bfloat16",
2310
+ "format": "raw",
2311
+ "nbytes": 4194304,
2312
+ "byteOffset": 27265280
2313
+ },
2314
+ {
2315
+ "name": "model.layers.6.self_attn.q_norm.weight",
2316
+ "shape": [
2317
+ 128
2318
+ ],
2319
+ "dtype": "bfloat16",
2320
+ "format": "raw",
2321
+ "nbytes": 256,
2322
+ "byteOffset": 31459584
2323
+ },
2324
+ {
2325
+ "name": "model.layers.7.input_layernorm.weight",
2326
+ "shape": [
2327
+ 1024
2328
+ ],
2329
+ "dtype": "bfloat16",
2330
+ "format": "raw",
2331
+ "nbytes": 2048,
2332
+ "byteOffset": 31459840
2333
+ }
2334
+ ],
2335
+ "md5sum": "bbf21bdcb33c7b1dfaf7279672693cc2"
2336
+ },
2337
+ {
2338
+ "dataPath": "params_shard_26.bin",
2339
+ "format": "raw-shard",
2340
+ "nbytes": 31461888,
2341
+ "records": [
2342
+ {
2343
+ "name": "model.layers.7.mlp.down_proj.weight",
2344
+ "shape": [
2345
+ 1024,
2346
+ 3072
2347
+ ],
2348
+ "dtype": "bfloat16",
2349
+ "format": "raw",
2350
+ "nbytes": 6291456,
2351
+ "byteOffset": 0
2352
+ },
2353
+ {
2354
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
2355
+ "shape": [
2356
+ 6144,
2357
+ 1024
2358
+ ],
2359
+ "dtype": "bfloat16",
2360
+ "format": "raw",
2361
+ "nbytes": 12582912,
2362
+ "byteOffset": 6291456
2363
+ },
2364
+ {
2365
+ "name": "model.layers.7.post_attention_layernorm.weight",
2366
+ "shape": [
2367
+ 1024
2368
+ ],
2369
+ "dtype": "bfloat16",
2370
+ "format": "raw",
2371
+ "nbytes": 2048,
2372
+ "byteOffset": 18874368
2373
+ },
2374
+ {
2375
+ "name": "model.layers.7.self_attn.k_norm.weight",
2376
+ "shape": [
2377
+ 128
2378
+ ],
2379
+ "dtype": "bfloat16",
2380
+ "format": "raw",
2381
+ "nbytes": 256,
2382
+ "byteOffset": 18876416
2383
+ },
2384
+ {
2385
+ "name": "model.layers.7.self_attn.c_attn.weight",
2386
+ "shape": [
2387
+ 4096,
2388
+ 1024
2389
+ ],
2390
+ "dtype": "bfloat16",
2391
+ "format": "raw",
2392
+ "nbytes": 8388608,
2393
+ "byteOffset": 18876672
2394
+ },
2395
+ {
2396
+ "name": "model.layers.7.self_attn.o_proj.weight",
2397
+ "shape": [
2398
+ 1024,
2399
+ 2048
2400
+ ],
2401
+ "dtype": "bfloat16",
2402
+ "format": "raw",
2403
+ "nbytes": 4194304,
2404
+ "byteOffset": 27265280
2405
+ },
2406
+ {
2407
+ "name": "model.layers.7.self_attn.q_norm.weight",
2408
+ "shape": [
2409
+ 128
2410
+ ],
2411
+ "dtype": "bfloat16",
2412
+ "format": "raw",
2413
+ "nbytes": 256,
2414
+ "byteOffset": 31459584
2415
+ },
2416
+ {
2417
+ "name": "model.layers.8.input_layernorm.weight",
2418
+ "shape": [
2419
+ 1024
2420
+ ],
2421
+ "dtype": "bfloat16",
2422
+ "format": "raw",
2423
+ "nbytes": 2048,
2424
+ "byteOffset": 31459840
2425
+ }
2426
+ ],
2427
+ "md5sum": "d0db88ea1fdab01e768d9ee5a4cc24e1"
2428
+ },
2429
+ {
2430
+ "dataPath": "params_shard_27.bin",
2431
+ "format": "raw-shard",
2432
+ "nbytes": 31461888,
2433
+ "records": [
2434
+ {
2435
+ "name": "model.layers.8.mlp.down_proj.weight",
2436
+ "shape": [
2437
+ 1024,
2438
+ 3072
2439
+ ],
2440
+ "dtype": "bfloat16",
2441
+ "format": "raw",
2442
+ "nbytes": 6291456,
2443
+ "byteOffset": 0
2444
+ },
2445
+ {
2446
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
2447
+ "shape": [
2448
+ 6144,
2449
+ 1024
2450
+ ],
2451
+ "dtype": "bfloat16",
2452
+ "format": "raw",
2453
+ "nbytes": 12582912,
2454
+ "byteOffset": 6291456
2455
+ },
2456
+ {
2457
+ "name": "model.layers.8.post_attention_layernorm.weight",
2458
+ "shape": [
2459
+ 1024
2460
+ ],
2461
+ "dtype": "bfloat16",
2462
+ "format": "raw",
2463
+ "nbytes": 2048,
2464
+ "byteOffset": 18874368
2465
+ },
2466
+ {
2467
+ "name": "model.layers.8.self_attn.k_norm.weight",
2468
+ "shape": [
2469
+ 128
2470
+ ],
2471
+ "dtype": "bfloat16",
2472
+ "format": "raw",
2473
+ "nbytes": 256,
2474
+ "byteOffset": 18876416
2475
+ },
2476
+ {
2477
+ "name": "model.layers.8.self_attn.c_attn.weight",
2478
+ "shape": [
2479
+ 4096,
2480
+ 1024
2481
+ ],
2482
+ "dtype": "bfloat16",
2483
+ "format": "raw",
2484
+ "nbytes": 8388608,
2485
+ "byteOffset": 18876672
2486
+ },
2487
+ {
2488
+ "name": "model.layers.8.self_attn.o_proj.weight",
2489
+ "shape": [
2490
+ 1024,
2491
+ 2048
2492
+ ],
2493
+ "dtype": "bfloat16",
2494
+ "format": "raw",
2495
+ "nbytes": 4194304,
2496
+ "byteOffset": 27265280
2497
+ },
2498
+ {
2499
+ "name": "model.layers.8.self_attn.q_norm.weight",
2500
+ "shape": [
2501
+ 128
2502
+ ],
2503
+ "dtype": "bfloat16",
2504
+ "format": "raw",
2505
+ "nbytes": 256,
2506
+ "byteOffset": 31459584
2507
+ },
2508
+ {
2509
+ "name": "model.layers.9.input_layernorm.weight",
2510
+ "shape": [
2511
+ 1024
2512
+ ],
2513
+ "dtype": "bfloat16",
2514
+ "format": "raw",
2515
+ "nbytes": 2048,
2516
+ "byteOffset": 31459840
2517
+ }
2518
+ ],
2519
+ "md5sum": "3a05a1acf71722e4ad3b000b324a8649"
2520
+ },
2521
+ {
2522
+ "dataPath": "params_shard_28.bin",
2523
+ "format": "raw-shard",
2524
+ "nbytes": 31461888,
2525
+ "records": [
2526
+ {
2527
+ "name": "model.layers.9.mlp.down_proj.weight",
2528
+ "shape": [
2529
+ 1024,
2530
+ 3072
2531
+ ],
2532
+ "dtype": "bfloat16",
2533
+ "format": "raw",
2534
+ "nbytes": 6291456,
2535
+ "byteOffset": 0
2536
+ },
2537
+ {
2538
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
2539
+ "shape": [
2540
+ 6144,
2541
+ 1024
2542
+ ],
2543
+ "dtype": "bfloat16",
2544
+ "format": "raw",
2545
+ "nbytes": 12582912,
2546
+ "byteOffset": 6291456
2547
+ },
2548
+ {
2549
+ "name": "model.layers.9.post_attention_layernorm.weight",
2550
+ "shape": [
2551
+ 1024
2552
+ ],
2553
+ "dtype": "bfloat16",
2554
+ "format": "raw",
2555
+ "nbytes": 2048,
2556
+ "byteOffset": 18874368
2557
+ },
2558
+ {
2559
+ "name": "model.layers.9.self_attn.k_norm.weight",
2560
+ "shape": [
2561
+ 128
2562
+ ],
2563
+ "dtype": "bfloat16",
2564
+ "format": "raw",
2565
+ "nbytes": 256,
2566
+ "byteOffset": 18876416
2567
+ },
2568
+ {
2569
+ "name": "model.layers.9.self_attn.c_attn.weight",
2570
+ "shape": [
2571
+ 4096,
2572
+ 1024
2573
+ ],
2574
+ "dtype": "bfloat16",
2575
+ "format": "raw",
2576
+ "nbytes": 8388608,
2577
+ "byteOffset": 18876672
2578
+ },
2579
+ {
2580
+ "name": "model.layers.9.self_attn.o_proj.weight",
2581
+ "shape": [
2582
+ 1024,
2583
+ 2048
2584
+ ],
2585
+ "dtype": "bfloat16",
2586
+ "format": "raw",
2587
+ "nbytes": 4194304,
2588
+ "byteOffset": 27265280
2589
+ },
2590
+ {
2591
+ "name": "model.layers.9.self_attn.q_norm.weight",
2592
+ "shape": [
2593
+ 128
2594
+ ],
2595
+ "dtype": "bfloat16",
2596
+ "format": "raw",
2597
+ "nbytes": 256,
2598
+ "byteOffset": 31459584
2599
+ },
2600
+ {
2601
+ "name": "model.norm.weight",
2602
+ "shape": [
2603
+ 1024
2604
+ ],
2605
+ "dtype": "bfloat16",
2606
+ "format": "raw",
2607
+ "nbytes": 2048,
2608
+ "byteOffset": 31459840
2609
+ }
2610
+ ],
2611
+ "md5sum": "195d11d45fdbd6e214234ed298ed0684"
2612
+ }
2613
+ ]
2614
+ }
ndarray-cache.json ADDED
@@ -0,0 +1,2614 @@
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 226,
4
+ "ParamBytes": 2384199680.0,
5
+ "BitsPerParam": 25.37623158078298
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 311164928,
12
+ "records": [
13
+ {
14
+ "name": "model.embed_tokens.weight",
15
+ "shape": [
16
+ 151936,
17
+ 1024
18
+ ],
19
+ "dtype": "float32",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 311164928,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "4f615e4e204fe160829261879c8c4864"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 31463936,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.input_layernorm.weight",
34
+ "shape": [
35
+ 1024
36
+ ],
37
+ "dtype": "float32",
38
+ "format": "f32-to-bf16",
39
+ "nbytes": 2048,
40
+ "byteOffset": 0
41
+ },
42
+ {
43
+ "name": "model.layers.0.mlp.down_proj.weight",
44
+ "shape": [
45
+ 1024,
46
+ 3072
47
+ ],
48
+ "dtype": "float32",
49
+ "format": "f32-to-bf16",
50
+ "nbytes": 6291456,
51
+ "byteOffset": 2048
52
+ },
53
+ {
54
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
55
+ "shape": [
56
+ 6144,
57
+ 1024
58
+ ],
59
+ "dtype": "float32",
60
+ "format": "f32-to-bf16",
61
+ "nbytes": 12582912,
62
+ "byteOffset": 6293504
63
+ },
64
+ {
65
+ "name": "model.layers.0.post_attention_layernorm.weight",
66
+ "shape": [
67
+ 1024
68
+ ],
69
+ "dtype": "float32",
70
+ "format": "f32-to-bf16",
71
+ "nbytes": 2048,
72
+ "byteOffset": 18876416
73
+ },
74
+ {
75
+ "name": "model.layers.0.self_attn.k_norm.weight",
76
+ "shape": [
77
+ 128
78
+ ],
79
+ "dtype": "float32",
80
+ "format": "f32-to-bf16",
81
+ "nbytes": 256,
82
+ "byteOffset": 18878464
83
+ },
84
+ {
85
+ "name": "model.layers.0.self_attn.c_attn.weight",
86
+ "shape": [
87
+ 4096,
88
+ 1024
89
+ ],
90
+ "dtype": "float32",
91
+ "format": "f32-to-bf16",
92
+ "nbytes": 8388608,
93
+ "byteOffset": 18878720
94
+ },
95
+ {
96
+ "name": "model.layers.0.self_attn.o_proj.weight",
97
+ "shape": [
98
+ 1024,
99
+ 2048
100
+ ],
101
+ "dtype": "float32",
102
+ "format": "f32-to-bf16",
103
+ "nbytes": 4194304,
104
+ "byteOffset": 27267328
105
+ },
106
+ {
107
+ "name": "model.layers.0.self_attn.q_norm.weight",
108
+ "shape": [
109
+ 128
110
+ ],
111
+ "dtype": "float32",
112
+ "format": "f32-to-bf16",
113
+ "nbytes": 256,
114
+ "byteOffset": 31461632
115
+ },
116
+ {
117
+ "name": "model.layers.1.input_layernorm.weight",
118
+ "shape": [
119
+ 1024
120
+ ],
121
+ "dtype": "float32",
122
+ "format": "f32-to-bf16",
123
+ "nbytes": 2048,
124
+ "byteOffset": 31461888
125
+ }
126
+ ],
127
+ "md5sum": "9e8c92fad1c6d7956d2f76d18b34c0c0"
128
+ },
129
+ {
130
+ "dataPath": "params_shard_2.bin",
131
+ "format": "raw-shard",
132
+ "nbytes": 31461888,
133
+ "records": [
134
+ {
135
+ "name": "model.layers.1.mlp.down_proj.weight",
136
+ "shape": [
137
+ 1024,
138
+ 3072
139
+ ],
140
+ "dtype": "float32",
141
+ "format": "f32-to-bf16",
142
+ "nbytes": 6291456,
143
+ "byteOffset": 0
144
+ },
145
+ {
146
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
147
+ "shape": [
148
+ 6144,
149
+ 1024
150
+ ],
151
+ "dtype": "float32",
152
+ "format": "f32-to-bf16",
153
+ "nbytes": 12582912,
154
+ "byteOffset": 6291456
155
+ },
156
+ {
157
+ "name": "model.layers.1.post_attention_layernorm.weight",
158
+ "shape": [
159
+ 1024
160
+ ],
161
+ "dtype": "float32",
162
+ "format": "f32-to-bf16",
163
+ "nbytes": 2048,
164
+ "byteOffset": 18874368
165
+ },
166
+ {
167
+ "name": "model.layers.1.self_attn.k_norm.weight",
168
+ "shape": [
169
+ 128
170
+ ],
171
+ "dtype": "float32",
172
+ "format": "f32-to-bf16",
173
+ "nbytes": 256,
174
+ "byteOffset": 18876416
175
+ },
176
+ {
177
+ "name": "model.layers.1.self_attn.c_attn.weight",
178
+ "shape": [
179
+ 4096,
180
+ 1024
181
+ ],
182
+ "dtype": "float32",
183
+ "format": "f32-to-bf16",
184
+ "nbytes": 8388608,
185
+ "byteOffset": 18876672
186
+ },
187
+ {
188
+ "name": "model.layers.1.self_attn.o_proj.weight",
189
+ "shape": [
190
+ 1024,
191
+ 2048
192
+ ],
193
+ "dtype": "float32",
194
+ "format": "f32-to-bf16",
195
+ "nbytes": 4194304,
196
+ "byteOffset": 27265280
197
+ },
198
+ {
199
+ "name": "model.layers.1.self_attn.q_norm.weight",
200
+ "shape": [
201
+ 128
202
+ ],
203
+ "dtype": "float32",
204
+ "format": "f32-to-bf16",
205
+ "nbytes": 256,
206
+ "byteOffset": 31459584
207
+ },
208
+ {
209
+ "name": "model.layers.10.input_layernorm.weight",
210
+ "shape": [
211
+ 1024
212
+ ],
213
+ "dtype": "float32",
214
+ "format": "f32-to-bf16",
215
+ "nbytes": 2048,
216
+ "byteOffset": 31459840
217
+ }
218
+ ],
219
+ "md5sum": "c74facda8699fab5c1fb909a0038eac3"
220
+ },
221
+ {
222
+ "dataPath": "params_shard_3.bin",
223
+ "format": "raw-shard",
224
+ "nbytes": 31461888,
225
+ "records": [
226
+ {
227
+ "name": "model.layers.10.mlp.down_proj.weight",
228
+ "shape": [
229
+ 1024,
230
+ 3072
231
+ ],
232
+ "dtype": "float32",
233
+ "format": "f32-to-bf16",
234
+ "nbytes": 6291456,
235
+ "byteOffset": 0
236
+ },
237
+ {
238
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
239
+ "shape": [
240
+ 6144,
241
+ 1024
242
+ ],
243
+ "dtype": "float32",
244
+ "format": "f32-to-bf16",
245
+ "nbytes": 12582912,
246
+ "byteOffset": 6291456
247
+ },
248
+ {
249
+ "name": "model.layers.10.post_attention_layernorm.weight",
250
+ "shape": [
251
+ 1024
252
+ ],
253
+ "dtype": "float32",
254
+ "format": "f32-to-bf16",
255
+ "nbytes": 2048,
256
+ "byteOffset": 18874368
257
+ },
258
+ {
259
+ "name": "model.layers.10.self_attn.k_norm.weight",
260
+ "shape": [
261
+ 128
262
+ ],
263
+ "dtype": "float32",
264
+ "format": "f32-to-bf16",
265
+ "nbytes": 256,
266
+ "byteOffset": 18876416
267
+ },
268
+ {
269
+ "name": "model.layers.10.self_attn.c_attn.weight",
270
+ "shape": [
271
+ 4096,
272
+ 1024
273
+ ],
274
+ "dtype": "float32",
275
+ "format": "f32-to-bf16",
276
+ "nbytes": 8388608,
277
+ "byteOffset": 18876672
278
+ },
279
+ {
280
+ "name": "model.layers.10.self_attn.o_proj.weight",
281
+ "shape": [
282
+ 1024,
283
+ 2048
284
+ ],
285
+ "dtype": "float32",
286
+ "format": "f32-to-bf16",
287
+ "nbytes": 4194304,
288
+ "byteOffset": 27265280
289
+ },
290
+ {
291
+ "name": "model.layers.10.self_attn.q_norm.weight",
292
+ "shape": [
293
+ 128
294
+ ],
295
+ "dtype": "float32",
296
+ "format": "f32-to-bf16",
297
+ "nbytes": 256,
298
+ "byteOffset": 31459584
299
+ },
300
+ {
301
+ "name": "model.layers.11.input_layernorm.weight",
302
+ "shape": [
303
+ 1024
304
+ ],
305
+ "dtype": "float32",
306
+ "format": "f32-to-bf16",
307
+ "nbytes": 2048,
308
+ "byteOffset": 31459840
309
+ }
310
+ ],
311
+ "md5sum": "79511a618a11a839e255c0d6b8117701"
312
+ },
313
+ {
314
+ "dataPath": "params_shard_4.bin",
315
+ "format": "raw-shard",
316
+ "nbytes": 31461888,
317
+ "records": [
318
+ {
319
+ "name": "model.layers.11.mlp.down_proj.weight",
320
+ "shape": [
321
+ 1024,
322
+ 3072
323
+ ],
324
+ "dtype": "float32",
325
+ "format": "f32-to-bf16",
326
+ "nbytes": 6291456,
327
+ "byteOffset": 0
328
+ },
329
+ {
330
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
331
+ "shape": [
332
+ 6144,
333
+ 1024
334
+ ],
335
+ "dtype": "float32",
336
+ "format": "f32-to-bf16",
337
+ "nbytes": 12582912,
338
+ "byteOffset": 6291456
339
+ },
340
+ {
341
+ "name": "model.layers.11.post_attention_layernorm.weight",
342
+ "shape": [
343
+ 1024
344
+ ],
345
+ "dtype": "float32",
346
+ "format": "f32-to-bf16",
347
+ "nbytes": 2048,
348
+ "byteOffset": 18874368
349
+ },
350
+ {
351
+ "name": "model.layers.11.self_attn.k_norm.weight",
352
+ "shape": [
353
+ 128
354
+ ],
355
+ "dtype": "float32",
356
+ "format": "f32-to-bf16",
357
+ "nbytes": 256,
358
+ "byteOffset": 18876416
359
+ },
360
+ {
361
+ "name": "model.layers.11.self_attn.c_attn.weight",
362
+ "shape": [
363
+ 4096,
364
+ 1024
365
+ ],
366
+ "dtype": "float32",
367
+ "format": "f32-to-bf16",
368
+ "nbytes": 8388608,
369
+ "byteOffset": 18876672
370
+ },
371
+ {
372
+ "name": "model.layers.11.self_attn.o_proj.weight",
373
+ "shape": [
374
+ 1024,
375
+ 2048
376
+ ],
377
+ "dtype": "float32",
378
+ "format": "f32-to-bf16",
379
+ "nbytes": 4194304,
380
+ "byteOffset": 27265280
381
+ },
382
+ {
383
+ "name": "model.layers.11.self_attn.q_norm.weight",
384
+ "shape": [
385
+ 128
386
+ ],
387
+ "dtype": "float32",
388
+ "format": "f32-to-bf16",
389
+ "nbytes": 256,
390
+ "byteOffset": 31459584
391
+ },
392
+ {
393
+ "name": "model.layers.12.input_layernorm.weight",
394
+ "shape": [
395
+ 1024
396
+ ],
397
+ "dtype": "float32",
398
+ "format": "f32-to-bf16",
399
+ "nbytes": 2048,
400
+ "byteOffset": 31459840
401
+ }
402
+ ],
403
+ "md5sum": "340f19886367858e06bb5b5367d20305"
404
+ },
405
+ {
406
+ "dataPath": "params_shard_5.bin",
407
+ "format": "raw-shard",
408
+ "nbytes": 31461888,
409
+ "records": [
410
+ {
411
+ "name": "model.layers.12.mlp.down_proj.weight",
412
+ "shape": [
413
+ 1024,
414
+ 3072
415
+ ],
416
+ "dtype": "float32",
417
+ "format": "f32-to-bf16",
418
+ "nbytes": 6291456,
419
+ "byteOffset": 0
420
+ },
421
+ {
422
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
423
+ "shape": [
424
+ 6144,
425
+ 1024
426
+ ],
427
+ "dtype": "float32",
428
+ "format": "f32-to-bf16",
429
+ "nbytes": 12582912,
430
+ "byteOffset": 6291456
431
+ },
432
+ {
433
+ "name": "model.layers.12.post_attention_layernorm.weight",
434
+ "shape": [
435
+ 1024
436
+ ],
437
+ "dtype": "float32",
438
+ "format": "f32-to-bf16",
439
+ "nbytes": 2048,
440
+ "byteOffset": 18874368
441
+ },
442
+ {
443
+ "name": "model.layers.12.self_attn.k_norm.weight",
444
+ "shape": [
445
+ 128
446
+ ],
447
+ "dtype": "float32",
448
+ "format": "f32-to-bf16",
449
+ "nbytes": 256,
450
+ "byteOffset": 18876416
451
+ },
452
+ {
453
+ "name": "model.layers.12.self_attn.c_attn.weight",
454
+ "shape": [
455
+ 4096,
456
+ 1024
457
+ ],
458
+ "dtype": "float32",
459
+ "format": "f32-to-bf16",
460
+ "nbytes": 8388608,
461
+ "byteOffset": 18876672
462
+ },
463
+ {
464
+ "name": "model.layers.12.self_attn.o_proj.weight",
465
+ "shape": [
466
+ 1024,
467
+ 2048
468
+ ],
469
+ "dtype": "float32",
470
+ "format": "f32-to-bf16",
471
+ "nbytes": 4194304,
472
+ "byteOffset": 27265280
473
+ },
474
+ {
475
+ "name": "model.layers.12.self_attn.q_norm.weight",
476
+ "shape": [
477
+ 128
478
+ ],
479
+ "dtype": "float32",
480
+ "format": "f32-to-bf16",
481
+ "nbytes": 256,
482
+ "byteOffset": 31459584
483
+ },
484
+ {
485
+ "name": "model.layers.13.input_layernorm.weight",
486
+ "shape": [
487
+ 1024
488
+ ],
489
+ "dtype": "float32",
490
+ "format": "f32-to-bf16",
491
+ "nbytes": 2048,
492
+ "byteOffset": 31459840
493
+ }
494
+ ],
495
+ "md5sum": "051d820dffb4d079fe6324e4fed50c4f"
496
+ },
497
+ {
498
+ "dataPath": "params_shard_6.bin",
499
+ "format": "raw-shard",
500
+ "nbytes": 31461888,
501
+ "records": [
502
+ {
503
+ "name": "model.layers.13.mlp.down_proj.weight",
504
+ "shape": [
505
+ 1024,
506
+ 3072
507
+ ],
508
+ "dtype": "float32",
509
+ "format": "f32-to-bf16",
510
+ "nbytes": 6291456,
511
+ "byteOffset": 0
512
+ },
513
+ {
514
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
515
+ "shape": [
516
+ 6144,
517
+ 1024
518
+ ],
519
+ "dtype": "float32",
520
+ "format": "f32-to-bf16",
521
+ "nbytes": 12582912,
522
+ "byteOffset": 6291456
523
+ },
524
+ {
525
+ "name": "model.layers.13.post_attention_layernorm.weight",
526
+ "shape": [
527
+ 1024
528
+ ],
529
+ "dtype": "float32",
530
+ "format": "f32-to-bf16",
531
+ "nbytes": 2048,
532
+ "byteOffset": 18874368
533
+ },
534
+ {
535
+ "name": "model.layers.13.self_attn.k_norm.weight",
536
+ "shape": [
537
+ 128
538
+ ],
539
+ "dtype": "float32",
540
+ "format": "f32-to-bf16",
541
+ "nbytes": 256,
542
+ "byteOffset": 18876416
543
+ },
544
+ {
545
+ "name": "model.layers.13.self_attn.c_attn.weight",
546
+ "shape": [
547
+ 4096,
548
+ 1024
549
+ ],
550
+ "dtype": "float32",
551
+ "format": "f32-to-bf16",
552
+ "nbytes": 8388608,
553
+ "byteOffset": 18876672
554
+ },
555
+ {
556
+ "name": "model.layers.13.self_attn.o_proj.weight",
557
+ "shape": [
558
+ 1024,
559
+ 2048
560
+ ],
561
+ "dtype": "float32",
562
+ "format": "f32-to-bf16",
563
+ "nbytes": 4194304,
564
+ "byteOffset": 27265280
565
+ },
566
+ {
567
+ "name": "model.layers.13.self_attn.q_norm.weight",
568
+ "shape": [
569
+ 128
570
+ ],
571
+ "dtype": "float32",
572
+ "format": "f32-to-bf16",
573
+ "nbytes": 256,
574
+ "byteOffset": 31459584
575
+ },
576
+ {
577
+ "name": "model.layers.14.input_layernorm.weight",
578
+ "shape": [
579
+ 1024
580
+ ],
581
+ "dtype": "float32",
582
+ "format": "f32-to-bf16",
583
+ "nbytes": 2048,
584
+ "byteOffset": 31459840
585
+ }
586
+ ],
587
+ "md5sum": "4aa584075dc763c62cfe3d5e58f17506"
588
+ },
589
+ {
590
+ "dataPath": "params_shard_7.bin",
591
+ "format": "raw-shard",
592
+ "nbytes": 31461888,
593
+ "records": [
594
+ {
595
+ "name": "model.layers.14.mlp.down_proj.weight",
596
+ "shape": [
597
+ 1024,
598
+ 3072
599
+ ],
600
+ "dtype": "float32",
601
+ "format": "f32-to-bf16",
602
+ "nbytes": 6291456,
603
+ "byteOffset": 0
604
+ },
605
+ {
606
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
607
+ "shape": [
608
+ 6144,
609
+ 1024
610
+ ],
611
+ "dtype": "float32",
612
+ "format": "f32-to-bf16",
613
+ "nbytes": 12582912,
614
+ "byteOffset": 6291456
615
+ },
616
+ {
617
+ "name": "model.layers.14.post_attention_layernorm.weight",
618
+ "shape": [
619
+ 1024
620
+ ],
621
+ "dtype": "float32",
622
+ "format": "f32-to-bf16",
623
+ "nbytes": 2048,
624
+ "byteOffset": 18874368
625
+ },
626
+ {
627
+ "name": "model.layers.14.self_attn.k_norm.weight",
628
+ "shape": [
629
+ 128
630
+ ],
631
+ "dtype": "float32",
632
+ "format": "f32-to-bf16",
633
+ "nbytes": 256,
634
+ "byteOffset": 18876416
635
+ },
636
+ {
637
+ "name": "model.layers.14.self_attn.c_attn.weight",
638
+ "shape": [
639
+ 4096,
640
+ 1024
641
+ ],
642
+ "dtype": "float32",
643
+ "format": "f32-to-bf16",
644
+ "nbytes": 8388608,
645
+ "byteOffset": 18876672
646
+ },
647
+ {
648
+ "name": "model.layers.14.self_attn.o_proj.weight",
649
+ "shape": [
650
+ 1024,
651
+ 2048
652
+ ],
653
+ "dtype": "float32",
654
+ "format": "f32-to-bf16",
655
+ "nbytes": 4194304,
656
+ "byteOffset": 27265280
657
+ },
658
+ {
659
+ "name": "model.layers.14.self_attn.q_norm.weight",
660
+ "shape": [
661
+ 128
662
+ ],
663
+ "dtype": "float32",
664
+ "format": "f32-to-bf16",
665
+ "nbytes": 256,
666
+ "byteOffset": 31459584
667
+ },
668
+ {
669
+ "name": "model.layers.15.input_layernorm.weight",
670
+ "shape": [
671
+ 1024
672
+ ],
673
+ "dtype": "float32",
674
+ "format": "f32-to-bf16",
675
+ "nbytes": 2048,
676
+ "byteOffset": 31459840
677
+ }
678
+ ],
679
+ "md5sum": "f3846e544c925d1caab9675209fb9b72"
680
+ },
681
+ {
682
+ "dataPath": "params_shard_8.bin",
683
+ "format": "raw-shard",
684
+ "nbytes": 31461888,
685
+ "records": [
686
+ {
687
+ "name": "model.layers.15.mlp.down_proj.weight",
688
+ "shape": [
689
+ 1024,
690
+ 3072
691
+ ],
692
+ "dtype": "float32",
693
+ "format": "f32-to-bf16",
694
+ "nbytes": 6291456,
695
+ "byteOffset": 0
696
+ },
697
+ {
698
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
699
+ "shape": [
700
+ 6144,
701
+ 1024
702
+ ],
703
+ "dtype": "float32",
704
+ "format": "f32-to-bf16",
705
+ "nbytes": 12582912,
706
+ "byteOffset": 6291456
707
+ },
708
+ {
709
+ "name": "model.layers.15.post_attention_layernorm.weight",
710
+ "shape": [
711
+ 1024
712
+ ],
713
+ "dtype": "float32",
714
+ "format": "f32-to-bf16",
715
+ "nbytes": 2048,
716
+ "byteOffset": 18874368
717
+ },
718
+ {
719
+ "name": "model.layers.15.self_attn.k_norm.weight",
720
+ "shape": [
721
+ 128
722
+ ],
723
+ "dtype": "float32",
724
+ "format": "f32-to-bf16",
725
+ "nbytes": 256,
726
+ "byteOffset": 18876416
727
+ },
728
+ {
729
+ "name": "model.layers.15.self_attn.c_attn.weight",
730
+ "shape": [
731
+ 4096,
732
+ 1024
733
+ ],
734
+ "dtype": "float32",
735
+ "format": "f32-to-bf16",
736
+ "nbytes": 8388608,
737
+ "byteOffset": 18876672
738
+ },
739
+ {
740
+ "name": "model.layers.15.self_attn.o_proj.weight",
741
+ "shape": [
742
+ 1024,
743
+ 2048
744
+ ],
745
+ "dtype": "float32",
746
+ "format": "f32-to-bf16",
747
+ "nbytes": 4194304,
748
+ "byteOffset": 27265280
749
+ },
750
+ {
751
+ "name": "model.layers.15.self_attn.q_norm.weight",
752
+ "shape": [
753
+ 128
754
+ ],
755
+ "dtype": "float32",
756
+ "format": "f32-to-bf16",
757
+ "nbytes": 256,
758
+ "byteOffset": 31459584
759
+ },
760
+ {
761
+ "name": "model.layers.16.input_layernorm.weight",
762
+ "shape": [
763
+ 1024
764
+ ],
765
+ "dtype": "float32",
766
+ "format": "f32-to-bf16",
767
+ "nbytes": 2048,
768
+ "byteOffset": 31459840
769
+ }
770
+ ],
771
+ "md5sum": "4782b81576a4555a1f25d98dbcc3397e"
772
+ },
773
+ {
774
+ "dataPath": "params_shard_9.bin",
775
+ "format": "raw-shard",
776
+ "nbytes": 31461888,
777
+ "records": [
778
+ {
779
+ "name": "model.layers.16.mlp.down_proj.weight",
780
+ "shape": [
781
+ 1024,
782
+ 3072
783
+ ],
784
+ "dtype": "float32",
785
+ "format": "f32-to-bf16",
786
+ "nbytes": 6291456,
787
+ "byteOffset": 0
788
+ },
789
+ {
790
+ "name": "model.layers.16.mlp.gate_up_proj.weight",
791
+ "shape": [
792
+ 6144,
793
+ 1024
794
+ ],
795
+ "dtype": "float32",
796
+ "format": "f32-to-bf16",
797
+ "nbytes": 12582912,
798
+ "byteOffset": 6291456
799
+ },
800
+ {
801
+ "name": "model.layers.16.post_attention_layernorm.weight",
802
+ "shape": [
803
+ 1024
804
+ ],
805
+ "dtype": "float32",
806
+ "format": "f32-to-bf16",
807
+ "nbytes": 2048,
808
+ "byteOffset": 18874368
809
+ },
810
+ {
811
+ "name": "model.layers.16.self_attn.k_norm.weight",
812
+ "shape": [
813
+ 128
814
+ ],
815
+ "dtype": "float32",
816
+ "format": "f32-to-bf16",
817
+ "nbytes": 256,
818
+ "byteOffset": 18876416
819
+ },
820
+ {
821
+ "name": "model.layers.16.self_attn.c_attn.weight",
822
+ "shape": [
823
+ 4096,
824
+ 1024
825
+ ],
826
+ "dtype": "float32",
827
+ "format": "f32-to-bf16",
828
+ "nbytes": 8388608,
829
+ "byteOffset": 18876672
830
+ },
831
+ {
832
+ "name": "model.layers.16.self_attn.o_proj.weight",
833
+ "shape": [
834
+ 1024,
835
+ 2048
836
+ ],
837
+ "dtype": "float32",
838
+ "format": "f32-to-bf16",
839
+ "nbytes": 4194304,
840
+ "byteOffset": 27265280
841
+ },
842
+ {
843
+ "name": "model.layers.16.self_attn.q_norm.weight",
844
+ "shape": [
845
+ 128
846
+ ],
847
+ "dtype": "float32",
848
+ "format": "f32-to-bf16",
849
+ "nbytes": 256,
850
+ "byteOffset": 31459584
851
+ },
852
+ {
853
+ "name": "model.layers.17.input_layernorm.weight",
854
+ "shape": [
855
+ 1024
856
+ ],
857
+ "dtype": "float32",
858
+ "format": "f32-to-bf16",
859
+ "nbytes": 2048,
860
+ "byteOffset": 31459840
861
+ }
862
+ ],
863
+ "md5sum": "50a506dc9852c1c7be39db6bbbb71bc4"
864
+ },
865
+ {
866
+ "dataPath": "params_shard_10.bin",
867
+ "format": "raw-shard",
868
+ "nbytes": 31461888,
869
+ "records": [
870
+ {
871
+ "name": "model.layers.17.mlp.down_proj.weight",
872
+ "shape": [
873
+ 1024,
874
+ 3072
875
+ ],
876
+ "dtype": "float32",
877
+ "format": "f32-to-bf16",
878
+ "nbytes": 6291456,
879
+ "byteOffset": 0
880
+ },
881
+ {
882
+ "name": "model.layers.17.mlp.gate_up_proj.weight",
883
+ "shape": [
884
+ 6144,
885
+ 1024
886
+ ],
887
+ "dtype": "float32",
888
+ "format": "f32-to-bf16",
889
+ "nbytes": 12582912,
890
+ "byteOffset": 6291456
891
+ },
892
+ {
893
+ "name": "model.layers.17.post_attention_layernorm.weight",
894
+ "shape": [
895
+ 1024
896
+ ],
897
+ "dtype": "float32",
898
+ "format": "f32-to-bf16",
899
+ "nbytes": 2048,
900
+ "byteOffset": 18874368
901
+ },
902
+ {
903
+ "name": "model.layers.17.self_attn.k_norm.weight",
904
+ "shape": [
905
+ 128
906
+ ],
907
+ "dtype": "float32",
908
+ "format": "f32-to-bf16",
909
+ "nbytes": 256,
910
+ "byteOffset": 18876416
911
+ },
912
+ {
913
+ "name": "model.layers.17.self_attn.c_attn.weight",
914
+ "shape": [
915
+ 4096,
916
+ 1024
917
+ ],
918
+ "dtype": "float32",
919
+ "format": "f32-to-bf16",
920
+ "nbytes": 8388608,
921
+ "byteOffset": 18876672
922
+ },
923
+ {
924
+ "name": "model.layers.17.self_attn.o_proj.weight",
925
+ "shape": [
926
+ 1024,
927
+ 2048
928
+ ],
929
+ "dtype": "float32",
930
+ "format": "f32-to-bf16",
931
+ "nbytes": 4194304,
932
+ "byteOffset": 27265280
933
+ },
934
+ {
935
+ "name": "model.layers.17.self_attn.q_norm.weight",
936
+ "shape": [
937
+ 128
938
+ ],
939
+ "dtype": "float32",
940
+ "format": "f32-to-bf16",
941
+ "nbytes": 256,
942
+ "byteOffset": 31459584
943
+ },
944
+ {
945
+ "name": "model.layers.18.input_layernorm.weight",
946
+ "shape": [
947
+ 1024
948
+ ],
949
+ "dtype": "float32",
950
+ "format": "f32-to-bf16",
951
+ "nbytes": 2048,
952
+ "byteOffset": 31459840
953
+ }
954
+ ],
955
+ "md5sum": "28226cd94e7b0b9cb620911de11ec28c"
956
+ },
957
+ {
958
+ "dataPath": "params_shard_11.bin",
959
+ "format": "raw-shard",
960
+ "nbytes": 31461888,
961
+ "records": [
962
+ {
963
+ "name": "model.layers.18.mlp.down_proj.weight",
964
+ "shape": [
965
+ 1024,
966
+ 3072
967
+ ],
968
+ "dtype": "float32",
969
+ "format": "f32-to-bf16",
970
+ "nbytes": 6291456,
971
+ "byteOffset": 0
972
+ },
973
+ {
974
+ "name": "model.layers.18.mlp.gate_up_proj.weight",
975
+ "shape": [
976
+ 6144,
977
+ 1024
978
+ ],
979
+ "dtype": "float32",
980
+ "format": "f32-to-bf16",
981
+ "nbytes": 12582912,
982
+ "byteOffset": 6291456
983
+ },
984
+ {
985
+ "name": "model.layers.18.post_attention_layernorm.weight",
986
+ "shape": [
987
+ 1024
988
+ ],
989
+ "dtype": "float32",
990
+ "format": "f32-to-bf16",
991
+ "nbytes": 2048,
992
+ "byteOffset": 18874368
993
+ },
994
+ {
995
+ "name": "model.layers.18.self_attn.k_norm.weight",
996
+ "shape": [
997
+ 128
998
+ ],
999
+ "dtype": "float32",
1000
+ "format": "f32-to-bf16",
1001
+ "nbytes": 256,
1002
+ "byteOffset": 18876416
1003
+ },
1004
+ {
1005
+ "name": "model.layers.18.self_attn.c_attn.weight",
1006
+ "shape": [
1007
+ 4096,
1008
+ 1024
1009
+ ],
1010
+ "dtype": "float32",
1011
+ "format": "f32-to-bf16",
1012
+ "nbytes": 8388608,
1013
+ "byteOffset": 18876672
1014
+ },
1015
+ {
1016
+ "name": "model.layers.18.self_attn.o_proj.weight",
1017
+ "shape": [
1018
+ 1024,
1019
+ 2048
1020
+ ],
1021
+ "dtype": "float32",
1022
+ "format": "f32-to-bf16",
1023
+ "nbytes": 4194304,
1024
+ "byteOffset": 27265280
1025
+ },
1026
+ {
1027
+ "name": "model.layers.18.self_attn.q_norm.weight",
1028
+ "shape": [
1029
+ 128
1030
+ ],
1031
+ "dtype": "float32",
1032
+ "format": "f32-to-bf16",
1033
+ "nbytes": 256,
1034
+ "byteOffset": 31459584
1035
+ },
1036
+ {
1037
+ "name": "model.layers.19.input_layernorm.weight",
1038
+ "shape": [
1039
+ 1024
1040
+ ],
1041
+ "dtype": "float32",
1042
+ "format": "f32-to-bf16",
1043
+ "nbytes": 2048,
1044
+ "byteOffset": 31459840
1045
+ }
1046
+ ],
1047
+ "md5sum": "961f8e1a308774dc0fd86312259d7c82"
1048
+ },
1049
+ {
1050
+ "dataPath": "params_shard_12.bin",
1051
+ "format": "raw-shard",
1052
+ "nbytes": 31461888,
1053
+ "records": [
1054
+ {
1055
+ "name": "model.layers.19.mlp.down_proj.weight",
1056
+ "shape": [
1057
+ 1024,
1058
+ 3072
1059
+ ],
1060
+ "dtype": "float32",
1061
+ "format": "f32-to-bf16",
1062
+ "nbytes": 6291456,
1063
+ "byteOffset": 0
1064
+ },
1065
+ {
1066
+ "name": "model.layers.19.mlp.gate_up_proj.weight",
1067
+ "shape": [
1068
+ 6144,
1069
+ 1024
1070
+ ],
1071
+ "dtype": "float32",
1072
+ "format": "f32-to-bf16",
1073
+ "nbytes": 12582912,
1074
+ "byteOffset": 6291456
1075
+ },
1076
+ {
1077
+ "name": "model.layers.19.post_attention_layernorm.weight",
1078
+ "shape": [
1079
+ 1024
1080
+ ],
1081
+ "dtype": "float32",
1082
+ "format": "f32-to-bf16",
1083
+ "nbytes": 2048,
1084
+ "byteOffset": 18874368
1085
+ },
1086
+ {
1087
+ "name": "model.layers.19.self_attn.k_norm.weight",
1088
+ "shape": [
1089
+ 128
1090
+ ],
1091
+ "dtype": "float32",
1092
+ "format": "f32-to-bf16",
1093
+ "nbytes": 256,
1094
+ "byteOffset": 18876416
1095
+ },
1096
+ {
1097
+ "name": "model.layers.19.self_attn.c_attn.weight",
1098
+ "shape": [
1099
+ 4096,
1100
+ 1024
1101
+ ],
1102
+ "dtype": "float32",
1103
+ "format": "f32-to-bf16",
1104
+ "nbytes": 8388608,
1105
+ "byteOffset": 18876672
1106
+ },
1107
+ {
1108
+ "name": "model.layers.19.self_attn.o_proj.weight",
1109
+ "shape": [
1110
+ 1024,
1111
+ 2048
1112
+ ],
1113
+ "dtype": "float32",
1114
+ "format": "f32-to-bf16",
1115
+ "nbytes": 4194304,
1116
+ "byteOffset": 27265280
1117
+ },
1118
+ {
1119
+ "name": "model.layers.19.self_attn.q_norm.weight",
1120
+ "shape": [
1121
+ 128
1122
+ ],
1123
+ "dtype": "float32",
1124
+ "format": "f32-to-bf16",
1125
+ "nbytes": 256,
1126
+ "byteOffset": 31459584
1127
+ },
1128
+ {
1129
+ "name": "model.layers.2.input_layernorm.weight",
1130
+ "shape": [
1131
+ 1024
1132
+ ],
1133
+ "dtype": "float32",
1134
+ "format": "f32-to-bf16",
1135
+ "nbytes": 2048,
1136
+ "byteOffset": 31459840
1137
+ }
1138
+ ],
1139
+ "md5sum": "f17374b039b0e7a53080d4e8c61f1f16"
1140
+ },
1141
+ {
1142
+ "dataPath": "params_shard_13.bin",
1143
+ "format": "raw-shard",
1144
+ "nbytes": 31461888,
1145
+ "records": [
1146
+ {
1147
+ "name": "model.layers.2.mlp.down_proj.weight",
1148
+ "shape": [
1149
+ 1024,
1150
+ 3072
1151
+ ],
1152
+ "dtype": "float32",
1153
+ "format": "f32-to-bf16",
1154
+ "nbytes": 6291456,
1155
+ "byteOffset": 0
1156
+ },
1157
+ {
1158
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
1159
+ "shape": [
1160
+ 6144,
1161
+ 1024
1162
+ ],
1163
+ "dtype": "float32",
1164
+ "format": "f32-to-bf16",
1165
+ "nbytes": 12582912,
1166
+ "byteOffset": 6291456
1167
+ },
1168
+ {
1169
+ "name": "model.layers.2.post_attention_layernorm.weight",
1170
+ "shape": [
1171
+ 1024
1172
+ ],
1173
+ "dtype": "float32",
1174
+ "format": "f32-to-bf16",
1175
+ "nbytes": 2048,
1176
+ "byteOffset": 18874368
1177
+ },
1178
+ {
1179
+ "name": "model.layers.2.self_attn.k_norm.weight",
1180
+ "shape": [
1181
+ 128
1182
+ ],
1183
+ "dtype": "float32",
1184
+ "format": "f32-to-bf16",
1185
+ "nbytes": 256,
1186
+ "byteOffset": 18876416
1187
+ },
1188
+ {
1189
+ "name": "model.layers.2.self_attn.c_attn.weight",
1190
+ "shape": [
1191
+ 4096,
1192
+ 1024
1193
+ ],
1194
+ "dtype": "float32",
1195
+ "format": "f32-to-bf16",
1196
+ "nbytes": 8388608,
1197
+ "byteOffset": 18876672
1198
+ },
1199
+ {
1200
+ "name": "model.layers.2.self_attn.o_proj.weight",
1201
+ "shape": [
1202
+ 1024,
1203
+ 2048
1204
+ ],
1205
+ "dtype": "float32",
1206
+ "format": "f32-to-bf16",
1207
+ "nbytes": 4194304,
1208
+ "byteOffset": 27265280
1209
+ },
1210
+ {
1211
+ "name": "model.layers.2.self_attn.q_norm.weight",
1212
+ "shape": [
1213
+ 128
1214
+ ],
1215
+ "dtype": "float32",
1216
+ "format": "f32-to-bf16",
1217
+ "nbytes": 256,
1218
+ "byteOffset": 31459584
1219
+ },
1220
+ {
1221
+ "name": "model.layers.20.input_layernorm.weight",
1222
+ "shape": [
1223
+ 1024
1224
+ ],
1225
+ "dtype": "float32",
1226
+ "format": "f32-to-bf16",
1227
+ "nbytes": 2048,
1228
+ "byteOffset": 31459840
1229
+ }
1230
+ ],
1231
+ "md5sum": "7951ac8ebc3690aa9df718dfb599eb7b"
1232
+ },
1233
+ {
1234
+ "dataPath": "params_shard_14.bin",
1235
+ "format": "raw-shard",
1236
+ "nbytes": 31461888,
1237
+ "records": [
1238
+ {
1239
+ "name": "model.layers.20.mlp.down_proj.weight",
1240
+ "shape": [
1241
+ 1024,
1242
+ 3072
1243
+ ],
1244
+ "dtype": "float32",
1245
+ "format": "f32-to-bf16",
1246
+ "nbytes": 6291456,
1247
+ "byteOffset": 0
1248
+ },
1249
+ {
1250
+ "name": "model.layers.20.mlp.gate_up_proj.weight",
1251
+ "shape": [
1252
+ 6144,
1253
+ 1024
1254
+ ],
1255
+ "dtype": "float32",
1256
+ "format": "f32-to-bf16",
1257
+ "nbytes": 12582912,
1258
+ "byteOffset": 6291456
1259
+ },
1260
+ {
1261
+ "name": "model.layers.20.post_attention_layernorm.weight",
1262
+ "shape": [
1263
+ 1024
1264
+ ],
1265
+ "dtype": "float32",
1266
+ "format": "f32-to-bf16",
1267
+ "nbytes": 2048,
1268
+ "byteOffset": 18874368
1269
+ },
1270
+ {
1271
+ "name": "model.layers.20.self_attn.k_norm.weight",
1272
+ "shape": [
1273
+ 128
1274
+ ],
1275
+ "dtype": "float32",
1276
+ "format": "f32-to-bf16",
1277
+ "nbytes": 256,
1278
+ "byteOffset": 18876416
1279
+ },
1280
+ {
1281
+ "name": "model.layers.20.self_attn.c_attn.weight",
1282
+ "shape": [
1283
+ 4096,
1284
+ 1024
1285
+ ],
1286
+ "dtype": "float32",
1287
+ "format": "f32-to-bf16",
1288
+ "nbytes": 8388608,
1289
+ "byteOffset": 18876672
1290
+ },
1291
+ {
1292
+ "name": "model.layers.20.self_attn.o_proj.weight",
1293
+ "shape": [
1294
+ 1024,
1295
+ 2048
1296
+ ],
1297
+ "dtype": "float32",
1298
+ "format": "f32-to-bf16",
1299
+ "nbytes": 4194304,
1300
+ "byteOffset": 27265280
1301
+ },
1302
+ {
1303
+ "name": "model.layers.20.self_attn.q_norm.weight",
1304
+ "shape": [
1305
+ 128
1306
+ ],
1307
+ "dtype": "float32",
1308
+ "format": "f32-to-bf16",
1309
+ "nbytes": 256,
1310
+ "byteOffset": 31459584
1311
+ },
1312
+ {
1313
+ "name": "model.layers.21.input_layernorm.weight",
1314
+ "shape": [
1315
+ 1024
1316
+ ],
1317
+ "dtype": "float32",
1318
+ "format": "f32-to-bf16",
1319
+ "nbytes": 2048,
1320
+ "byteOffset": 31459840
1321
+ }
1322
+ ],
1323
+ "md5sum": "a17e66b0197f5f796edc26b7c0b60a79"
1324
+ },
1325
+ {
1326
+ "dataPath": "params_shard_15.bin",
1327
+ "format": "raw-shard",
1328
+ "nbytes": 31461888,
1329
+ "records": [
1330
+ {
1331
+ "name": "model.layers.21.mlp.down_proj.weight",
1332
+ "shape": [
1333
+ 1024,
1334
+ 3072
1335
+ ],
1336
+ "dtype": "float32",
1337
+ "format": "f32-to-bf16",
1338
+ "nbytes": 6291456,
1339
+ "byteOffset": 0
1340
+ },
1341
+ {
1342
+ "name": "model.layers.21.mlp.gate_up_proj.weight",
1343
+ "shape": [
1344
+ 6144,
1345
+ 1024
1346
+ ],
1347
+ "dtype": "float32",
1348
+ "format": "f32-to-bf16",
1349
+ "nbytes": 12582912,
1350
+ "byteOffset": 6291456
1351
+ },
1352
+ {
1353
+ "name": "model.layers.21.post_attention_layernorm.weight",
1354
+ "shape": [
1355
+ 1024
1356
+ ],
1357
+ "dtype": "float32",
1358
+ "format": "f32-to-bf16",
1359
+ "nbytes": 2048,
1360
+ "byteOffset": 18874368
1361
+ },
1362
+ {
1363
+ "name": "model.layers.21.self_attn.k_norm.weight",
1364
+ "shape": [
1365
+ 128
1366
+ ],
1367
+ "dtype": "float32",
1368
+ "format": "f32-to-bf16",
1369
+ "nbytes": 256,
1370
+ "byteOffset": 18876416
1371
+ },
1372
+ {
1373
+ "name": "model.layers.21.self_attn.c_attn.weight",
1374
+ "shape": [
1375
+ 4096,
1376
+ 1024
1377
+ ],
1378
+ "dtype": "float32",
1379
+ "format": "f32-to-bf16",
1380
+ "nbytes": 8388608,
1381
+ "byteOffset": 18876672
1382
+ },
1383
+ {
1384
+ "name": "model.layers.21.self_attn.o_proj.weight",
1385
+ "shape": [
1386
+ 1024,
1387
+ 2048
1388
+ ],
1389
+ "dtype": "float32",
1390
+ "format": "f32-to-bf16",
1391
+ "nbytes": 4194304,
1392
+ "byteOffset": 27265280
1393
+ },
1394
+ {
1395
+ "name": "model.layers.21.self_attn.q_norm.weight",
1396
+ "shape": [
1397
+ 128
1398
+ ],
1399
+ "dtype": "float32",
1400
+ "format": "f32-to-bf16",
1401
+ "nbytes": 256,
1402
+ "byteOffset": 31459584
1403
+ },
1404
+ {
1405
+ "name": "model.layers.22.input_layernorm.weight",
1406
+ "shape": [
1407
+ 1024
1408
+ ],
1409
+ "dtype": "float32",
1410
+ "format": "f32-to-bf16",
1411
+ "nbytes": 2048,
1412
+ "byteOffset": 31459840
1413
+ }
1414
+ ],
1415
+ "md5sum": "29dc75f0a0de71a20a19dc04d6ec2fe8"
1416
+ },
1417
+ {
1418
+ "dataPath": "params_shard_16.bin",
1419
+ "format": "raw-shard",
1420
+ "nbytes": 31461888,
1421
+ "records": [
1422
+ {
1423
+ "name": "model.layers.22.mlp.down_proj.weight",
1424
+ "shape": [
1425
+ 1024,
1426
+ 3072
1427
+ ],
1428
+ "dtype": "float32",
1429
+ "format": "f32-to-bf16",
1430
+ "nbytes": 6291456,
1431
+ "byteOffset": 0
1432
+ },
1433
+ {
1434
+ "name": "model.layers.22.mlp.gate_up_proj.weight",
1435
+ "shape": [
1436
+ 6144,
1437
+ 1024
1438
+ ],
1439
+ "dtype": "float32",
1440
+ "format": "f32-to-bf16",
1441
+ "nbytes": 12582912,
1442
+ "byteOffset": 6291456
1443
+ },
1444
+ {
1445
+ "name": "model.layers.22.post_attention_layernorm.weight",
1446
+ "shape": [
1447
+ 1024
1448
+ ],
1449
+ "dtype": "float32",
1450
+ "format": "f32-to-bf16",
1451
+ "nbytes": 2048,
1452
+ "byteOffset": 18874368
1453
+ },
1454
+ {
1455
+ "name": "model.layers.22.self_attn.k_norm.weight",
1456
+ "shape": [
1457
+ 128
1458
+ ],
1459
+ "dtype": "float32",
1460
+ "format": "f32-to-bf16",
1461
+ "nbytes": 256,
1462
+ "byteOffset": 18876416
1463
+ },
1464
+ {
1465
+ "name": "model.layers.22.self_attn.c_attn.weight",
1466
+ "shape": [
1467
+ 4096,
1468
+ 1024
1469
+ ],
1470
+ "dtype": "float32",
1471
+ "format": "f32-to-bf16",
1472
+ "nbytes": 8388608,
1473
+ "byteOffset": 18876672
1474
+ },
1475
+ {
1476
+ "name": "model.layers.22.self_attn.o_proj.weight",
1477
+ "shape": [
1478
+ 1024,
1479
+ 2048
1480
+ ],
1481
+ "dtype": "float32",
1482
+ "format": "f32-to-bf16",
1483
+ "nbytes": 4194304,
1484
+ "byteOffset": 27265280
1485
+ },
1486
+ {
1487
+ "name": "model.layers.22.self_attn.q_norm.weight",
1488
+ "shape": [
1489
+ 128
1490
+ ],
1491
+ "dtype": "float32",
1492
+ "format": "f32-to-bf16",
1493
+ "nbytes": 256,
1494
+ "byteOffset": 31459584
1495
+ },
1496
+ {
1497
+ "name": "model.layers.23.input_layernorm.weight",
1498
+ "shape": [
1499
+ 1024
1500
+ ],
1501
+ "dtype": "float32",
1502
+ "format": "f32-to-bf16",
1503
+ "nbytes": 2048,
1504
+ "byteOffset": 31459840
1505
+ }
1506
+ ],
1507
+ "md5sum": "77370b671251a8b0f10de8721da680eb"
1508
+ },
1509
+ {
1510
+ "dataPath": "params_shard_17.bin",
1511
+ "format": "raw-shard",
1512
+ "nbytes": 31461888,
1513
+ "records": [
1514
+ {
1515
+ "name": "model.layers.23.mlp.down_proj.weight",
1516
+ "shape": [
1517
+ 1024,
1518
+ 3072
1519
+ ],
1520
+ "dtype": "float32",
1521
+ "format": "f32-to-bf16",
1522
+ "nbytes": 6291456,
1523
+ "byteOffset": 0
1524
+ },
1525
+ {
1526
+ "name": "model.layers.23.mlp.gate_up_proj.weight",
1527
+ "shape": [
1528
+ 6144,
1529
+ 1024
1530
+ ],
1531
+ "dtype": "float32",
1532
+ "format": "f32-to-bf16",
1533
+ "nbytes": 12582912,
1534
+ "byteOffset": 6291456
1535
+ },
1536
+ {
1537
+ "name": "model.layers.23.post_attention_layernorm.weight",
1538
+ "shape": [
1539
+ 1024
1540
+ ],
1541
+ "dtype": "float32",
1542
+ "format": "f32-to-bf16",
1543
+ "nbytes": 2048,
1544
+ "byteOffset": 18874368
1545
+ },
1546
+ {
1547
+ "name": "model.layers.23.self_attn.k_norm.weight",
1548
+ "shape": [
1549
+ 128
1550
+ ],
1551
+ "dtype": "float32",
1552
+ "format": "f32-to-bf16",
1553
+ "nbytes": 256,
1554
+ "byteOffset": 18876416
1555
+ },
1556
+ {
1557
+ "name": "model.layers.23.self_attn.c_attn.weight",
1558
+ "shape": [
1559
+ 4096,
1560
+ 1024
1561
+ ],
1562
+ "dtype": "float32",
1563
+ "format": "f32-to-bf16",
1564
+ "nbytes": 8388608,
1565
+ "byteOffset": 18876672
1566
+ },
1567
+ {
1568
+ "name": "model.layers.23.self_attn.o_proj.weight",
1569
+ "shape": [
1570
+ 1024,
1571
+ 2048
1572
+ ],
1573
+ "dtype": "float32",
1574
+ "format": "f32-to-bf16",
1575
+ "nbytes": 4194304,
1576
+ "byteOffset": 27265280
1577
+ },
1578
+ {
1579
+ "name": "model.layers.23.self_attn.q_norm.weight",
1580
+ "shape": [
1581
+ 128
1582
+ ],
1583
+ "dtype": "float32",
1584
+ "format": "f32-to-bf16",
1585
+ "nbytes": 256,
1586
+ "byteOffset": 31459584
1587
+ },
1588
+ {
1589
+ "name": "model.layers.24.input_layernorm.weight",
1590
+ "shape": [
1591
+ 1024
1592
+ ],
1593
+ "dtype": "float32",
1594
+ "format": "f32-to-bf16",
1595
+ "nbytes": 2048,
1596
+ "byteOffset": 31459840
1597
+ }
1598
+ ],
1599
+ "md5sum": "e1b3dd47fb933da1144e2d4ee03a6447"
1600
+ },
1601
+ {
1602
+ "dataPath": "params_shard_18.bin",
1603
+ "format": "raw-shard",
1604
+ "nbytes": 31461888,
1605
+ "records": [
1606
+ {
1607
+ "name": "model.layers.24.mlp.down_proj.weight",
1608
+ "shape": [
1609
+ 1024,
1610
+ 3072
1611
+ ],
1612
+ "dtype": "float32",
1613
+ "format": "f32-to-bf16",
1614
+ "nbytes": 6291456,
1615
+ "byteOffset": 0
1616
+ },
1617
+ {
1618
+ "name": "model.layers.24.mlp.gate_up_proj.weight",
1619
+ "shape": [
1620
+ 6144,
1621
+ 1024
1622
+ ],
1623
+ "dtype": "float32",
1624
+ "format": "f32-to-bf16",
1625
+ "nbytes": 12582912,
1626
+ "byteOffset": 6291456
1627
+ },
1628
+ {
1629
+ "name": "model.layers.24.post_attention_layernorm.weight",
1630
+ "shape": [
1631
+ 1024
1632
+ ],
1633
+ "dtype": "float32",
1634
+ "format": "f32-to-bf16",
1635
+ "nbytes": 2048,
1636
+ "byteOffset": 18874368
1637
+ },
1638
+ {
1639
+ "name": "model.layers.24.self_attn.k_norm.weight",
1640
+ "shape": [
1641
+ 128
1642
+ ],
1643
+ "dtype": "float32",
1644
+ "format": "f32-to-bf16",
1645
+ "nbytes": 256,
1646
+ "byteOffset": 18876416
1647
+ },
1648
+ {
1649
+ "name": "model.layers.24.self_attn.c_attn.weight",
1650
+ "shape": [
1651
+ 4096,
1652
+ 1024
1653
+ ],
1654
+ "dtype": "float32",
1655
+ "format": "f32-to-bf16",
1656
+ "nbytes": 8388608,
1657
+ "byteOffset": 18876672
1658
+ },
1659
+ {
1660
+ "name": "model.layers.24.self_attn.o_proj.weight",
1661
+ "shape": [
1662
+ 1024,
1663
+ 2048
1664
+ ],
1665
+ "dtype": "float32",
1666
+ "format": "f32-to-bf16",
1667
+ "nbytes": 4194304,
1668
+ "byteOffset": 27265280
1669
+ },
1670
+ {
1671
+ "name": "model.layers.24.self_attn.q_norm.weight",
1672
+ "shape": [
1673
+ 128
1674
+ ],
1675
+ "dtype": "float32",
1676
+ "format": "f32-to-bf16",
1677
+ "nbytes": 256,
1678
+ "byteOffset": 31459584
1679
+ },
1680
+ {
1681
+ "name": "model.layers.25.input_layernorm.weight",
1682
+ "shape": [
1683
+ 1024
1684
+ ],
1685
+ "dtype": "float32",
1686
+ "format": "f32-to-bf16",
1687
+ "nbytes": 2048,
1688
+ "byteOffset": 31459840
1689
+ }
1690
+ ],
1691
+ "md5sum": "1d9f848a8eba2d142c9aa2d536969ecf"
1692
+ },
1693
+ {
1694
+ "dataPath": "params_shard_19.bin",
1695
+ "format": "raw-shard",
1696
+ "nbytes": 31461888,
1697
+ "records": [
1698
+ {
1699
+ "name": "model.layers.25.mlp.down_proj.weight",
1700
+ "shape": [
1701
+ 1024,
1702
+ 3072
1703
+ ],
1704
+ "dtype": "float32",
1705
+ "format": "f32-to-bf16",
1706
+ "nbytes": 6291456,
1707
+ "byteOffset": 0
1708
+ },
1709
+ {
1710
+ "name": "model.layers.25.mlp.gate_up_proj.weight",
1711
+ "shape": [
1712
+ 6144,
1713
+ 1024
1714
+ ],
1715
+ "dtype": "float32",
1716
+ "format": "f32-to-bf16",
1717
+ "nbytes": 12582912,
1718
+ "byteOffset": 6291456
1719
+ },
1720
+ {
1721
+ "name": "model.layers.25.post_attention_layernorm.weight",
1722
+ "shape": [
1723
+ 1024
1724
+ ],
1725
+ "dtype": "float32",
1726
+ "format": "f32-to-bf16",
1727
+ "nbytes": 2048,
1728
+ "byteOffset": 18874368
1729
+ },
1730
+ {
1731
+ "name": "model.layers.25.self_attn.k_norm.weight",
1732
+ "shape": [
1733
+ 128
1734
+ ],
1735
+ "dtype": "float32",
1736
+ "format": "f32-to-bf16",
1737
+ "nbytes": 256,
1738
+ "byteOffset": 18876416
1739
+ },
1740
+ {
1741
+ "name": "model.layers.25.self_attn.c_attn.weight",
1742
+ "shape": [
1743
+ 4096,
1744
+ 1024
1745
+ ],
1746
+ "dtype": "float32",
1747
+ "format": "f32-to-bf16",
1748
+ "nbytes": 8388608,
1749
+ "byteOffset": 18876672
1750
+ },
1751
+ {
1752
+ "name": "model.layers.25.self_attn.o_proj.weight",
1753
+ "shape": [
1754
+ 1024,
1755
+ 2048
1756
+ ],
1757
+ "dtype": "float32",
1758
+ "format": "f32-to-bf16",
1759
+ "nbytes": 4194304,
1760
+ "byteOffset": 27265280
1761
+ },
1762
+ {
1763
+ "name": "model.layers.25.self_attn.q_norm.weight",
1764
+ "shape": [
1765
+ 128
1766
+ ],
1767
+ "dtype": "float32",
1768
+ "format": "f32-to-bf16",
1769
+ "nbytes": 256,
1770
+ "byteOffset": 31459584
1771
+ },
1772
+ {
1773
+ "name": "model.layers.26.input_layernorm.weight",
1774
+ "shape": [
1775
+ 1024
1776
+ ],
1777
+ "dtype": "float32",
1778
+ "format": "f32-to-bf16",
1779
+ "nbytes": 2048,
1780
+ "byteOffset": 31459840
1781
+ }
1782
+ ],
1783
+ "md5sum": "e6125aaacc98c10d3e7702aa6bdb39d8"
1784
+ },
1785
+ {
1786
+ "dataPath": "params_shard_20.bin",
1787
+ "format": "raw-shard",
1788
+ "nbytes": 31461888,
1789
+ "records": [
1790
+ {
1791
+ "name": "model.layers.26.mlp.down_proj.weight",
1792
+ "shape": [
1793
+ 1024,
1794
+ 3072
1795
+ ],
1796
+ "dtype": "float32",
1797
+ "format": "f32-to-bf16",
1798
+ "nbytes": 6291456,
1799
+ "byteOffset": 0
1800
+ },
1801
+ {
1802
+ "name": "model.layers.26.mlp.gate_up_proj.weight",
1803
+ "shape": [
1804
+ 6144,
1805
+ 1024
1806
+ ],
1807
+ "dtype": "float32",
1808
+ "format": "f32-to-bf16",
1809
+ "nbytes": 12582912,
1810
+ "byteOffset": 6291456
1811
+ },
1812
+ {
1813
+ "name": "model.layers.26.post_attention_layernorm.weight",
1814
+ "shape": [
1815
+ 1024
1816
+ ],
1817
+ "dtype": "float32",
1818
+ "format": "f32-to-bf16",
1819
+ "nbytes": 2048,
1820
+ "byteOffset": 18874368
1821
+ },
1822
+ {
1823
+ "name": "model.layers.26.self_attn.k_norm.weight",
1824
+ "shape": [
1825
+ 128
1826
+ ],
1827
+ "dtype": "float32",
1828
+ "format": "f32-to-bf16",
1829
+ "nbytes": 256,
1830
+ "byteOffset": 18876416
1831
+ },
1832
+ {
1833
+ "name": "model.layers.26.self_attn.c_attn.weight",
1834
+ "shape": [
1835
+ 4096,
1836
+ 1024
1837
+ ],
1838
+ "dtype": "float32",
1839
+ "format": "f32-to-bf16",
1840
+ "nbytes": 8388608,
1841
+ "byteOffset": 18876672
1842
+ },
1843
+ {
1844
+ "name": "model.layers.26.self_attn.o_proj.weight",
1845
+ "shape": [
1846
+ 1024,
1847
+ 2048
1848
+ ],
1849
+ "dtype": "float32",
1850
+ "format": "f32-to-bf16",
1851
+ "nbytes": 4194304,
1852
+ "byteOffset": 27265280
1853
+ },
1854
+ {
1855
+ "name": "model.layers.26.self_attn.q_norm.weight",
1856
+ "shape": [
1857
+ 128
1858
+ ],
1859
+ "dtype": "float32",
1860
+ "format": "f32-to-bf16",
1861
+ "nbytes": 256,
1862
+ "byteOffset": 31459584
1863
+ },
1864
+ {
1865
+ "name": "model.layers.27.input_layernorm.weight",
1866
+ "shape": [
1867
+ 1024
1868
+ ],
1869
+ "dtype": "float32",
1870
+ "format": "f32-to-bf16",
1871
+ "nbytes": 2048,
1872
+ "byteOffset": 31459840
1873
+ }
1874
+ ],
1875
+ "md5sum": "9d135333707ef2f4acb7dd7cf9326f25"
1876
+ },
1877
+ {
1878
+ "dataPath": "params_shard_21.bin",
1879
+ "format": "raw-shard",
1880
+ "nbytes": 31461888,
1881
+ "records": [
1882
+ {
1883
+ "name": "model.layers.27.mlp.down_proj.weight",
1884
+ "shape": [
1885
+ 1024,
1886
+ 3072
1887
+ ],
1888
+ "dtype": "float32",
1889
+ "format": "f32-to-bf16",
1890
+ "nbytes": 6291456,
1891
+ "byteOffset": 0
1892
+ },
1893
+ {
1894
+ "name": "model.layers.27.mlp.gate_up_proj.weight",
1895
+ "shape": [
1896
+ 6144,
1897
+ 1024
1898
+ ],
1899
+ "dtype": "float32",
1900
+ "format": "f32-to-bf16",
1901
+ "nbytes": 12582912,
1902
+ "byteOffset": 6291456
1903
+ },
1904
+ {
1905
+ "name": "model.layers.27.post_attention_layernorm.weight",
1906
+ "shape": [
1907
+ 1024
1908
+ ],
1909
+ "dtype": "float32",
1910
+ "format": "f32-to-bf16",
1911
+ "nbytes": 2048,
1912
+ "byteOffset": 18874368
1913
+ },
1914
+ {
1915
+ "name": "model.layers.27.self_attn.k_norm.weight",
1916
+ "shape": [
1917
+ 128
1918
+ ],
1919
+ "dtype": "float32",
1920
+ "format": "f32-to-bf16",
1921
+ "nbytes": 256,
1922
+ "byteOffset": 18876416
1923
+ },
1924
+ {
1925
+ "name": "model.layers.27.self_attn.c_attn.weight",
1926
+ "shape": [
1927
+ 4096,
1928
+ 1024
1929
+ ],
1930
+ "dtype": "float32",
1931
+ "format": "f32-to-bf16",
1932
+ "nbytes": 8388608,
1933
+ "byteOffset": 18876672
1934
+ },
1935
+ {
1936
+ "name": "model.layers.27.self_attn.o_proj.weight",
1937
+ "shape": [
1938
+ 1024,
1939
+ 2048
1940
+ ],
1941
+ "dtype": "float32",
1942
+ "format": "f32-to-bf16",
1943
+ "nbytes": 4194304,
1944
+ "byteOffset": 27265280
1945
+ },
1946
+ {
1947
+ "name": "model.layers.27.self_attn.q_norm.weight",
1948
+ "shape": [
1949
+ 128
1950
+ ],
1951
+ "dtype": "float32",
1952
+ "format": "f32-to-bf16",
1953
+ "nbytes": 256,
1954
+ "byteOffset": 31459584
1955
+ },
1956
+ {
1957
+ "name": "model.layers.3.input_layernorm.weight",
1958
+ "shape": [
1959
+ 1024
1960
+ ],
1961
+ "dtype": "float32",
1962
+ "format": "f32-to-bf16",
1963
+ "nbytes": 2048,
1964
+ "byteOffset": 31459840
1965
+ }
1966
+ ],
1967
+ "md5sum": "e07ec46dc1988b87491a60c6e922c02a"
1968
+ },
1969
+ {
1970
+ "dataPath": "params_shard_22.bin",
1971
+ "format": "raw-shard",
1972
+ "nbytes": 31461888,
1973
+ "records": [
1974
+ {
1975
+ "name": "model.layers.3.mlp.down_proj.weight",
1976
+ "shape": [
1977
+ 1024,
1978
+ 3072
1979
+ ],
1980
+ "dtype": "float32",
1981
+ "format": "f32-to-bf16",
1982
+ "nbytes": 6291456,
1983
+ "byteOffset": 0
1984
+ },
1985
+ {
1986
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
1987
+ "shape": [
1988
+ 6144,
1989
+ 1024
1990
+ ],
1991
+ "dtype": "float32",
1992
+ "format": "f32-to-bf16",
1993
+ "nbytes": 12582912,
1994
+ "byteOffset": 6291456
1995
+ },
1996
+ {
1997
+ "name": "model.layers.3.post_attention_layernorm.weight",
1998
+ "shape": [
1999
+ 1024
2000
+ ],
2001
+ "dtype": "float32",
2002
+ "format": "f32-to-bf16",
2003
+ "nbytes": 2048,
2004
+ "byteOffset": 18874368
2005
+ },
2006
+ {
2007
+ "name": "model.layers.3.self_attn.k_norm.weight",
2008
+ "shape": [
2009
+ 128
2010
+ ],
2011
+ "dtype": "float32",
2012
+ "format": "f32-to-bf16",
2013
+ "nbytes": 256,
2014
+ "byteOffset": 18876416
2015
+ },
2016
+ {
2017
+ "name": "model.layers.3.self_attn.c_attn.weight",
2018
+ "shape": [
2019
+ 4096,
2020
+ 1024
2021
+ ],
2022
+ "dtype": "float32",
2023
+ "format": "f32-to-bf16",
2024
+ "nbytes": 8388608,
2025
+ "byteOffset": 18876672
2026
+ },
2027
+ {
2028
+ "name": "model.layers.3.self_attn.o_proj.weight",
2029
+ "shape": [
2030
+ 1024,
2031
+ 2048
2032
+ ],
2033
+ "dtype": "float32",
2034
+ "format": "f32-to-bf16",
2035
+ "nbytes": 4194304,
2036
+ "byteOffset": 27265280
2037
+ },
2038
+ {
2039
+ "name": "model.layers.3.self_attn.q_norm.weight",
2040
+ "shape": [
2041
+ 128
2042
+ ],
2043
+ "dtype": "float32",
2044
+ "format": "f32-to-bf16",
2045
+ "nbytes": 256,
2046
+ "byteOffset": 31459584
2047
+ },
2048
+ {
2049
+ "name": "model.layers.4.input_layernorm.weight",
2050
+ "shape": [
2051
+ 1024
2052
+ ],
2053
+ "dtype": "float32",
2054
+ "format": "f32-to-bf16",
2055
+ "nbytes": 2048,
2056
+ "byteOffset": 31459840
2057
+ }
2058
+ ],
2059
+ "md5sum": "f0b4915b859259222e9787ee1e7b354d"
2060
+ },
2061
+ {
2062
+ "dataPath": "params_shard_23.bin",
2063
+ "format": "raw-shard",
2064
+ "nbytes": 31461888,
2065
+ "records": [
2066
+ {
2067
+ "name": "model.layers.4.mlp.down_proj.weight",
2068
+ "shape": [
2069
+ 1024,
2070
+ 3072
2071
+ ],
2072
+ "dtype": "float32",
2073
+ "format": "f32-to-bf16",
2074
+ "nbytes": 6291456,
2075
+ "byteOffset": 0
2076
+ },
2077
+ {
2078
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
2079
+ "shape": [
2080
+ 6144,
2081
+ 1024
2082
+ ],
2083
+ "dtype": "float32",
2084
+ "format": "f32-to-bf16",
2085
+ "nbytes": 12582912,
2086
+ "byteOffset": 6291456
2087
+ },
2088
+ {
2089
+ "name": "model.layers.4.post_attention_layernorm.weight",
2090
+ "shape": [
2091
+ 1024
2092
+ ],
2093
+ "dtype": "float32",
2094
+ "format": "f32-to-bf16",
2095
+ "nbytes": 2048,
2096
+ "byteOffset": 18874368
2097
+ },
2098
+ {
2099
+ "name": "model.layers.4.self_attn.k_norm.weight",
2100
+ "shape": [
2101
+ 128
2102
+ ],
2103
+ "dtype": "float32",
2104
+ "format": "f32-to-bf16",
2105
+ "nbytes": 256,
2106
+ "byteOffset": 18876416
2107
+ },
2108
+ {
2109
+ "name": "model.layers.4.self_attn.c_attn.weight",
2110
+ "shape": [
2111
+ 4096,
2112
+ 1024
2113
+ ],
2114
+ "dtype": "float32",
2115
+ "format": "f32-to-bf16",
2116
+ "nbytes": 8388608,
2117
+ "byteOffset": 18876672
2118
+ },
2119
+ {
2120
+ "name": "model.layers.4.self_attn.o_proj.weight",
2121
+ "shape": [
2122
+ 1024,
2123
+ 2048
2124
+ ],
2125
+ "dtype": "float32",
2126
+ "format": "f32-to-bf16",
2127
+ "nbytes": 4194304,
2128
+ "byteOffset": 27265280
2129
+ },
2130
+ {
2131
+ "name": "model.layers.4.self_attn.q_norm.weight",
2132
+ "shape": [
2133
+ 128
2134
+ ],
2135
+ "dtype": "float32",
2136
+ "format": "f32-to-bf16",
2137
+ "nbytes": 256,
2138
+ "byteOffset": 31459584
2139
+ },
2140
+ {
2141
+ "name": "model.layers.5.input_layernorm.weight",
2142
+ "shape": [
2143
+ 1024
2144
+ ],
2145
+ "dtype": "float32",
2146
+ "format": "f32-to-bf16",
2147
+ "nbytes": 2048,
2148
+ "byteOffset": 31459840
2149
+ }
2150
+ ],
2151
+ "md5sum": "890c68f7289867143b40975439f97c02"
2152
+ },
2153
+ {
2154
+ "dataPath": "params_shard_24.bin",
2155
+ "format": "raw-shard",
2156
+ "nbytes": 31461888,
2157
+ "records": [
2158
+ {
2159
+ "name": "model.layers.5.mlp.down_proj.weight",
2160
+ "shape": [
2161
+ 1024,
2162
+ 3072
2163
+ ],
2164
+ "dtype": "float32",
2165
+ "format": "f32-to-bf16",
2166
+ "nbytes": 6291456,
2167
+ "byteOffset": 0
2168
+ },
2169
+ {
2170
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
2171
+ "shape": [
2172
+ 6144,
2173
+ 1024
2174
+ ],
2175
+ "dtype": "float32",
2176
+ "format": "f32-to-bf16",
2177
+ "nbytes": 12582912,
2178
+ "byteOffset": 6291456
2179
+ },
2180
+ {
2181
+ "name": "model.layers.5.post_attention_layernorm.weight",
2182
+ "shape": [
2183
+ 1024
2184
+ ],
2185
+ "dtype": "float32",
2186
+ "format": "f32-to-bf16",
2187
+ "nbytes": 2048,
2188
+ "byteOffset": 18874368
2189
+ },
2190
+ {
2191
+ "name": "model.layers.5.self_attn.k_norm.weight",
2192
+ "shape": [
2193
+ 128
2194
+ ],
2195
+ "dtype": "float32",
2196
+ "format": "f32-to-bf16",
2197
+ "nbytes": 256,
2198
+ "byteOffset": 18876416
2199
+ },
2200
+ {
2201
+ "name": "model.layers.5.self_attn.c_attn.weight",
2202
+ "shape": [
2203
+ 4096,
2204
+ 1024
2205
+ ],
2206
+ "dtype": "float32",
2207
+ "format": "f32-to-bf16",
2208
+ "nbytes": 8388608,
2209
+ "byteOffset": 18876672
2210
+ },
2211
+ {
2212
+ "name": "model.layers.5.self_attn.o_proj.weight",
2213
+ "shape": [
2214
+ 1024,
2215
+ 2048
2216
+ ],
2217
+ "dtype": "float32",
2218
+ "format": "f32-to-bf16",
2219
+ "nbytes": 4194304,
2220
+ "byteOffset": 27265280
2221
+ },
2222
+ {
2223
+ "name": "model.layers.5.self_attn.q_norm.weight",
2224
+ "shape": [
2225
+ 128
2226
+ ],
2227
+ "dtype": "float32",
2228
+ "format": "f32-to-bf16",
2229
+ "nbytes": 256,
2230
+ "byteOffset": 31459584
2231
+ },
2232
+ {
2233
+ "name": "model.layers.6.input_layernorm.weight",
2234
+ "shape": [
2235
+ 1024
2236
+ ],
2237
+ "dtype": "float32",
2238
+ "format": "f32-to-bf16",
2239
+ "nbytes": 2048,
2240
+ "byteOffset": 31459840
2241
+ }
2242
+ ],
2243
+ "md5sum": "4a2e6abf9a0ac27fed499bb02f61047b"
2244
+ },
2245
+ {
2246
+ "dataPath": "params_shard_25.bin",
2247
+ "format": "raw-shard",
2248
+ "nbytes": 31461888,
2249
+ "records": [
2250
+ {
2251
+ "name": "model.layers.6.mlp.down_proj.weight",
2252
+ "shape": [
2253
+ 1024,
2254
+ 3072
2255
+ ],
2256
+ "dtype": "float32",
2257
+ "format": "f32-to-bf16",
2258
+ "nbytes": 6291456,
2259
+ "byteOffset": 0
2260
+ },
2261
+ {
2262
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
2263
+ "shape": [
2264
+ 6144,
2265
+ 1024
2266
+ ],
2267
+ "dtype": "float32",
2268
+ "format": "f32-to-bf16",
2269
+ "nbytes": 12582912,
2270
+ "byteOffset": 6291456
2271
+ },
2272
+ {
2273
+ "name": "model.layers.6.post_attention_layernorm.weight",
2274
+ "shape": [
2275
+ 1024
2276
+ ],
2277
+ "dtype": "float32",
2278
+ "format": "f32-to-bf16",
2279
+ "nbytes": 2048,
2280
+ "byteOffset": 18874368
2281
+ },
2282
+ {
2283
+ "name": "model.layers.6.self_attn.k_norm.weight",
2284
+ "shape": [
2285
+ 128
2286
+ ],
2287
+ "dtype": "float32",
2288
+ "format": "f32-to-bf16",
2289
+ "nbytes": 256,
2290
+ "byteOffset": 18876416
2291
+ },
2292
+ {
2293
+ "name": "model.layers.6.self_attn.c_attn.weight",
2294
+ "shape": [
2295
+ 4096,
2296
+ 1024
2297
+ ],
2298
+ "dtype": "float32",
2299
+ "format": "f32-to-bf16",
2300
+ "nbytes": 8388608,
2301
+ "byteOffset": 18876672
2302
+ },
2303
+ {
2304
+ "name": "model.layers.6.self_attn.o_proj.weight",
2305
+ "shape": [
2306
+ 1024,
2307
+ 2048
2308
+ ],
2309
+ "dtype": "float32",
2310
+ "format": "f32-to-bf16",
2311
+ "nbytes": 4194304,
2312
+ "byteOffset": 27265280
2313
+ },
2314
+ {
2315
+ "name": "model.layers.6.self_attn.q_norm.weight",
2316
+ "shape": [
2317
+ 128
2318
+ ],
2319
+ "dtype": "float32",
2320
+ "format": "f32-to-bf16",
2321
+ "nbytes": 256,
2322
+ "byteOffset": 31459584
2323
+ },
2324
+ {
2325
+ "name": "model.layers.7.input_layernorm.weight",
2326
+ "shape": [
2327
+ 1024
2328
+ ],
2329
+ "dtype": "float32",
2330
+ "format": "f32-to-bf16",
2331
+ "nbytes": 2048,
2332
+ "byteOffset": 31459840
2333
+ }
2334
+ ],
2335
+ "md5sum": "bbf21bdcb33c7b1dfaf7279672693cc2"
2336
+ },
2337
+ {
2338
+ "dataPath": "params_shard_26.bin",
2339
+ "format": "raw-shard",
2340
+ "nbytes": 31461888,
2341
+ "records": [
2342
+ {
2343
+ "name": "model.layers.7.mlp.down_proj.weight",
2344
+ "shape": [
2345
+ 1024,
2346
+ 3072
2347
+ ],
2348
+ "dtype": "float32",
2349
+ "format": "f32-to-bf16",
2350
+ "nbytes": 6291456,
2351
+ "byteOffset": 0
2352
+ },
2353
+ {
2354
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
2355
+ "shape": [
2356
+ 6144,
2357
+ 1024
2358
+ ],
2359
+ "dtype": "float32",
2360
+ "format": "f32-to-bf16",
2361
+ "nbytes": 12582912,
2362
+ "byteOffset": 6291456
2363
+ },
2364
+ {
2365
+ "name": "model.layers.7.post_attention_layernorm.weight",
2366
+ "shape": [
2367
+ 1024
2368
+ ],
2369
+ "dtype": "float32",
2370
+ "format": "f32-to-bf16",
2371
+ "nbytes": 2048,
2372
+ "byteOffset": 18874368
2373
+ },
2374
+ {
2375
+ "name": "model.layers.7.self_attn.k_norm.weight",
2376
+ "shape": [
2377
+ 128
2378
+ ],
2379
+ "dtype": "float32",
2380
+ "format": "f32-to-bf16",
2381
+ "nbytes": 256,
2382
+ "byteOffset": 18876416
2383
+ },
2384
+ {
2385
+ "name": "model.layers.7.self_attn.c_attn.weight",
2386
+ "shape": [
2387
+ 4096,
2388
+ 1024
2389
+ ],
2390
+ "dtype": "float32",
2391
+ "format": "f32-to-bf16",
2392
+ "nbytes": 8388608,
2393
+ "byteOffset": 18876672
2394
+ },
2395
+ {
2396
+ "name": "model.layers.7.self_attn.o_proj.weight",
2397
+ "shape": [
2398
+ 1024,
2399
+ 2048
2400
+ ],
2401
+ "dtype": "float32",
2402
+ "format": "f32-to-bf16",
2403
+ "nbytes": 4194304,
2404
+ "byteOffset": 27265280
2405
+ },
2406
+ {
2407
+ "name": "model.layers.7.self_attn.q_norm.weight",
2408
+ "shape": [
2409
+ 128
2410
+ ],
2411
+ "dtype": "float32",
2412
+ "format": "f32-to-bf16",
2413
+ "nbytes": 256,
2414
+ "byteOffset": 31459584
2415
+ },
2416
+ {
2417
+ "name": "model.layers.8.input_layernorm.weight",
2418
+ "shape": [
2419
+ 1024
2420
+ ],
2421
+ "dtype": "float32",
2422
+ "format": "f32-to-bf16",
2423
+ "nbytes": 2048,
2424
+ "byteOffset": 31459840
2425
+ }
2426
+ ],
2427
+ "md5sum": "d0db88ea1fdab01e768d9ee5a4cc24e1"
2428
+ },
2429
+ {
2430
+ "dataPath": "params_shard_27.bin",
2431
+ "format": "raw-shard",
2432
+ "nbytes": 31461888,
2433
+ "records": [
2434
+ {
2435
+ "name": "model.layers.8.mlp.down_proj.weight",
2436
+ "shape": [
2437
+ 1024,
2438
+ 3072
2439
+ ],
2440
+ "dtype": "float32",
2441
+ "format": "f32-to-bf16",
2442
+ "nbytes": 6291456,
2443
+ "byteOffset": 0
2444
+ },
2445
+ {
2446
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
2447
+ "shape": [
2448
+ 6144,
2449
+ 1024
2450
+ ],
2451
+ "dtype": "float32",
2452
+ "format": "f32-to-bf16",
2453
+ "nbytes": 12582912,
2454
+ "byteOffset": 6291456
2455
+ },
2456
+ {
2457
+ "name": "model.layers.8.post_attention_layernorm.weight",
2458
+ "shape": [
2459
+ 1024
2460
+ ],
2461
+ "dtype": "float32",
2462
+ "format": "f32-to-bf16",
2463
+ "nbytes": 2048,
2464
+ "byteOffset": 18874368
2465
+ },
2466
+ {
2467
+ "name": "model.layers.8.self_attn.k_norm.weight",
2468
+ "shape": [
2469
+ 128
2470
+ ],
2471
+ "dtype": "float32",
2472
+ "format": "f32-to-bf16",
2473
+ "nbytes": 256,
2474
+ "byteOffset": 18876416
2475
+ },
2476
+ {
2477
+ "name": "model.layers.8.self_attn.c_attn.weight",
2478
+ "shape": [
2479
+ 4096,
2480
+ 1024
2481
+ ],
2482
+ "dtype": "float32",
2483
+ "format": "f32-to-bf16",
2484
+ "nbytes": 8388608,
2485
+ "byteOffset": 18876672
2486
+ },
2487
+ {
2488
+ "name": "model.layers.8.self_attn.o_proj.weight",
2489
+ "shape": [
2490
+ 1024,
2491
+ 2048
2492
+ ],
2493
+ "dtype": "float32",
2494
+ "format": "f32-to-bf16",
2495
+ "nbytes": 4194304,
2496
+ "byteOffset": 27265280
2497
+ },
2498
+ {
2499
+ "name": "model.layers.8.self_attn.q_norm.weight",
2500
+ "shape": [
2501
+ 128
2502
+ ],
2503
+ "dtype": "float32",
2504
+ "format": "f32-to-bf16",
2505
+ "nbytes": 256,
2506
+ "byteOffset": 31459584
2507
+ },
2508
+ {
2509
+ "name": "model.layers.9.input_layernorm.weight",
2510
+ "shape": [
2511
+ 1024
2512
+ ],
2513
+ "dtype": "float32",
2514
+ "format": "f32-to-bf16",
2515
+ "nbytes": 2048,
2516
+ "byteOffset": 31459840
2517
+ }
2518
+ ],
2519
+ "md5sum": "3a05a1acf71722e4ad3b000b324a8649"
2520
+ },
2521
+ {
2522
+ "dataPath": "params_shard_28.bin",
2523
+ "format": "raw-shard",
2524
+ "nbytes": 31461888,
2525
+ "records": [
2526
+ {
2527
+ "name": "model.layers.9.mlp.down_proj.weight",
2528
+ "shape": [
2529
+ 1024,
2530
+ 3072
2531
+ ],
2532
+ "dtype": "float32",
2533
+ "format": "f32-to-bf16",
2534
+ "nbytes": 6291456,
2535
+ "byteOffset": 0
2536
+ },
2537
+ {
2538
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
2539
+ "shape": [
2540
+ 6144,
2541
+ 1024
2542
+ ],
2543
+ "dtype": "float32",
2544
+ "format": "f32-to-bf16",
2545
+ "nbytes": 12582912,
2546
+ "byteOffset": 6291456
2547
+ },
2548
+ {
2549
+ "name": "model.layers.9.post_attention_layernorm.weight",
2550
+ "shape": [
2551
+ 1024
2552
+ ],
2553
+ "dtype": "float32",
2554
+ "format": "f32-to-bf16",
2555
+ "nbytes": 2048,
2556
+ "byteOffset": 18874368
2557
+ },
2558
+ {
2559
+ "name": "model.layers.9.self_attn.k_norm.weight",
2560
+ "shape": [
2561
+ 128
2562
+ ],
2563
+ "dtype": "float32",
2564
+ "format": "f32-to-bf16",
2565
+ "nbytes": 256,
2566
+ "byteOffset": 18876416
2567
+ },
2568
+ {
2569
+ "name": "model.layers.9.self_attn.c_attn.weight",
2570
+ "shape": [
2571
+ 4096,
2572
+ 1024
2573
+ ],
2574
+ "dtype": "float32",
2575
+ "format": "f32-to-bf16",
2576
+ "nbytes": 8388608,
2577
+ "byteOffset": 18876672
2578
+ },
2579
+ {
2580
+ "name": "model.layers.9.self_attn.o_proj.weight",
2581
+ "shape": [
2582
+ 1024,
2583
+ 2048
2584
+ ],
2585
+ "dtype": "float32",
2586
+ "format": "f32-to-bf16",
2587
+ "nbytes": 4194304,
2588
+ "byteOffset": 27265280
2589
+ },
2590
+ {
2591
+ "name": "model.layers.9.self_attn.q_norm.weight",
2592
+ "shape": [
2593
+ 128
2594
+ ],
2595
+ "dtype": "float32",
2596
+ "format": "f32-to-bf16",
2597
+ "nbytes": 256,
2598
+ "byteOffset": 31459584
2599
+ },
2600
+ {
2601
+ "name": "model.norm.weight",
2602
+ "shape": [
2603
+ 1024
2604
+ ],
2605
+ "dtype": "float32",
2606
+ "format": "f32-to-bf16",
2607
+ "nbytes": 2048,
2608
+ "byteOffset": 31459840
2609
+ }
2610
+ ],
2611
+ "md5sum": "195d11d45fdbd6e214234ed298ed0684"
2612
+ }
2613
+ ]
2614
+ }
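
The manifest that closes above maps every named tensor to a shard file ("dataPath"), a "byteOffset", an "nbytes" count, and a logical "shape"; the "f32-to-bf16" format means float32 weights are stored as bfloat16 halves (2 bytes per element, e.g. 1024*2048*2 = 4194304 bytes for model.layers.27.self_attn.o_proj.weight). Purely as an illustrative sketch, not part of this upload: one way such a record could be decoded with NumPy, assuming the manifest sits next to the params_shard_*.bin files as ndarray-cache-b16.json, that the shard list lives under a top-level "records" key, and a little-endian host.

import json
import numpy as np

def load_tensor(name, manifest_dir="."):
    # Assumption: the shard list is stored under the top-level "records" key.
    with open(f"{manifest_dir}/ndarray-cache-b16.json") as f:
        manifest = json.load(f)
    for shard in manifest["records"]:
        for rec in shard["records"]:
            if rec["name"] != name:
                continue
            # Slice the tensor's bytes out of its raw shard file.
            with open(f"{manifest_dir}/{shard['dataPath']}", "rb") as fh:
                fh.seek(rec["byteOffset"])
                raw = fh.read(rec["nbytes"])
            # "f32-to-bf16": each element is the upper 16 bits of a float32,
            # so widen with 16 zero bits and reinterpret (little-endian host).
            u32 = np.frombuffer(raw, dtype=np.uint16).astype(np.uint32) << 16
            return u32.view(np.float32).reshape(rec["shape"])
    raise KeyError(name)

# Example: load_tensor("model.layers.27.self_attn.o_proj.weight") -> (1024, 2048) float32 array
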
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f29acf519434862d95613b2b4f6b9d14933a5e4d16baebf8ac0b33b410acfb6
3
+ size 311164928
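
The shard binaries themselves are committed as Git LFS pointers rather than raw bytes: three text lines carrying the LFS spec version, a sha256 object id, and the payload size in bytes. A minimal illustrative sketch (the helper name is invented) of reading those fields from the pointer for params_shard_0.bin shown above:

def parse_lfs_pointer(text):
    # Each line is "<key> <value>"; keep the oid as-is, coerce size to int.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8f29acf519434862d95613b2b4f6b9d14933a5e4d16baebf8ac0b33b410acfb6
size 311164928"""
print(parse_lfs_pointer(pointer))
# {'version': 'https://git-lfs.github.com/spec/v1', 'oid': 'sha256:8f29...', 'size': 311164928}
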
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:83233c1b455dc0e36429411f083d6f38e124e7e58cd3cc4436430384314e3785
3
+ size 31463936
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2d46238375595a7f76565bdee18731ea6031f921285ef4175c874d2c555198fa
3
+ size 31461888
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b25e0e839da541c657822508d036b1da3b6a1913dd5338c4fe11a536ed1ea9b9
3
+ size 31461888
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab32a220d6f2a6a42c370b6bf2565ef6f1bf6a5c51b8424ef4d19325d7399830
3
+ size 31461888
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b457e2caf9cee635946cd1b582840404a5671296c3e8fa3c729bf51f34ed49f5
3
+ size 31461888
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c9cc626a7849d73934ab0edf64a94c4cbf5b2e7d3cbd46ba89ffc22c0b96217c
3
+ size 31461888
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:49d6b6ce2642cf2320499d227c8a6a853d638a417d694516ea27511c261517fa
3
+ size 31461888
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:37cd57a9f8ab682973bac0a50be1592e08fa0d908382dedb60b3052bf6f85a74
3
+ size 31461888
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9904b7eff91c0dccf6441b701d9e4191a8ab28d1e44724ce7cb994caae1f1c88
3
+ size 31461888
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b9c958f5c1aacaeef99ae7b209b95bf3324d7fb321b49e3c8fa3c32dc3b8b67c
3
+ size 31461888
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cee4cd069eea253b55db5c1e535f78f1be980023f36f48947702a202b3c09df8
3
+ size 31461888
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f24490f63c81855bbfb9ff2e5e0048ce09b8ddd920ca77943373b827e45a09d
3
+ size 31461888
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a1dda13b5bcec9840a31dc7abb3582634dacbe74db06f983bf432bccbf245be2
3
+ size 31461888
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a926f3dbcd9e932a0033178e7dafe66e9d228fc4cb33c25e498b37fb816391f4
3
+ size 31461888
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afb0d211337d60e78f0bea4d349b8cfaa7ceea823f9cee4a70c25516183b6c50
3
+ size 31461888
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5140fcdbf253e64a2dbe06d0251fb997b99005a00f8d19f6d4ee3886dfc74a30
3
+ size 31461888
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d63037f1b3a041f4fbce9111a15c268bb09ca799fadfc318f1c8ed30db8c9ef1
3
+ size 31461888
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02846856321c4e2d8223a7789588920db249eab99154d7dbfb906bf5f36221f2
3
+ size 31461888
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39e3051af3273a68591adcfdb8f2aa977700a5917d1d418ff2dce3ca96a57114
3
+ size 31461888
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0e033e67370325024a1b07d16e583c5b09f864b0f1658c34765b96cd345e994
3
+ size 31461888
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:805038254db61b39a0e6a05180168fbf369b30d3f1590151d928477b7d989106
3
+ size 31461888
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:23194f922c092e5b83cdaf3138dff240ac04af03d04c3093a6c9ef3500f53b0d
3
+ size 31461888
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d8d07536a9adc381c13b8ef4ce9ba294b475ab62b9c254c5e00bbe2795f9509
3
+ size 31461888
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eff5f43af8357851054b54776b953c0af3e67ee8aeb90ed9d911c578fb5facfb
3
+ size 31461888
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f99d42a1ab0aae875342f066edc5b2c6d13688b783c85090f6076976b1f61982
3
+ size 31461888
params_shard_7.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf9e34dd5021363c0378bd22c50051526e373540828751ec5a8768e8d4b50706
3
+ size 31461888
params_shard_8.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:29f2eba9f5ac79fa2d1559f424a13148cd1fa07e064a7e7096ee706be8ce50d6
3
+ size 31461888
params_shard_9.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de9a02593a91d671810796b2d02063b6eb0795a7ca792fb46d134a26364d1460
3
+ size 31461888
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|im_start|>",
216
+ "<|im_end|>",
217
+ "<|object_ref_start|>",
218
+ "<|object_ref_end|>",
219
+ "<|box_start|>",
220
+ "<|box_end|>",
221
+ "<|quad_start|>",
222
+ "<|quad_end|>",
223
+ "<|vision_start|>",
224
+ "<|vision_end|>",
225
+ "<|vision_pad|>",
226
+ "<|image_pad|>",
227
+ "<|video_pad|>"
228
+ ],
229
+ "bos_token": null,
230
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and 
enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
231
+ "clean_up_tokenization_spaces": false,
232
+ "eos_token": "<|im_end|>",
233
+ "errors": "replace",
234
+ "model_max_length": 131072,
235
+ "pad_token": "<|endoftext|>",
236
+ "split_special_tokens": false,
237
+ "tokenizer_class": "Qwen2Tokenizer",
238
+ "unk_token": null
239
+ }
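
The chat_template above is a Jinja template that wraps each turn in Qwen's <|im_start|>/<|im_end|> markers, folds tool calls into <tool_call> blocks, and appends an empty <think> block when thinking is explicitly disabled. A hedged sketch of rendering it through Hugging Face transformers (the checkout path is a placeholder; forwarding enable_thinking as an extra keyword assumes a transformers release recent enough to pass extra kwargs through to the template):

from transformers import AutoTokenizer

# Placeholder path: a local checkout of this repository containing
# tokenizer.json, tokenizer_config.json, vocab.json and merges.txt.
tok = AutoTokenizer.from_pretrained("./path-to-this-repo")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the prompt as text; enable_thinking=False makes the template append
# an empty <think>...</think> block after the assistant header.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(prompt)
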
vocab.json ADDED
The diff for this file is too large to render. See raw diff