mrfakename and zR committed (verified)
Commit 3ff914b · 0 Parent(s)

Duplicate from THUDM/GLM-Z1-Rumination-32B-0414

Co-authored-by: zR <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Zhipu AI
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,133 @@
+ ---
+ license: mit
+ language:
+ - zh
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # GLM-4-Z1-Rumination-32B-0414
+
+ ## Introduction
+
+ The GLM family welcomes a new generation of open-source models, the **GLM-4-32B-0414** series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very user-friendly local deployment. GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-type synthetic data, laying the foundation for subsequent reinforcement learning extensions. In the post-training stage, in addition to human preference alignment for dialogue scenarios, we also enhanced the model's performance in instruction following, engineering code, and function calling using techniques such as rejection sampling and reinforcement learning, strengthening the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in areas such as engineering code, Artifact generation, function calling, search-based Q&A, and report generation. On some benchmarks it even rivals larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
+
+ **GLM-Z1-32B-0414** is a reasoning model with **deep thinking capabilities**. It was developed based on GLM-4-32B-0414 through cold start and extended reinforcement learning, as well as further training on tasks involving mathematics, code, and logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking feedback, further enhancing the model's general capabilities.
+
+ **GLM-Z1-Rumination-32B-0414** is a deep reasoning model with **rumination capabilities** (benchmarked against OpenAI's Deep Research). Unlike typical deep thinking models, the rumination model employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). The rumination model integrates search tools during its deep thinking process to handle complex tasks, and is trained with multiple rule-based rewards that guide and extend end-to-end reinforcement learning. Z1-Rumination shows significant improvements in research-style writing and complex retrieval tasks.
+
+ Finally, **GLM-Z1-9B-0414** is a surprise. We employed the aforementioned series of techniques to train a 9B small-sized model that maintains the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. Especially in resource-constrained scenarios, this model achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking lightweight deployment.
+
+ ## Inference Code
+
+ Make sure you are using `transformers>=4.51.3`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "THUDM/GLM-Z1-Rumination-32B-0414"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")
+
+ message = [{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]
+
+ inputs = tokenizer.apply_chat_template(
+     message,
+     return_tensors="pt",
+     add_generation_prompt=True,
+     return_dict=True,
+ ).to(model.device)
+
+ generate_kwargs = {
+     "input_ids": inputs["input_ids"],
+     "attention_mask": inputs["attention_mask"],
+     "temperature": 0.95,
+     "top_p": 0.7,
+     "do_sample": True,
+ }
+ out = model.generate(**generate_kwargs)
+ print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
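+
+ Rumination outputs can be very long, so it is often more convenient to stream them as they are generated. A minimal streaming variant of the call above (assuming the same `model`, `tokenizer`, and `inputs`; `max_new_tokens` here is only an illustrative value):
+
+ ```python
+ from transformers import TextStreamer
+
+ # Print tokens to stdout as they are generated instead of waiting for the full output.
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+ model.generate(
+     input_ids=inputs["input_ids"],
+     attention_mask=inputs["attention_mask"],
+     temperature=0.95,
+     top_p=0.7,
+     do_sample=True,
+     max_new_tokens=8192,
+     streamer=streamer,
+ )
+ ```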
+
+ ## Function Call
+
+ By default, this model currently supports the following `function` calls:
+ - `search`: Search using a keyword and return search results
+ - `click`: Click on a specific webpage in the search results to view details
+ - `open`: Open a fixed URL to view detailed content
+ - `finish`: Complete information gathering and begin writing
+
+ Below is a simple workflow to help you quickly connect the pipeline.
+
+ ```python
+ import json
+ import re
+
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "THUDM/GLM-Z1-Rumination-32B-0414"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")
+
+ messages = [{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]
+
+ generate_kwargs = {
+     "temperature": 0.95,
+     "top_p": 0.7,
+     "do_sample": True,
+ }
+
+ def get_assistant():
+     # Render the conversation so far and generate the next assistant turn.
+     inputs = tokenizer.apply_chat_template(
+         messages,
+         return_tensors="pt",
+         add_generation_prompt=True,
+         return_dict=True,
+     ).to(model.device)
+     out = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], **generate_kwargs)
+     return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
+
+ def get_observation(function_name, args):
+     # Mocked tool results; replace these with real search/browse backends.
+     if function_name == "search":
+         mock_search_res = [
+             {"title": "t1", "url": "url1", "snippet": "snippet_content_1"},
+             {"title": "t2", "url": "url2", "snippet": "snippet_content_2"}
+         ]
+         content = "\n\n".join(f"【{i}†{res['title']}†{res['url']}\n{res['snippet']}】" for i, res in enumerate(mock_search_res))
+     elif function_name == "click":
+         mock_click_res = "main content"
+         content = mock_click_res
+     elif function_name == "open":
+         mock_open_res = "main_content"
+         content = mock_open_res
+     else:
+         raise ValueError("unsupported function name!")
+     return content
+
+ def get_func_name_args(llm_text):
+     # The tool call is the JSON object that follows the closing </think> tag.
+     function_call = re.sub(r'.*?</think>', '', llm_text, flags=re.DOTALL)
+     function_call = json.loads(function_call)
+     action = function_call['name']
+     params = function_call['arguments']
+     return action, params
+
+ def pipeline():
+     end_str = "{\"name\": \"finish\", \"arguments\": {}}"
+     response = get_assistant()
+     messages.append({"role": "assistant", "content": response})
+     max_turns, turns = 35, 1
+     while not response.endswith(end_str) and turns < max_turns:
+         action, params = get_func_name_args(response)
+         observation = get_observation(action, params)
+         messages.append({"role": "observation", "content": observation})
+         response = get_assistant()
+         messages.append({"role": "assistant", "content": response})
+         turns += 1
+
+     if response.endswith(end_str):
+         final_answer = get_assistant()
+     else:
+         final_answer = None
+     return final_answer
+
+ pipeline()
+ ```
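+
+ The `get_observation` function above only returns mocked results. In a real deployment each branch would call an actual backend; as one illustration, a minimal `open` handler (assuming the `requests` package is available; the truncation length is an arbitrary choice) could fetch the page like this:
+
+ ```python
+ import requests
+
+ def open_url(url: str, timeout: float = 10.0) -> str:
+     """Fetch a URL and return its raw text for the model to read (illustrative sketch only)."""
+     resp = requests.get(url, timeout=timeout)
+     resp.raise_for_status()
+     # A production pipeline would strip HTML and truncate to a manageable length.
+     return resp.text[:20000]
+ ```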
chat_template.jinja ADDED
@@ -0,0 +1,42 @@
+ [gMASK]<sop>
+ <|system|>
+ 你是一个专业的深度研究助手,通过提供的工具与模拟浏览器交互,来帮助用户完成深度信息调研和报告撰写任务。今年是 2025 年。
+
+ <核心要求>
+ - 首先分解用户请求,得到包含多个子要求的列表
+ - 制定初始研究计划
+ - 进行多轮迭代搜索和页面浏览(at least 10 function calls):
+ * 根据已获得的信息调整研究计划和关键词
+ * 打开页面阅读,从发现的内容中识别新的关键概念/名词
+ * 从搜索结果中提取新的关键词继续搜索
+ * 访问并仔细阅读相关页面,识别新的关键概念/名词
+
+ <重要配置>
+ - 采用语言
+ * 搜索关键词:英文
+ * 思考:英文
+
+ <可调用的工具列表>
+ [{"name": "search", "description": "Execute a search query and return search results. Use this function when you need to find information about a specific topic.", "parameters": {"type": "object", "properties": {"query": {"type": "string", "description": "Search query string, use English words unless it is a proper name in Chinese"}}, "required": ["query"], "additionalProperties": false}}, {"name": "click", "description": "Click a link in the search results and navigate to the corresponding page. Use this function when you need to view detailed content of a specific search result.", "parameters": {"type": "object", "properties": {"link_id": {"type": "integer", "description": "The link ID to click (from the sequence number in search results)"}}, "required": ["link_id"], "additionalProperties": false}}, {"name": "open", "description": "Open a specific website. Get content from any website with its URL.", "parameters": {"type": "object", "properties": {"url": {"type": "string", "description": "The target website URL or domain"}}, "required": ["url"], "additionalProperties": false}}, {"name": "finish", "description": "Finish the task. Use this function when you have found the information you need.", "parameters": {"type": "object", "properties": {}, "additionalProperties": false}}]
+
+ {%- for message in messages if message.role != 'system' %}
+ {%- set role = message['role'] %}
+ {%- set content = message['content'] %}
+ {%- set visible = content.split('</think>')[-1].strip() %}
+ {%- set meta = message.get("metadata", "") %}
+
+ {%- if role == 'user' %}
+ <|user|>
+ {{ visible }}
+ {%- elif role == 'assistant' and not meta %}
+ <|assistant|>
+ {{ visible }}
+ {%- elif role == 'assistant' and meta %}
+ <|assistant|>{{ meta }}
+ {{ visible }}
+ {%- elif role == 'observation' %}
+ <|observation|>
+ {{ visible }}
+ {%- endif %}
+ {%- endfor %}
+ {% if add_generation_prompt %}<|assistant|>{% endif %}
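
The template above hard-codes the deep-research system prompt and tool schema, and renders `user`, `assistant`, and `observation` turns with their role tokens while hiding everything before `</think>`. A minimal sketch of how a tool-use exchange renders through it (assuming the tokenizer loads `chat_template.jinja` as its chat template; the message contents are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/GLM-Z1-Rumination-32B-0414")

messages = [
    {"role": "user", "content": "Research recent progress in solid-state batteries."},
    # Assistant turn: the template strips everything before </think>, leaving the tool call.
    {"role": "assistant", "content": 'internal reasoning...</think>{"name": "search", "arguments": {"query": "solid-state battery 2025"}}'},
    # Tool result fed back as an observation turn.
    {"role": "observation", "content": "【0†Solid-state battery overview†url1\nsnippet】"},
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)  # [gMASK]<sop>, the system block with the tool list, then <|user|>/<|assistant|>/<|observation|> turns
```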
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "architectures": [
+     "Glm4ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "eos_token_id": [
+     151329,
+     151336,
+     151338
+   ],
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 6144,
+   "initializer_range": 0.02,
+   "intermediate_size": 23040,
+   "max_position_embeddings": 131072,
+   "model_type": "glm4",
+   "num_attention_heads": 48,
+   "num_hidden_layers": 61,
+   "num_key_value_heads": 8,
+   "pad_token_id": 151329,
+   "partial_rotary_factor": 0.5,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.52.0.dev0",
+   "use_cache": true,
+   "vocab_size": 151552
+ }
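
As a rough cross-check, the shapes in this config imply about 33B parameters, i.e. roughly 66 GB in bfloat16, which lines up with the `total_size` recorded in `model.safetensors.index.json` below. A back-of-the-envelope count (ignoring the small norm weights; values copied from the config above):

```python
# Approximate parameter count implied by config.json (norm weights ignored).
hidden, inter, layers, vocab = 6144, 23040, 61, 151552
n_heads, n_kv_heads, head_dim = 48, 8, 128

attn = hidden * (n_heads * head_dim)           # q_proj
attn += 2 * hidden * (n_kv_heads * head_dim)   # k_proj + v_proj
attn += (n_heads * head_dim) * hidden          # o_proj
mlp = hidden * (2 * inter) + inter * hidden    # gate_up_proj + down_proj
embed = 2 * vocab * hidden                     # embed_tokens + lm_head (not tied)

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.1f}B parameters, ~{total * 2 / 1e9:.0f} GB in bfloat16")  # ~33.1B, ~66 GB
```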
generation_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "_from_model_config": true,
+   "eos_token_id": [
+     151329,
+     151336,
+     151338
+   ],
+   "pad_token_id": 151329,
+   "transformers_version": "4.52.0.dev0"
+ }
model-00001-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ede867f519f5cb55105d9406ac3b61fdfc8f40e792afc3ecadc30c46e4d4aca
+ size 4938944056
model-00002-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce6d5919f5aeee759556faccf42f3653b50d4a9481c3a247488b85a543b093eb
+ size 4844622992
model-00003-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8114c81fc02e2470d350fa0a2aea70fa057769e06bdb0d080be1db06a5a3da18
+ size 4561557104
model-00004-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b01cdb21f8d4c9a227de11ec435c0c0a2b9bc81a7290bc7268e1e9c40537998
+ size 4951627056
model-00005-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1812db71f7a31903c0c5418ab1a3adc91c55902c6852df26045ce4977ff7d8a8
+ size 4844623032
model-00006-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c31e145ca156c175876d4e8ec65de3a99c0d420f376b2a347aafc09eca541ce
+ size 4561557136
model-00007-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8535e2cba43d7624cdd18da0d1a0aa28d0c2a9f97ec9518c4e367c03d328b56
+ size 4951627056
model-00008-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f86f8dbf644a65222ed713996ea9792dc96948bc5c79fd54e8de922828a3a05c
+ size 4844623032
model-00009-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:725c6717575715b7448169e2a7e30808a47c21cc43d8d32293e58f0ebe11065a
+ size 4561557136
model-00010-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aaedbfa4f3a0ed1dac5f046cbddfe7d823433425341bc52f6e26c2a82ba76a94
+ size 4951627056
model-00011-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a530952a68b1d7c63a8d8c64a79b7fd313331a56dea937b1b59cd3705d1c8eb
+ size 4844623032
model-00012-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13b9fd7644d26cfa4e7b480c8230740ee62e94247c4349c0cd0c849e426380e
+ size 4561557136
model-00013-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebe01bf46a66f0094eb25aa59eef8d144593d85aae29b65a8be232d21edbac14
+ size 4951627056
model-00014-of-00014.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3f2a509726158c0e9a35dbfe3798bd4cc7035b0224c4f0569441d9cd960e137
+ size 3913398800
model.safetensors.index.json ADDED
@@ -0,0 +1,620 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 66283499520
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00014-of-00014.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00014.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00014.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
10
+ "model.layers.0.mlp.gate_up_proj.weight": "model-00001-of-00014.safetensors",
11
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
12
+ "model.layers.0.post_mlp_layernorm.weight": "model-00001-of-00014.safetensors",
13
+ "model.layers.0.post_self_attn_layernorm.weight": "model-00001-of-00014.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
16
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
17
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
18
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00014.safetensors",
19
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
20
+ "model.layers.1.mlp.gate_up_proj.weight": "model-00001-of-00014.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
22
+ "model.layers.1.post_mlp_layernorm.weight": "model-00001-of-00014.safetensors",
23
+ "model.layers.1.post_self_attn_layernorm.weight": "model-00001-of-00014.safetensors",
24
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
25
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
26
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
27
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
28
+ "model.layers.10.input_layernorm.weight": "model-00003-of-00014.safetensors",
29
+ "model.layers.10.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
30
+ "model.layers.10.mlp.gate_up_proj.weight": "model-00003-of-00014.safetensors",
31
+ "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
32
+ "model.layers.10.post_mlp_layernorm.weight": "model-00003-of-00014.safetensors",
33
+ "model.layers.10.post_self_attn_layernorm.weight": "model-00003-of-00014.safetensors",
34
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
35
+ "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
36
+ "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
37
+ "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
38
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00014.safetensors",
39
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
40
+ "model.layers.11.mlp.gate_up_proj.weight": "model-00003-of-00014.safetensors",
41
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
42
+ "model.layers.11.post_mlp_layernorm.weight": "model-00003-of-00014.safetensors",
43
+ "model.layers.11.post_self_attn_layernorm.weight": "model-00003-of-00014.safetensors",
44
+ "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
45
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
46
+ "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
47
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
48
+ "model.layers.12.input_layernorm.weight": "model-00004-of-00014.safetensors",
49
+ "model.layers.12.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
50
+ "model.layers.12.mlp.gate_up_proj.weight": "model-00004-of-00014.safetensors",
51
+ "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
52
+ "model.layers.12.post_mlp_layernorm.weight": "model-00004-of-00014.safetensors",
53
+ "model.layers.12.post_self_attn_layernorm.weight": "model-00004-of-00014.safetensors",
54
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
55
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
56
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
57
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
58
+ "model.layers.13.input_layernorm.weight": "model-00004-of-00014.safetensors",
59
+ "model.layers.13.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
60
+ "model.layers.13.mlp.gate_up_proj.weight": "model-00004-of-00014.safetensors",
61
+ "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
62
+ "model.layers.13.post_mlp_layernorm.weight": "model-00004-of-00014.safetensors",
63
+ "model.layers.13.post_self_attn_layernorm.weight": "model-00004-of-00014.safetensors",
64
+ "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
65
+ "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
66
+ "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
67
+ "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
68
+ "model.layers.14.input_layernorm.weight": "model-00004-of-00014.safetensors",
69
+ "model.layers.14.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
70
+ "model.layers.14.mlp.gate_up_proj.weight": "model-00004-of-00014.safetensors",
71
+ "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
72
+ "model.layers.14.post_mlp_layernorm.weight": "model-00004-of-00014.safetensors",
73
+ "model.layers.14.post_self_attn_layernorm.weight": "model-00004-of-00014.safetensors",
74
+ "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
75
+ "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
76
+ "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
77
+ "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
78
+ "model.layers.15.input_layernorm.weight": "model-00004-of-00014.safetensors",
79
+ "model.layers.15.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
80
+ "model.layers.15.mlp.gate_up_proj.weight": "model-00004-of-00014.safetensors",
81
+ "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
82
+ "model.layers.15.post_mlp_layernorm.weight": "model-00004-of-00014.safetensors",
83
+ "model.layers.15.post_self_attn_layernorm.weight": "model-00004-of-00014.safetensors",
84
+ "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
85
+ "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
86
+ "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
87
+ "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
88
+ "model.layers.16.input_layernorm.weight": "model-00004-of-00014.safetensors",
89
+ "model.layers.16.mlp.down_proj.weight": "model-00004-of-00014.safetensors",
90
+ "model.layers.16.mlp.gate_up_proj.weight": "model-00004-of-00014.safetensors",
91
+ "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00014.safetensors",
92
+ "model.layers.16.post_mlp_layernorm.weight": "model-00004-of-00014.safetensors",
93
+ "model.layers.16.post_self_attn_layernorm.weight": "model-00004-of-00014.safetensors",
94
+ "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00014.safetensors",
95
+ "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00014.safetensors",
96
+ "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00014.safetensors",
97
+ "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00014.safetensors",
98
+ "model.layers.17.input_layernorm.weight": "model-00005-of-00014.safetensors",
99
+ "model.layers.17.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
100
+ "model.layers.17.mlp.gate_up_proj.weight": "model-00005-of-00014.safetensors",
101
+ "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
102
+ "model.layers.17.post_mlp_layernorm.weight": "model-00005-of-00014.safetensors",
103
+ "model.layers.17.post_self_attn_layernorm.weight": "model-00005-of-00014.safetensors",
104
+ "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
105
+ "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
106
+ "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
107
+ "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
108
+ "model.layers.18.input_layernorm.weight": "model-00005-of-00014.safetensors",
109
+ "model.layers.18.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
110
+ "model.layers.18.mlp.gate_up_proj.weight": "model-00005-of-00014.safetensors",
111
+ "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
112
+ "model.layers.18.post_mlp_layernorm.weight": "model-00005-of-00014.safetensors",
113
+ "model.layers.18.post_self_attn_layernorm.weight": "model-00005-of-00014.safetensors",
114
+ "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
115
+ "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
116
+ "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
117
+ "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
118
+ "model.layers.19.input_layernorm.weight": "model-00005-of-00014.safetensors",
119
+ "model.layers.19.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
120
+ "model.layers.19.mlp.gate_up_proj.weight": "model-00005-of-00014.safetensors",
121
+ "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
122
+ "model.layers.19.post_mlp_layernorm.weight": "model-00005-of-00014.safetensors",
123
+ "model.layers.19.post_self_attn_layernorm.weight": "model-00005-of-00014.safetensors",
124
+ "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
125
+ "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
126
+ "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
127
+ "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
128
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00014.safetensors",
129
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00014.safetensors",
130
+ "model.layers.2.mlp.gate_up_proj.weight": "model-00001-of-00014.safetensors",
131
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00014.safetensors",
132
+ "model.layers.2.post_mlp_layernorm.weight": "model-00001-of-00014.safetensors",
133
+ "model.layers.2.post_self_attn_layernorm.weight": "model-00001-of-00014.safetensors",
134
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00014.safetensors",
135
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00014.safetensors",
136
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00014.safetensors",
137
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00014.safetensors",
138
+ "model.layers.20.input_layernorm.weight": "model-00005-of-00014.safetensors",
139
+ "model.layers.20.mlp.down_proj.weight": "model-00005-of-00014.safetensors",
140
+ "model.layers.20.mlp.gate_up_proj.weight": "model-00005-of-00014.safetensors",
141
+ "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00014.safetensors",
142
+ "model.layers.20.post_mlp_layernorm.weight": "model-00005-of-00014.safetensors",
143
+ "model.layers.20.post_self_attn_layernorm.weight": "model-00005-of-00014.safetensors",
144
+ "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
145
+ "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
146
+ "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
147
+ "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
148
+ "model.layers.21.input_layernorm.weight": "model-00006-of-00014.safetensors",
149
+ "model.layers.21.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
150
+ "model.layers.21.mlp.gate_up_proj.weight": "model-00005-of-00014.safetensors",
151
+ "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
152
+ "model.layers.21.post_mlp_layernorm.weight": "model-00006-of-00014.safetensors",
153
+ "model.layers.21.post_self_attn_layernorm.weight": "model-00006-of-00014.safetensors",
154
+ "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00014.safetensors",
155
+ "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00014.safetensors",
156
+ "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00014.safetensors",
157
+ "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00014.safetensors",
158
+ "model.layers.22.input_layernorm.weight": "model-00006-of-00014.safetensors",
159
+ "model.layers.22.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
160
+ "model.layers.22.mlp.gate_up_proj.weight": "model-00006-of-00014.safetensors",
161
+ "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
162
+ "model.layers.22.post_mlp_layernorm.weight": "model-00006-of-00014.safetensors",
163
+ "model.layers.22.post_self_attn_layernorm.weight": "model-00006-of-00014.safetensors",
164
+ "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
165
+ "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
166
+ "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
167
+ "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
168
+ "model.layers.23.input_layernorm.weight": "model-00006-of-00014.safetensors",
169
+ "model.layers.23.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
170
+ "model.layers.23.mlp.gate_up_proj.weight": "model-00006-of-00014.safetensors",
171
+ "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
172
+ "model.layers.23.post_mlp_layernorm.weight": "model-00006-of-00014.safetensors",
173
+ "model.layers.23.post_self_attn_layernorm.weight": "model-00006-of-00014.safetensors",
174
+ "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
175
+ "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
176
+ "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
177
+ "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
178
+ "model.layers.24.input_layernorm.weight": "model-00006-of-00014.safetensors",
179
+ "model.layers.24.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
180
+ "model.layers.24.mlp.gate_up_proj.weight": "model-00006-of-00014.safetensors",
181
+ "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
182
+ "model.layers.24.post_mlp_layernorm.weight": "model-00006-of-00014.safetensors",
183
+ "model.layers.24.post_self_attn_layernorm.weight": "model-00006-of-00014.safetensors",
184
+ "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
185
+ "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
186
+ "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
187
+ "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
188
+ "model.layers.25.input_layernorm.weight": "model-00006-of-00014.safetensors",
189
+ "model.layers.25.mlp.down_proj.weight": "model-00006-of-00014.safetensors",
190
+ "model.layers.25.mlp.gate_up_proj.weight": "model-00006-of-00014.safetensors",
191
+ "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00014.safetensors",
192
+ "model.layers.25.post_mlp_layernorm.weight": "model-00006-of-00014.safetensors",
193
+ "model.layers.25.post_self_attn_layernorm.weight": "model-00006-of-00014.safetensors",
194
+ "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
195
+ "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
196
+ "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
197
+ "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
198
+ "model.layers.26.input_layernorm.weight": "model-00007-of-00014.safetensors",
199
+ "model.layers.26.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
200
+ "model.layers.26.mlp.gate_up_proj.weight": "model-00007-of-00014.safetensors",
201
+ "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
202
+ "model.layers.26.post_mlp_layernorm.weight": "model-00007-of-00014.safetensors",
203
+ "model.layers.26.post_self_attn_layernorm.weight": "model-00007-of-00014.safetensors",
204
+ "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00014.safetensors",
205
+ "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00014.safetensors",
206
+ "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00014.safetensors",
207
+ "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00014.safetensors",
208
+ "model.layers.27.input_layernorm.weight": "model-00007-of-00014.safetensors",
209
+ "model.layers.27.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
210
+ "model.layers.27.mlp.gate_up_proj.weight": "model-00007-of-00014.safetensors",
211
+ "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
212
+ "model.layers.27.post_mlp_layernorm.weight": "model-00007-of-00014.safetensors",
213
+ "model.layers.27.post_self_attn_layernorm.weight": "model-00007-of-00014.safetensors",
214
+ "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
215
+ "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
216
+ "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
217
+ "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
218
+ "model.layers.28.input_layernorm.weight": "model-00007-of-00014.safetensors",
219
+ "model.layers.28.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
220
+ "model.layers.28.mlp.gate_up_proj.weight": "model-00007-of-00014.safetensors",
221
+ "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
222
+ "model.layers.28.post_mlp_layernorm.weight": "model-00007-of-00014.safetensors",
223
+ "model.layers.28.post_self_attn_layernorm.weight": "model-00007-of-00014.safetensors",
224
+ "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
225
+ "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
226
+ "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
227
+ "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
228
+ "model.layers.29.input_layernorm.weight": "model-00007-of-00014.safetensors",
229
+ "model.layers.29.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
230
+ "model.layers.29.mlp.gate_up_proj.weight": "model-00007-of-00014.safetensors",
231
+ "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
232
+ "model.layers.29.post_mlp_layernorm.weight": "model-00007-of-00014.safetensors",
233
+ "model.layers.29.post_self_attn_layernorm.weight": "model-00007-of-00014.safetensors",
234
+ "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
235
+ "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
236
+ "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
237
+ "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
238
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00014.safetensors",
239
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
240
+ "model.layers.3.mlp.gate_up_proj.weight": "model-00002-of-00014.safetensors",
241
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
242
+ "model.layers.3.post_mlp_layernorm.weight": "model-00002-of-00014.safetensors",
243
+ "model.layers.3.post_self_attn_layernorm.weight": "model-00002-of-00014.safetensors",
244
+ "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
245
+ "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
246
+ "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
247
+ "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
248
+ "model.layers.30.input_layernorm.weight": "model-00007-of-00014.safetensors",
249
+ "model.layers.30.mlp.down_proj.weight": "model-00007-of-00014.safetensors",
250
+ "model.layers.30.mlp.gate_up_proj.weight": "model-00007-of-00014.safetensors",
251
+ "model.layers.30.post_attention_layernorm.weight": "model-00007-of-00014.safetensors",
252
+ "model.layers.30.post_mlp_layernorm.weight": "model-00007-of-00014.safetensors",
253
+ "model.layers.30.post_self_attn_layernorm.weight": "model-00007-of-00014.safetensors",
254
+ "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00014.safetensors",
255
+ "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00014.safetensors",
256
+ "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00014.safetensors",
257
+ "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00014.safetensors",
258
+ "model.layers.31.input_layernorm.weight": "model-00008-of-00014.safetensors",
259
+ "model.layers.31.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
260
+ "model.layers.31.mlp.gate_up_proj.weight": "model-00008-of-00014.safetensors",
261
+ "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
262
+ "model.layers.31.post_mlp_layernorm.weight": "model-00008-of-00014.safetensors",
263
+ "model.layers.31.post_self_attn_layernorm.weight": "model-00008-of-00014.safetensors",
264
+ "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
265
+ "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
266
+ "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
267
+ "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
268
+ "model.layers.32.input_layernorm.weight": "model-00008-of-00014.safetensors",
269
+ "model.layers.32.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
270
+ "model.layers.32.mlp.gate_up_proj.weight": "model-00008-of-00014.safetensors",
271
+ "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
272
+ "model.layers.32.post_mlp_layernorm.weight": "model-00008-of-00014.safetensors",
273
+ "model.layers.32.post_self_attn_layernorm.weight": "model-00008-of-00014.safetensors",
274
+ "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
275
+ "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
276
+ "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
277
+ "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
278
+ "model.layers.33.input_layernorm.weight": "model-00008-of-00014.safetensors",
279
+ "model.layers.33.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
280
+ "model.layers.33.mlp.gate_up_proj.weight": "model-00008-of-00014.safetensors",
281
+ "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
282
+ "model.layers.33.post_mlp_layernorm.weight": "model-00008-of-00014.safetensors",
283
+ "model.layers.33.post_self_attn_layernorm.weight": "model-00008-of-00014.safetensors",
284
+ "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
285
+ "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
286
+ "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
287
+ "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
288
+ "model.layers.34.input_layernorm.weight": "model-00008-of-00014.safetensors",
289
+ "model.layers.34.mlp.down_proj.weight": "model-00008-of-00014.safetensors",
290
+ "model.layers.34.mlp.gate_up_proj.weight": "model-00008-of-00014.safetensors",
291
+ "model.layers.34.post_attention_layernorm.weight": "model-00008-of-00014.safetensors",
292
+ "model.layers.34.post_mlp_layernorm.weight": "model-00008-of-00014.safetensors",
293
+ "model.layers.34.post_self_attn_layernorm.weight": "model-00008-of-00014.safetensors",
294
+ "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
295
+ "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
296
+ "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
297
+ "model.layers.34.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
298
+ "model.layers.35.input_layernorm.weight": "model-00009-of-00014.safetensors",
299
+ "model.layers.35.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
300
+ "model.layers.35.mlp.gate_up_proj.weight": "model-00008-of-00014.safetensors",
301
+ "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
302
+ "model.layers.35.post_mlp_layernorm.weight": "model-00009-of-00014.safetensors",
303
+ "model.layers.35.post_self_attn_layernorm.weight": "model-00009-of-00014.safetensors",
304
+ "model.layers.35.self_attn.k_proj.weight": "model-00008-of-00014.safetensors",
305
+ "model.layers.35.self_attn.o_proj.weight": "model-00008-of-00014.safetensors",
306
+ "model.layers.35.self_attn.q_proj.weight": "model-00008-of-00014.safetensors",
307
+ "model.layers.35.self_attn.v_proj.weight": "model-00008-of-00014.safetensors",
308
+ "model.layers.36.input_layernorm.weight": "model-00009-of-00014.safetensors",
309
+ "model.layers.36.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
310
+ "model.layers.36.mlp.gate_up_proj.weight": "model-00009-of-00014.safetensors",
311
+ "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
312
+ "model.layers.36.post_mlp_layernorm.weight": "model-00009-of-00014.safetensors",
313
+ "model.layers.36.post_self_attn_layernorm.weight": "model-00009-of-00014.safetensors",
314
+ "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
315
+ "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
316
+ "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
317
+ "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
318
+ "model.layers.37.input_layernorm.weight": "model-00009-of-00014.safetensors",
319
+ "model.layers.37.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
320
+ "model.layers.37.mlp.gate_up_proj.weight": "model-00009-of-00014.safetensors",
321
+ "model.layers.37.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
322
+ "model.layers.37.post_mlp_layernorm.weight": "model-00009-of-00014.safetensors",
323
+ "model.layers.37.post_self_attn_layernorm.weight": "model-00009-of-00014.safetensors",
324
+ "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
325
+ "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
326
+ "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
327
+ "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
328
+ "model.layers.38.input_layernorm.weight": "model-00009-of-00014.safetensors",
329
+ "model.layers.38.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
330
+ "model.layers.38.mlp.gate_up_proj.weight": "model-00009-of-00014.safetensors",
331
+ "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
332
+ "model.layers.38.post_mlp_layernorm.weight": "model-00009-of-00014.safetensors",
333
+ "model.layers.38.post_self_attn_layernorm.weight": "model-00009-of-00014.safetensors",
334
+ "model.layers.38.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
335
+ "model.layers.38.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
336
+ "model.layers.38.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
337
+ "model.layers.38.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
338
+ "model.layers.39.input_layernorm.weight": "model-00009-of-00014.safetensors",
339
+ "model.layers.39.mlp.down_proj.weight": "model-00009-of-00014.safetensors",
340
+ "model.layers.39.mlp.gate_up_proj.weight": "model-00009-of-00014.safetensors",
341
+ "model.layers.39.post_attention_layernorm.weight": "model-00009-of-00014.safetensors",
342
+ "model.layers.39.post_mlp_layernorm.weight": "model-00009-of-00014.safetensors",
343
+ "model.layers.39.post_self_attn_layernorm.weight": "model-00009-of-00014.safetensors",
344
+ "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
345
+ "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
346
+ "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
347
+ "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
348
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00014.safetensors",
349
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
350
+ "model.layers.4.mlp.gate_up_proj.weight": "model-00002-of-00014.safetensors",
351
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
352
+ "model.layers.4.post_mlp_layernorm.weight": "model-00002-of-00014.safetensors",
353
+ "model.layers.4.post_self_attn_layernorm.weight": "model-00002-of-00014.safetensors",
354
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
355
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
356
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
357
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
358
+ "model.layers.40.input_layernorm.weight": "model-00010-of-00014.safetensors",
359
+ "model.layers.40.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
360
+ "model.layers.40.mlp.gate_up_proj.weight": "model-00010-of-00014.safetensors",
361
+ "model.layers.40.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
362
+ "model.layers.40.post_mlp_layernorm.weight": "model-00010-of-00014.safetensors",
363
+ "model.layers.40.post_self_attn_layernorm.weight": "model-00010-of-00014.safetensors",
364
+ "model.layers.40.self_attn.k_proj.weight": "model-00009-of-00014.safetensors",
365
+ "model.layers.40.self_attn.o_proj.weight": "model-00009-of-00014.safetensors",
366
+ "model.layers.40.self_attn.q_proj.weight": "model-00009-of-00014.safetensors",
367
+ "model.layers.40.self_attn.v_proj.weight": "model-00009-of-00014.safetensors",
368
+ "model.layers.41.input_layernorm.weight": "model-00010-of-00014.safetensors",
369
+ "model.layers.41.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
370
+ "model.layers.41.mlp.gate_up_proj.weight": "model-00010-of-00014.safetensors",
371
+ "model.layers.41.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
372
+ "model.layers.41.post_mlp_layernorm.weight": "model-00010-of-00014.safetensors",
373
+ "model.layers.41.post_self_attn_layernorm.weight": "model-00010-of-00014.safetensors",
374
+ "model.layers.41.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
375
+ "model.layers.41.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
376
+ "model.layers.41.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
377
+ "model.layers.41.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
378
+ "model.layers.42.input_layernorm.weight": "model-00010-of-00014.safetensors",
379
+ "model.layers.42.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
380
+ "model.layers.42.mlp.gate_up_proj.weight": "model-00010-of-00014.safetensors",
381
+ "model.layers.42.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
382
+ "model.layers.42.post_mlp_layernorm.weight": "model-00010-of-00014.safetensors",
383
+ "model.layers.42.post_self_attn_layernorm.weight": "model-00010-of-00014.safetensors",
384
+ "model.layers.42.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
385
+ "model.layers.42.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
386
+ "model.layers.42.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
387
+ "model.layers.42.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
388
+ "model.layers.43.input_layernorm.weight": "model-00010-of-00014.safetensors",
389
+ "model.layers.43.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
390
+ "model.layers.43.mlp.gate_up_proj.weight": "model-00010-of-00014.safetensors",
391
+ "model.layers.43.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
392
+ "model.layers.43.post_mlp_layernorm.weight": "model-00010-of-00014.safetensors",
393
+ "model.layers.43.post_self_attn_layernorm.weight": "model-00010-of-00014.safetensors",
394
+ "model.layers.43.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
395
+ "model.layers.43.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
396
+ "model.layers.43.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
397
+ "model.layers.43.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
398
+ "model.layers.44.input_layernorm.weight": "model-00010-of-00014.safetensors",
399
+ "model.layers.44.mlp.down_proj.weight": "model-00010-of-00014.safetensors",
400
+ "model.layers.44.mlp.gate_up_proj.weight": "model-00010-of-00014.safetensors",
401
+ "model.layers.44.post_attention_layernorm.weight": "model-00010-of-00014.safetensors",
402
+ "model.layers.44.post_mlp_layernorm.weight": "model-00010-of-00014.safetensors",
403
+ "model.layers.44.post_self_attn_layernorm.weight": "model-00010-of-00014.safetensors",
404
+ "model.layers.44.self_attn.k_proj.weight": "model-00010-of-00014.safetensors",
405
+ "model.layers.44.self_attn.o_proj.weight": "model-00010-of-00014.safetensors",
406
+ "model.layers.44.self_attn.q_proj.weight": "model-00010-of-00014.safetensors",
407
+ "model.layers.44.self_attn.v_proj.weight": "model-00010-of-00014.safetensors",
408
+ "model.layers.45.input_layernorm.weight": "model-00011-of-00014.safetensors",
409
+ "model.layers.45.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
410
+ "model.layers.45.mlp.gate_up_proj.weight": "model-00011-of-00014.safetensors",
411
+ "model.layers.45.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
412
+ "model.layers.45.post_mlp_layernorm.weight": "model-00011-of-00014.safetensors",
413
+ "model.layers.45.post_self_attn_layernorm.weight": "model-00011-of-00014.safetensors",
414
+ "model.layers.45.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
415
+ "model.layers.45.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
416
+ "model.layers.45.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
417
+ "model.layers.45.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
418
+ "model.layers.46.input_layernorm.weight": "model-00011-of-00014.safetensors",
419
+ "model.layers.46.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
420
+ "model.layers.46.mlp.gate_up_proj.weight": "model-00011-of-00014.safetensors",
421
+ "model.layers.46.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
422
+ "model.layers.46.post_mlp_layernorm.weight": "model-00011-of-00014.safetensors",
423
+ "model.layers.46.post_self_attn_layernorm.weight": "model-00011-of-00014.safetensors",
424
+ "model.layers.46.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
425
+ "model.layers.46.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
426
+ "model.layers.46.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
427
+ "model.layers.46.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
428
+ "model.layers.47.input_layernorm.weight": "model-00011-of-00014.safetensors",
429
+ "model.layers.47.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
430
+ "model.layers.47.mlp.gate_up_proj.weight": "model-00011-of-00014.safetensors",
431
+ "model.layers.47.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
432
+ "model.layers.47.post_mlp_layernorm.weight": "model-00011-of-00014.safetensors",
433
+ "model.layers.47.post_self_attn_layernorm.weight": "model-00011-of-00014.safetensors",
434
+ "model.layers.47.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
435
+ "model.layers.47.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
436
+ "model.layers.47.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
437
+ "model.layers.47.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
438
+ "model.layers.48.input_layernorm.weight": "model-00011-of-00014.safetensors",
439
+ "model.layers.48.mlp.down_proj.weight": "model-00011-of-00014.safetensors",
440
+ "model.layers.48.mlp.gate_up_proj.weight": "model-00011-of-00014.safetensors",
441
+ "model.layers.48.post_attention_layernorm.weight": "model-00011-of-00014.safetensors",
442
+ "model.layers.48.post_mlp_layernorm.weight": "model-00011-of-00014.safetensors",
443
+ "model.layers.48.post_self_attn_layernorm.weight": "model-00011-of-00014.safetensors",
444
+ "model.layers.48.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
445
+ "model.layers.48.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
446
+ "model.layers.48.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
447
+ "model.layers.48.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
448
+ "model.layers.49.input_layernorm.weight": "model-00012-of-00014.safetensors",
449
+ "model.layers.49.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
450
+ "model.layers.49.mlp.gate_up_proj.weight": "model-00011-of-00014.safetensors",
451
+ "model.layers.49.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
452
+ "model.layers.49.post_mlp_layernorm.weight": "model-00012-of-00014.safetensors",
453
+ "model.layers.49.post_self_attn_layernorm.weight": "model-00012-of-00014.safetensors",
454
+ "model.layers.49.self_attn.k_proj.weight": "model-00011-of-00014.safetensors",
455
+ "model.layers.49.self_attn.o_proj.weight": "model-00011-of-00014.safetensors",
456
+ "model.layers.49.self_attn.q_proj.weight": "model-00011-of-00014.safetensors",
457
+ "model.layers.49.self_attn.v_proj.weight": "model-00011-of-00014.safetensors",
458
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00014.safetensors",
459
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
460
+ "model.layers.5.mlp.gate_up_proj.weight": "model-00002-of-00014.safetensors",
461
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
462
+ "model.layers.5.post_mlp_layernorm.weight": "model-00002-of-00014.safetensors",
463
+ "model.layers.5.post_self_attn_layernorm.weight": "model-00002-of-00014.safetensors",
464
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
465
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
466
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
467
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
468
+ "model.layers.50.input_layernorm.weight": "model-00012-of-00014.safetensors",
469
+ "model.layers.50.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
470
+ "model.layers.50.mlp.gate_up_proj.weight": "model-00012-of-00014.safetensors",
471
+ "model.layers.50.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
472
+ "model.layers.50.post_mlp_layernorm.weight": "model-00012-of-00014.safetensors",
473
+ "model.layers.50.post_self_attn_layernorm.weight": "model-00012-of-00014.safetensors",
474
+ "model.layers.50.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
475
+ "model.layers.50.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
476
+ "model.layers.50.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
477
+ "model.layers.50.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
478
+ "model.layers.51.input_layernorm.weight": "model-00012-of-00014.safetensors",
479
+ "model.layers.51.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
480
+ "model.layers.51.mlp.gate_up_proj.weight": "model-00012-of-00014.safetensors",
481
+ "model.layers.51.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
482
+ "model.layers.51.post_mlp_layernorm.weight": "model-00012-of-00014.safetensors",
483
+ "model.layers.51.post_self_attn_layernorm.weight": "model-00012-of-00014.safetensors",
484
+ "model.layers.51.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
485
+ "model.layers.51.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
486
+ "model.layers.51.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
487
+ "model.layers.51.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
488
+ "model.layers.52.input_layernorm.weight": "model-00012-of-00014.safetensors",
489
+ "model.layers.52.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
490
+ "model.layers.52.mlp.gate_up_proj.weight": "model-00012-of-00014.safetensors",
491
+ "model.layers.52.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
492
+ "model.layers.52.post_mlp_layernorm.weight": "model-00012-of-00014.safetensors",
493
+ "model.layers.52.post_self_attn_layernorm.weight": "model-00012-of-00014.safetensors",
494
+ "model.layers.52.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
495
+ "model.layers.52.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
496
+ "model.layers.52.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
497
+ "model.layers.52.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
498
+ "model.layers.53.input_layernorm.weight": "model-00012-of-00014.safetensors",
499
+ "model.layers.53.mlp.down_proj.weight": "model-00012-of-00014.safetensors",
500
+ "model.layers.53.mlp.gate_up_proj.weight": "model-00012-of-00014.safetensors",
501
+ "model.layers.53.post_attention_layernorm.weight": "model-00012-of-00014.safetensors",
502
+ "model.layers.53.post_mlp_layernorm.weight": "model-00012-of-00014.safetensors",
503
+ "model.layers.53.post_self_attn_layernorm.weight": "model-00012-of-00014.safetensors",
504
+ "model.layers.53.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
505
+ "model.layers.53.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
506
+ "model.layers.53.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
507
+ "model.layers.53.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
508
+ "model.layers.54.input_layernorm.weight": "model-00013-of-00014.safetensors",
509
+ "model.layers.54.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
510
+ "model.layers.54.mlp.gate_up_proj.weight": "model-00013-of-00014.safetensors",
511
+ "model.layers.54.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
512
+ "model.layers.54.post_mlp_layernorm.weight": "model-00013-of-00014.safetensors",
513
+ "model.layers.54.post_self_attn_layernorm.weight": "model-00013-of-00014.safetensors",
514
+ "model.layers.54.self_attn.k_proj.weight": "model-00012-of-00014.safetensors",
515
+ "model.layers.54.self_attn.o_proj.weight": "model-00012-of-00014.safetensors",
516
+ "model.layers.54.self_attn.q_proj.weight": "model-00012-of-00014.safetensors",
517
+ "model.layers.54.self_attn.v_proj.weight": "model-00012-of-00014.safetensors",
518
+ "model.layers.55.input_layernorm.weight": "model-00013-of-00014.safetensors",
519
+ "model.layers.55.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
520
+ "model.layers.55.mlp.gate_up_proj.weight": "model-00013-of-00014.safetensors",
521
+ "model.layers.55.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
522
+ "model.layers.55.post_mlp_layernorm.weight": "model-00013-of-00014.safetensors",
523
+ "model.layers.55.post_self_attn_layernorm.weight": "model-00013-of-00014.safetensors",
524
+ "model.layers.55.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
525
+ "model.layers.55.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
526
+ "model.layers.55.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
527
+ "model.layers.55.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
528
+ "model.layers.56.input_layernorm.weight": "model-00013-of-00014.safetensors",
529
+ "model.layers.56.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
530
+ "model.layers.56.mlp.gate_up_proj.weight": "model-00013-of-00014.safetensors",
531
+ "model.layers.56.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
532
+ "model.layers.56.post_mlp_layernorm.weight": "model-00013-of-00014.safetensors",
533
+ "model.layers.56.post_self_attn_layernorm.weight": "model-00013-of-00014.safetensors",
534
+ "model.layers.56.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
535
+ "model.layers.56.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
536
+ "model.layers.56.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
537
+ "model.layers.56.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
538
+ "model.layers.57.input_layernorm.weight": "model-00013-of-00014.safetensors",
539
+ "model.layers.57.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
540
+ "model.layers.57.mlp.gate_up_proj.weight": "model-00013-of-00014.safetensors",
541
+ "model.layers.57.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
542
+ "model.layers.57.post_mlp_layernorm.weight": "model-00013-of-00014.safetensors",
543
+ "model.layers.57.post_self_attn_layernorm.weight": "model-00013-of-00014.safetensors",
544
+ "model.layers.57.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
545
+ "model.layers.57.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
546
+ "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
547
+ "model.layers.57.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
548
+ "model.layers.58.input_layernorm.weight": "model-00013-of-00014.safetensors",
549
+ "model.layers.58.mlp.down_proj.weight": "model-00013-of-00014.safetensors",
550
+ "model.layers.58.mlp.gate_up_proj.weight": "model-00013-of-00014.safetensors",
551
+ "model.layers.58.post_attention_layernorm.weight": "model-00013-of-00014.safetensors",
552
+ "model.layers.58.post_mlp_layernorm.weight": "model-00013-of-00014.safetensors",
553
+ "model.layers.58.post_self_attn_layernorm.weight": "model-00013-of-00014.safetensors",
554
+ "model.layers.58.self_attn.k_proj.weight": "model-00013-of-00014.safetensors",
555
+ "model.layers.58.self_attn.o_proj.weight": "model-00013-of-00014.safetensors",
556
+ "model.layers.58.self_attn.q_proj.weight": "model-00013-of-00014.safetensors",
557
+ "model.layers.58.self_attn.v_proj.weight": "model-00013-of-00014.safetensors",
558
+ "model.layers.59.input_layernorm.weight": "model-00014-of-00014.safetensors",
559
+ "model.layers.59.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
560
+ "model.layers.59.mlp.gate_up_proj.weight": "model-00014-of-00014.safetensors",
561
+ "model.layers.59.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
562
+ "model.layers.59.post_mlp_layernorm.weight": "model-00014-of-00014.safetensors",
563
+ "model.layers.59.post_self_attn_layernorm.weight": "model-00014-of-00014.safetensors",
564
+ "model.layers.59.self_attn.k_proj.weight": "model-00014-of-00014.safetensors",
565
+ "model.layers.59.self_attn.o_proj.weight": "model-00014-of-00014.safetensors",
566
+ "model.layers.59.self_attn.q_proj.weight": "model-00014-of-00014.safetensors",
567
+ "model.layers.59.self_attn.v_proj.weight": "model-00014-of-00014.safetensors",
568
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00014.safetensors",
569
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00014.safetensors",
570
+ "model.layers.6.mlp.gate_up_proj.weight": "model-00002-of-00014.safetensors",
571
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00014.safetensors",
572
+ "model.layers.6.post_mlp_layernorm.weight": "model-00002-of-00014.safetensors",
573
+ "model.layers.6.post_self_attn_layernorm.weight": "model-00002-of-00014.safetensors",
574
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
575
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
576
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
577
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
578
+ "model.layers.60.input_layernorm.weight": "model-00014-of-00014.safetensors",
579
+ "model.layers.60.mlp.down_proj.weight": "model-00014-of-00014.safetensors",
580
+ "model.layers.60.mlp.gate_up_proj.weight": "model-00014-of-00014.safetensors",
581
+ "model.layers.60.post_attention_layernorm.weight": "model-00014-of-00014.safetensors",
582
+ "model.layers.60.post_mlp_layernorm.weight": "model-00014-of-00014.safetensors",
583
+ "model.layers.60.post_self_attn_layernorm.weight": "model-00014-of-00014.safetensors",
584
+ "model.layers.60.self_attn.k_proj.weight": "model-00014-of-00014.safetensors",
585
+ "model.layers.60.self_attn.o_proj.weight": "model-00014-of-00014.safetensors",
586
+ "model.layers.60.self_attn.q_proj.weight": "model-00014-of-00014.safetensors",
587
+ "model.layers.60.self_attn.v_proj.weight": "model-00014-of-00014.safetensors",
588
+ "model.layers.7.input_layernorm.weight": "model-00003-of-00014.safetensors",
589
+ "model.layers.7.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
590
+ "model.layers.7.mlp.gate_up_proj.weight": "model-00002-of-00014.safetensors",
591
+ "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
592
+ "model.layers.7.post_mlp_layernorm.weight": "model-00003-of-00014.safetensors",
593
+ "model.layers.7.post_self_attn_layernorm.weight": "model-00003-of-00014.safetensors",
594
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00014.safetensors",
595
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00014.safetensors",
596
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00014.safetensors",
597
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00014.safetensors",
598
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00014.safetensors",
599
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
600
+ "model.layers.8.mlp.gate_up_proj.weight": "model-00003-of-00014.safetensors",
601
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
602
+ "model.layers.8.post_mlp_layernorm.weight": "model-00003-of-00014.safetensors",
603
+ "model.layers.8.post_self_attn_layernorm.weight": "model-00003-of-00014.safetensors",
604
+ "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
605
+ "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
606
+ "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
607
+ "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
608
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00014.safetensors",
609
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00014.safetensors",
610
+ "model.layers.9.mlp.gate_up_proj.weight": "model-00003-of-00014.safetensors",
611
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00014.safetensors",
612
+ "model.layers.9.post_mlp_layernorm.weight": "model-00003-of-00014.safetensors",
613
+ "model.layers.9.post_self_attn_layernorm.weight": "model-00003-of-00014.safetensors",
614
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00014.safetensors",
615
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00014.safetensors",
616
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00014.safetensors",
617
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00014.safetensors",
618
+ "model.norm.weight": "model-00014-of-00014.safetensors"
619
+ }
620
+ }
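
The weight_map above closes out model.safetensors.index.json: every tensor name is mapped to one of the 14 safetensors shards, and a few layers (layer 49, for example) straddle a shard boundary, with their attention projections in one file and their layernorms in the next. Loaders such as transformers' `AutoModelForCausalLM.from_pretrained` consume this index automatically; the sketch below shows the manual path for a single tensor, assuming the repository has been downloaded to a local directory (the path is hypothetical).

```python
# Minimal sketch: resolve one tensor through the shard index above.
# Assumes a local snapshot of this repo at ./GLM-Z1-Rumination-32B-0414 (hypothetical path).
import json
from safetensors import safe_open

repo_dir = "./GLM-Z1-Rumination-32B-0414"

with open(f"{repo_dir}/model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.layers.59.self_attn.q_proj.weight"
shard = index["weight_map"][name]          # e.g. "model-00014-of-00014.safetensors"

with safe_open(f"{repo_dir}/{shard}", framework="pt") as f:
    tensor = f.get_tensor(name)
print(name, tuple(tensor.shape))
```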
special_tokens_map.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|endoftext|>",
4
+ "[MASK]",
5
+ "[gMASK]",
6
+ "[sMASK]",
7
+ "<sop>",
8
+ "<eop>",
9
+ "<|system|>",
10
+ "<|user|>",
11
+ "<|assistant|>",
12
+ "<|observation|>",
13
+ "<|begin_of_image|>",
14
+ "<|end_of_image|>",
15
+ "<|begin_of_video|>",
16
+ "<|end_of_video|>"
17
+ ],
18
+ "eos_token": {
19
+ "content": "<|endoftext|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<|endoftext|>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ }
32
+ }
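
special_tokens_map.json declares the GLM control tokens ([gMASK], <sop>, the role markers, and the image/video delimiters) as additional special tokens and reuses <|endoftext|> for both EOS and padding. A minimal sketch of how these surface once the tokenizer is loaded, assuming the upstream repo id and a recent transformers release:

```python
# Minimal sketch: inspect the special tokens declared above after loading the tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("THUDM/GLM-Z1-Rumination-32B-0414")  # assumption: upstream repo id
print(tok.eos_token)                  # "<|endoftext|>"
print(tok.pad_token)                  # "<|endoftext|>"
print(tok.additional_special_tokens)  # the 14 control tokens listed above
```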
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:76ebeac0d8bd7879ead7b43c16b44981f277e47225de2bd7de9ae1a6cc664a8c
3
+ size 19966496
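
tokenizer.json is committed as a Git LFS pointer (version, oid, size) rather than the ~20 MB tokenizer file itself, so a plain clone without LFS only yields the stub above. A minimal sketch of fetching the resolved file with huggingface_hub, assuming the upstream repo id:

```python
# Minimal sketch: download the actual tokenizer.json that the LFS pointer above refers to.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="THUDM/GLM-Z1-Rumination-32B-0414",  # assumption: upstream repo id
    filename="tokenizer.json",
)
print(path)  # local cache path of the ~20 MB tokenizer file
```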
tokenizer_config.json ADDED
@@ -0,0 +1,146 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "151329": {
4
+ "content": "<|endoftext|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "151330": {
12
+ "content": "[MASK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "151331": {
20
+ "content": "[gMASK]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "151332": {
28
+ "content": "[sMASK]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "151333": {
36
+ "content": "<sop>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "151334": {
44
+ "content": "<eop>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "151335": {
52
+ "content": "<|system|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "151336": {
60
+ "content": "<|user|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "151337": {
68
+ "content": "<|assistant|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "151338": {
76
+ "content": "<|observation|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "151339": {
84
+ "content": "<|begin_of_image|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "151340": {
92
+ "content": "<|end_of_image|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "151341": {
100
+ "content": "<|begin_of_video|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "151342": {
108
+ "content": "<|end_of_video|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ }
115
+ },
116
+ "additional_special_tokens": [
117
+ "<|endoftext|>",
118
+ "[MASK]",
119
+ "[gMASK]",
120
+ "[sMASK]",
121
+ "<sop>",
122
+ "<eop>",
123
+ "<|system|>",
124
+ "<|user|>",
125
+ "<|assistant|>",
126
+ "<|observation|>",
127
+ "<|begin_of_image|>",
128
+ "<|end_of_image|>",
129
+ "<|begin_of_video|>",
130
+ "<|end_of_video|>"
131
+ ],
132
+ "chat_template": "[gMASK]<sop><|system|>\n你是一个专业的深度研究助手,通过提供的工具与模拟浏览器交互,来帮助用户完成深度信息调研和报告撰写任务。今年是 2025 年。\n\n<核心要求>\n- 首先分解用户请求,得到包含多个子要求的列表\n- 制定初始研究计划\n- 进行多轮迭代搜索和页面浏览(at least 10 function calls):\n * 根据已获得的信息调整研究计划和关键词\n * 打开页面阅读,从发现的内容中识别新的关键概念/名词\n * 从搜索结果中提取新的关键词继续搜索\n * 访问并仔细阅读相关页面,识别新的关键概念/名词\n\n<重要配置>\n- 采用语言\n * 搜索关键词:英语\n * 思考:英语\n\n<可调用的工具列表>\n\n[{\"name\": \"search\", \"description\": \"Execute a search query and return search results. Use this function when you need to find information about a specific topic.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"query\": {\"type\": \"string\", \"description\": \"Search query string, use English words unless it is a proper name in Chinese\"}}, \"required\": [\"query\"], \"additionalProperties\": false}}, {\"name\": \"click\", \"description\": \"Click a link in the search results and navigate to the corresponding page. Use this function when you need to view detailed content of a specific search result.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"link_id\": {\"type\": \"integer\", \"description\": \"The link ID to click (from the sequence number in search results)\"}}, \"required\": [\"link_id\"], \"additionalProperties\": false}}, {\"name\": \"open\", \"description\": \"Open a specific website. Get content from any website with its URL.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"url\": {\"type\": \"string\", \"description\": \"The target website URL or domain\"}}, \"required\": [\"url\"], \"additionalProperties\": false}}, {\"name\": \"finish\", \"description\": \"Finish the task. Use this function when you have found the information you need.\", \"parameters\": {\"type\": \"object\", \"properties\": {}, \"additionalProperties\": false}}]\n\n{%- for message in messages if message.role != 'system' %}{%- set role = message['role'] %}{%- set content = message['content'] %}{%- set visible = content.split('</think>')[-1].strip() %}{%- set meta = message.get(\"metadata\", \"\") %}{%- if role == 'user' %}<|user|>\n{{ visible }}{%- elif role == 'assistant' and not meta %}<|assistant|>\n{{ visible }}{%- elif role == 'assistant' and meta %}<|assistant|>{{ meta }} \n{{ visible }}{%- elif role == 'observation' %}<|observation|>\n{{ visible }}{%- endif %}{%- endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}",
133
+ "clean_up_tokenization_spaces": false,
134
+ "do_lower_case": false,
135
+ "eos_token": "<|endoftext|>",
136
+ "extra_special_tokens": {},
137
+ "model_input_names": [
138
+ "input_ids",
139
+ "attention_mask"
140
+ ],
141
+ "model_max_length": 128000,
142
+ "pad_token": "<|endoftext|>",
143
+ "padding_side": "left",
144
+ "remove_space": false,
145
+ "tokenizer_class": "PreTrainedTokenizer"
146
+ }
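
tokenizer_config.json registers the same special tokens with their ids (151329-151342), sets a 128k model_max_length with left padding, and embeds the chat_template that prepends the built-in deep-research system prompt (tool definitions for search/click/open/finish) and wraps turns in the <|system|>/<|user|>/<|assistant|>/<|observation|> role tokens. A minimal sketch of rendering a prompt through this template, assuming the upstream repo id and a recent transformers release:

```python
# Minimal sketch: render a conversation with the chat_template defined above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("THUDM/GLM-Z1-Rumination-32B-0414")  # assumption: upstream repo id
messages = [{"role": "user", "content": "Survey recent work on test-time scaling of LLMs."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt[:40])  # begins with "[gMASK]<sop><|system|>" followed by the built-in system prompt
```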