kundeshwar20 committed
Commit d4581ce · verified · 1 Parent(s): ab531a4

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,193 +1,57 @@
  ---
- language:
- - hi
- - en
- base_model:
- - bharatgenai/Param-1-2.9B-Instruct
- pipeline_tag: text-generation
  tags:
- - Ayurvedic
  ---

- <div align="center">
- <img src="https://huggingface.co/bharatgenai/Param-1-2.9B-Instruct/resolve/main/BharatGen%20Logo%20(1).png" width="60%" alt="BharatGen" />
- </div>
- <hr>
- <div align="center">
- <a href="#" style="margin: 4px; pointer-events: none; cursor: default;">
- <img alt="Paper" src="https://img.shields.io/badge/Paper-Coming%20Soon-lightgrey?style=flat" />
- </a>
- <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" style="margin: 4px;">
- <img alt="License" src="https://img.shields.io/badge/License-CC--BY--4.0-blue.svg" />
- </a>
- <a href="#" target="_blank" style="margin: 4px;">
- <img alt="Blog" src="https://img.shields.io/badge/Blog-Read%20More-brightgreen?style=flat" />
- </a>
- </div>

- # AyurParam
- BharatGen introduces AyurParam, a domain-specialized large language model fine-tuned from Param-1-2.9B-Instruct on a high-quality Ayurveda dataset. It is designed to handle Ayurvedic queries, classical text interpretation, clinical guidance, and wellness knowledge. Ayurveda offers vast traditional medical wisdom, yet most language models lack domain-specific understanding. AyurParam bridges this gap by combining Param-1’s bilingual strengths with a curated Ayurvedic knowledge base, enabling contextually rich and culturally grounded responses.

- ## 🏗 Model Architecture
- AyurParam inherits the architecture of Param-1-2.9B-Instruct:
- * Hidden size: 2048
- * Intermediate size: 7168
- * Attention heads: 16
- * Hidden layers: 32
- * Key-value heads: 8
- * Max position embeddings: 2048
- * Activation: SiLU
- * Positional Embeddings: Rotary (RoPE, theta=10000)
- * Attention Mechanism: Grouped-query attention
- * Precision: bf16-mixed
- * Base model: Param-1-2.9B-Instruct
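
These numbers correspond directly to the `config.json` added in this commit (the hidden size there is 2048, and the vocab is 256006, so the card's "256k + 4" reads as approximate). A minimal sketch expressing them as a `LlamaConfig`, with the grouped-query and head-dimension arithmetic spelled out:

```python
from transformers import LlamaConfig

# The card's architecture numbers, expressed as the equivalent Llama config
# (values cross-checked against the config.json shipped in this commit).
cfg = LlamaConfig(
    hidden_size=2048,
    intermediate_size=7168,
    num_attention_heads=16,
    num_hidden_layers=32,
    num_key_value_heads=8,
    max_position_embeddings=2048,
    hidden_act="silu",
    rope_theta=10000.0,
    vocab_size=256006,  # config.json value; the card's "256k + 4" is approximate
)

# Grouped-query attention: 16 query heads share 8 KV heads (2 queries per KV head),
# and each head works on a 2048 / 16 = 128-dimensional slice.
print(cfg.num_attention_heads // cfg.num_key_value_heads)  # 2
print(cfg.hidden_size // cfg.num_attention_heads)          # 128
```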

- ## 📚 Data Preparation
- AyurParam’s training corpus was carefully crafted to ensure deep Ayurvedic knowledge, Sanskrit/English bilingual accessibility, and clinical relevance.
- Steps involved:
- 1. Source Gathering
- * 15k+ passages from classical Ayurvedic texts (digitized and curated).
- * 10k+ passages from AYUSH ministry guidelines, research papers, and clinical case discussions.
- 2. Question Generation
- * 5 curated Q&A pairs generated per passage using an open-source LLM + domain expert review.
- 3. Domain Taxonomy & Personas
- * Built an Ayurveda-specific taxonomy (Dosha, Dhatu, Mala, Srotas, Nidana, Chikitsa, etc.).
- * Defined multiple personas: student, vaidya (physician), researcher, policymaker, wellness coach.
- 4. Dataset Construction
- * 1.5M Q&A pairs grounded in taxonomy and personas.
- * 4M multi-turn conversation samples created.
- * Sanskrit terminology preserved with transliteration and explanations.
-
-
- ## 🏋️ Training Setup
- * Base model: Param-1-2.9B-Instruct
- * Training framework: Hugging Face + TRL (SFT) + torchrun multi-node setup
- * Prompt template: Custom-designed for Ayurvedic inference
- * Scheduler: Linear with warmup
- * Epochs: 3
- * Total training samples: ~8M
- * Test samples: ~800k
- * Base learning rate: 5e-6
- * Minimum learning rate: 0
- * Additional tokens: `<user>`, `<assistant>`, `<context>`, `<system_prompt>`
- * Vocab size: 256k + 4
- * Global batch size: 1024
- * Micro batch size: 4
- * Gradient accumulation steps: 32
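
For orientation, here is how these hyperparameters would map onto TRL's `SFTConfig`/`SFTTrainer`. This is an illustrative sketch, not the authors' script: the dataset file and warmup ratio are placeholders, and the stated global batch size of 1024 implies 8 data-parallel ranks (4 micro-batch × 32 accumulation steps × 8 ranks), a layout the card does not spell out.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset path; the actual SFT corpus is not published in this repo.
dataset = load_dataset("json", data_files="ayurveda_sft.jsonl", split="train")

config = SFTConfig(
    output_dir="output_dir_no_thinking_ayurveda",
    num_train_epochs=3,
    learning_rate=5e-6,
    lr_scheduler_type="linear",      # "linear with warmup" per the card
    warmup_ratio=0.01,               # warmup amount is not stated; placeholder
    per_device_train_batch_size=4,   # micro batch size
    gradient_accumulation_steps=32,  # 4 x 32 x 8 ranks = global batch 1024
    bf16=True,                       # bf16-mixed precision
)

trainer = SFTTrainer(
    model="bharatgenai/Param-1-2.9B-Instruct",  # base model per the card
    args=config,
    train_dataset=dataset,
)
trainer.train()
```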
-
-
- ## 🚀 Inference Example
  ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch
-
- model_name = "bharatgenai/AyurParam"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
-     device_map="auto"
- )
-
- # Example Ayurvedic query
- user_input = "What is the Samprapti (pathogenesis) of Amavata according to Ayurveda?"
-
- # Prompt styles
- # 1. Generic QA: <user> ... <assistant>
- # 2. Context-based QA: <context> ... <user> ... <assistant>
- # 3. Multi-turn conversation (supports up to 5 turns): <user> ... <assistant> ... <user> ... <assistant>
-
- prompt = f"<user> {user_input} <assistant>"
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-
- with torch.no_grad():
-     output = model.generate(
-         **inputs,
-         max_new_tokens=300,
-         do_sample=True,
-         top_k=50,
-         top_p=0.95,
-         temperature=0.6,
-         eos_token_id=tokenizer.eos_token_id,
-         use_cache=False
-     )
-
- print(tokenizer.decode(output[0], skip_special_tokens=True))
  ```
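
The comments above name three prompt layouts but only show the first; the other two follow the same special-token scheme. A brief sketch (the context passage and prior answer are invented for illustration):

```python
# Context-grounded QA: prepend a <context> block before the user turn.
context = "Amavata arises when Ama combines with vitiated Vata and lodges in the joints."  # illustrative
question = "Which dosha is primarily involved in Amavata?"
prompt = f"<context> {context} <user> {question} <assistant>"

# Multi-turn (up to 5 turns): alternate <user>/<assistant> segments and end
# with a bare <assistant> tag for the model to continue.
prev_answer = "Amavata is a joint disorder caused by Ama and aggravated Vata."  # illustrative
prompt = f"<user> What is Amavata? <assistant> {prev_answer} <user> How is it managed? <assistant>"
```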

- ## 📊 Benchmark Results: AyurParam vs Baselines
- Evaluated on the [BhashaBench-Ayur benchmark](https://huggingface.co/datasets/bharatgenai/BhashaBench-Ayur) (bba); all scores are accuracy (%).
- ---

- ## 1. Overall Performance

- ### Similar Range Models
- | Model | bba | bba_English | bba_Hindi |
- |-----------------------|-------|-------------|-----------|
- | Llama-3.2-1B-Instruct | 26.41 | 26.77 | 25.82 |
- | Qwen2.5-3B-Instruct | 32.68 | 35.22 | 28.46 |
- | granite-3.1-2b | 31.10 | 33.39 | 27.30 |
- | Llama-3.2-3B-Instruct | 33.20 | 35.31 | 29.67 |
- | gemma-2-2b-it | 28.40 | 29.38 | 26.79 |
- | **AyurParam** | **39.97** | **41.12** | **38.04** |

- ### Larger Models
- | Model | bba | bba_English | bba_Hindi |
- |-----------------------------------------|-------|-------------|-----------|
- | Indic-gemma-7B-Navarasa-2.0 | 35.13 | 37.12 | 31.83 |
- | Pangea-7B | 37.41 | 40.69 | 31.93 |
- | aya-23-8B | 31.97 | 33.84 | 28.87 |
- | gpt-oss-20b | 36.34 | 38.30 | 33.09 |
- | Llama-3.1-8B-Instruct | 34.76 | 36.86 | 31.26 |
- | gemma-2-27b-it | 37.99 | 40.45 | 33.89 |
- | Nemotron-4-Mini-Hindi-4B-Instruct | 33.54 | 33.38 | 33.82 |
- | **AyurParam** | **39.97** | **41.12** | **38.04** |

- ---

- ## 2. Question Difficulty

- ### Similar Range Models
- | Difficulty | Llama-3.2-1B | Qwen2.5-3B | granite-3.1-2b | Llama-3.2-3B | gemma-2-2b-it | **AyurParam** |
- |------------|--------------|------------|----------------|--------------|---------------|----------------|
- | **Easy** | 27.44 | 35.55 | 33.90 | 36.42 | 29.96 | **43.93** |
- | **Medium** | 25.23 | 29.57 | 28.06 | 29.66 | 26.83 | **35.95** |
- | **Hard** | 25.39 | 28.23 | 26.81 | 28.51 | 24.96 | **31.21** |

- ### Larger Models
- | Difficulty | Indic-gemma-7B | Pangea-7B | aya-23-8B | gpt-oss-20b | Llama-3.1-8B | gemma-2-27b-it | Nemotron-4-Mini-Hindi-4B | **AyurParam** |
- |------------|----------------|-----------|-----------|-------------|--------------|----------------|--------------------------|----------------|
- | **Easy** | 38.54 | 41.45 | 35.51 | 42.03 | 39.43 | 43.47 | 36.08 | **43.93** |
- | **Medium** | 31.72 | 32.94 | 28.29 | 30.27 | 29.36 | 31.90 | 30.80 | **35.95** |
- | **Hard** | 27.23 | 31.77 | 25.11 | 26.67 | 30.50 | 30.78 | 29.50 | **31.21** |

- ---

- ## 3. Question Type

- ### Similar Range Models
- | Type | Llama-3.2-1B | Qwen2.5-3B | granite-3.1-2b | Llama-3.2-3B | gemma-2-2b-it | **AyurParam** |
- |----------------------|--------------|------------|----------------|--------------|---------------|----------------|
- | Assertion/Reasoning | 59.26 | 51.85 | 33.33 | 40.74 | 33.33 | **44.44** |
- | Fill in the blanks | 26.97 | 29.21 | 21.35 | 34.83 | 32.02 | **29.78** |
- | MCQ | 26.34 | 32.70 | 31.22 | 33.17 | 28.33 | **40.12** |
- | Match the column | 26.83 | 29.27 | 29.27 | 29.27 | 36.59 | **24.39** |

- ### Larger Models
- | Type | Indic-gemma-7B | Pangea-7B | aya-23-8B | gpt-oss-20b | Llama-3.1-8B | gemma-2-27b-it | Nemotron-4-Mini-Hindi-4B | **AyurParam** |
- |----------------------|----------------|-----------|-----------|-------------|--------------|----------------|--------------------------|----------------|
- | Assertion/Reasoning | 59.26 | 62.96 | 18.52 | 25.93 | 29.63 | 55.56 | 37.04 | **44.44** |
- | Fill in the blanks | 35.39 | 24.16 | 30.90 | 32.02 | 26.97 | 35.96 | 30.34 | **29.78** |
- | MCQ | 35.10 | 37.53 | 32.05 | 36.39 | 34.83 | 37.98 | 33.60 | **40.12** |
- | Match the column | 31.71 | 34.15 | 17.07 | 46.34 | 46.34 | 39.02 | 24.39 | **24.39** |

- ---
- From the above results, **AyurParam outperforms all similar-sized models on overall accuracy** and achieves **competitive or better performance than larger models** on most splits, though some question types (e.g., assertion/reasoning, match the column) remain stronger for other models.

- ## Contact
- For any questions or feedback, please contact:
- - Sravan Kumar ([email protected])
- - Kundeshwar Pundalik (kundeshwar.pundalik@tihiitb.org)
  ---
+ library_name: transformers
+ model_name: output_dir_no_thinking_ayurveda
  tags:
+ - generated_from_trainer
+ - sft
+ - trl
+ license: cc-by-4.0
  ---

+ # Model Card for output_dir_no_thinking_ayurveda

+ This model is a fine-tuned version of [bharatgenai/Param-1-2.9B-Instruct](https://huggingface.co/bharatgenai/Param-1-2.9B-Instruct).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

  ```python
+ from transformers import pipeline
+
+ question = "What is the Samprapti of Amavata according to Ayurveda?"
+ generator = pipeline("text-generation", model="bharatgenai/AyurParam", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
  ```

+ ## Training procedure

+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bgpt-iitb/ift-training-ayurveda/runs/a94yxz4s)

+ This model was trained with SFT.

+ ### Framework versions

+ - TRL: 0.19.1
+ - Transformers: 4.53.1
+ - PyTorch: 2.7.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.2

+ ## Citations

+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title        = {{TRL: Transformer Reinforcement Learning}},
+     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+     year         = 2020,
+     journal      = {GitHub repository},
+     publisher    = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
chat_template_bck.jinja ADDED
@@ -0,0 +1,15 @@
+ {{'<extra_id_0>System'}}{% for message in messages %}{% if message['role'] == 'system' %}{{'
+ ' + message['content'].strip()}}{% if tools or contexts %}{{'
+ '}}{% endif %}{% endif %}{% endfor %}{% if tools %}{% for tool in tools %}{{ '
+ <tool> ' + tool.strip() + ' </tool>' }}{% endfor %}{% endif %}{% if contexts %}{% if tools %}{{'
+ '}}{% endif %}{% for context in contexts %}{{ '
+ <context> ' + context.strip() + ' </context>' }}{% endfor %}{% endif %}{{'
+
+ '}}{% for message in messages %}{% if message['role'] == 'user' %}{{ '<extra_id_1>User
+ ' + message['content'].strip() + '
+ ' }}{% elif message['role'] == 'assistant' %}{{ '<extra_id_1>Assistant
+ ' + message['content'].strip() + '
+ ' }}{% elif message['role'] == 'tool' %}{{ '<extra_id_1>Tool
+ ' + message['content'].strip() + '
+ ' }}{% endif %}{% endfor %}{{'<extra_id_1>Assistant
+ '}}
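
Note that this backup template uses Nemotron-style `<extra_id_0>`/`<extra_id_1>` turn markers rather than the `<user>`/`<assistant>` tokens from the model card, and it expects optional `tools` and `contexts` variables alongside `messages`. In practice Transformers applies a repo's template via `tokenizer.apply_chat_template(messages)`; a minimal sketch of rendering the user/assistant portion directly with `jinja2`:

```python
from jinja2 import Template

# Shortened to the user/assistant turns of the template above; illustrative only.
template_src = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}{{ '<extra_id_1>User\n' + message['content'].strip() + '\n' }}"
    "{% elif message['role'] == 'assistant' %}{{ '<extra_id_1>Assistant\n' + message['content'].strip() + '\n' }}"
    "{% endif %}{% endfor %}"
    "{{ '<extra_id_1>Assistant\n' }}"
)

messages = [{"role": "user", "content": "What is Amavata?"}]
print(Template(template_src).render(messages=messages))
# <extra_id_1>User
# What is Amavata?
# <extra_id_1>Assistant
```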
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 2,
+   "eos_token_id": 3,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.01,
+   "intermediate_size": 7168,
+   "max_position_embeddings": 2048,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.53.1",
+   "use_cache": true,
+   "vocab_size": 256006
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 2,
+   "eos_token_id": 3,
+   "transformers_version": "4.53.1"
+ }
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9956b761cb689bed51e7a2713c0a08823520244760318a6cb482a9e07fe3d3c
+ size 4983092120
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2dedc69b1641247f6fd0f50370b3cf2306ea5824481d1646aa1c3f42e49491bb
+ size 4362432144
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a669c9ba882c812afd0d5582ffb7fa751e2d5cef4f1864991bed16146ce76d94
+ size 2097201280
model.safetensors.index.json ADDED
@@ -0,0 +1,299 @@
+ {
+   "metadata": {
+     "total_parameters": 2860673024,
+     "total_size": 11442692096
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00003-of-00003.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.norm.weight": "model-00002-of-00003.safetensors"
+   }
+ }
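
This index is what lets `from_pretrained` resolve each parameter name to its shard without loading everything at once; note the split is not strictly layer-aligned (layer 12's gate/up projections and attention weights sit in shard 1 while its norms and down-projection spill into shard 2). A sketch of the same lookup done by hand with the `safetensors` API, assuming the three shard files have been downloaded locally:

```python
import json
from safetensors import safe_open

# Resolve a single tensor to its shard, then load only that tensor.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.layers.0.self_attn.q_proj.weight"
shard = index["weight_map"][name]  # "model-00001-of-00003.safetensors"

with safe_open(shard, framework="pt") as f:
    tensor = f.get_tensor(name)

print(tensor.shape)  # expected: [2048, 2048] (16 heads x 128 head_dim, hidden 2048)
```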
special_tokens_map.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "additional_special_tokens": [
+     {
+       "content": "<context>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "<user>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "<assistant>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "<system_prompt>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "<actual_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     },
+     {
+       "content": "</actual_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false
+     }
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</actual_response>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>"
+ }
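
Two details here go beyond the replaced card: six special markers are registered (the four role tokens plus `<actual_response>`/`</actual_response>`), and `</actual_response>` doubles as the EOS token, which is what makes `generate(..., eos_token_id=tokenizer.eos_token_id)` stop at the end of an answer. For illustration only (this repo's tokenizer already contains the tokens), such markers are typically registered like this:

```python
from transformers import AutoTokenizer

# Illustrative: how a base tokenizer would be extended with these markers.
tok = AutoTokenizer.from_pretrained("bharatgenai/Param-1-2.9B-Instruct")
tok.add_special_tokens({
    "additional_special_tokens": [
        "<context>", "<user>", "<assistant>", "<system_prompt>",
        "<actual_response>", "</actual_response>",
    ],
    "eos_token": "</actual_response>",
})
# The model's embedding matrix must then be resized to match:
# model.resize_token_embeddings(len(tok))
```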
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad2997f2eed1a7113a98d652ad2a546bbd64746ea227614d232ab6172312b105
+ size 34810848
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json.bak ADDED
The diff for this file is too large to render. See raw diff
 
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a604b5bd466fb30d1d84d4f4ac264c0baac426899a77030c590662df198875f3
+ size 6225
training_metrics.csv ADDED
The diff for this file is too large to render. See raw diff