DreamGenX committed on
Commit a9bb756 · verified · 0 Parent(s)

dreamgen/opus-v1-34b
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,166 @@
+ ---
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - unsloth
+ - axolotl
+ ---
+
+ # DreamGen Opus V1
+
+ <div style="display: flex; flex-direction: row; align-items: center;">
+ <img src="/dreamgen/opus-v1-34b/resolve/main/images/logo-1024.png" alt="model logo" style="
+ border-radius: 12px;
+ margin-right: 12px;
+ margin-top: 0px;
+ margin-bottom: 0px;
+ max-width: 100px;
+ height: auto;
+ "/>
+
+ Models for **(steerable) story-writing and role-playing**.
+ <br/>[All Opus V1 models, including quants](https://huggingface.co/collections/dreamgen/opus-v1-65d092a6f8ab7fc669111b31).
+
+ </div>
+
+ ## Resources
+
+ - [**Opus V1 prompting guide**](https://dreamgen.com/docs/models/opus/v1) with many (interactive) examples and prompts that you can copy.
+ - [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing) for interactive role-play using `opus-v1.2-7b`.
+ - [Python code](example/prompt/format.py) to format the prompt correctly.
+
+ <img src="/dreamgen/opus-v1-34b/resolve/main/images/story_writing.webp" alt="story writing on dreamgen.com" style="
+ padding: 12px;
+ border-radius: 12px;
+ border: 2px solid #f9a8d4;
+ background: rgb(9, 9, 11);
+ "/>
+
+ ## Prompting
+
+ <details>
+ <summary>The models use an extended version of ChatML.</summary>
+
+ ```
+ <|im_start|>system
+ (Story description in the right format here)
+ (Typically consists of plot description, style description and characters)<|im_end|>
+ <|im_start|>user
+ (Your instruction on how the story should continue)<|im_end|>
+ <|im_start|>text names= Alice
+ (Continuation of the story from the Alice character)<|im_end|>
+ <|im_start|>text
+ (Continuation of the story from no character in particular (pure narration))<|im_end|>
+ <|im_start|>user
+ (Your instruction on how the story should continue)<|im_end|>
+ <|im_start|>text names= Bob
+ (Continuation of the story from the Bob character)<|im_end|>
+ ```
+
+ The Opus V1 extension is the addition of the `text` role, and the addition / modification of role names.
+
+ Pay attention to the following:
+
+ - The `text` messages can (but don't have to) have `names`; names are used to indicate the "active" character during role-play.
+ - There can be multiple subsequent messages with the `text` role, especially if names are involved.
+ - There can be multiple names attached to a message.
+ - The format for names is `names= {{name[0]}}; {{name[1]}}`. Beware of the spaces after `names=` and after the `;`; this spacing leads to the most natural tokenization for the names.
+ </details>
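As a minimal sketch of the turn format described above (the helper below is illustrative; the repo ships a full version in `example/prompt/format.py`):

```python
from typing import List


def format_turn(role: str, content: str, names: List[str] = []) -> str:
    """Illustrative sketch of one extended-ChatML turn."""
    header = "<|im_start|>" + role
    if names:
        # Note the space after "names=" and after each ";".
        header += " names= " + "; ".join(names)
    return header + "\n" + content.strip() + "<|im_end|>\n"


prompt = (
    format_turn("system", "(Story description here)")
    + format_turn("user", "(Instruction here)")
    + format_turn("text", "(Alice's continuation)", names=["Alice"])
)
print(prompt)
```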
+
+ While the main goal for the models is great story-writing and role-playing performance, the models are also capable of several writing-related tasks as well as general assistance.
+
+ Here's how you can prompt the model for the following tasks:
+
+ - Steerable [Story-writing](https://dreamgen.com/docs/models/opus/v1#task-story-writing) and [Role-playing](https://dreamgen.com/docs/models/opus/v1#task-role-playing):
+   - Input:
+     - System prompt: You provide the story / role-play description, which consists of:
+       - Plot description
+       - Style description
+       - Characters and their descriptions
+     - Conversation turns:
+       - Text / message turn: This represents part of the story or role-play
+       - Instruction: This tells the model what should happen next
+   - Output: Continuation of the story / role-play.
+ - [Story plot summarization](https://dreamgen.com/docs/models/opus/v1#task-plot-description)
+   - Input: A story, or a few chapters of a story.
+   - Output: A description of the story or chapters.
+ - [Story character description](https://dreamgen.com/docs/models/opus/v1#task-char-description)
+   - Input: A story, or a few chapters of a story, and a set of characters.
+   - Output: A description of the characters.
+ - [Story style description](https://dreamgen.com/docs/models/opus/v1#task-style-description)
+   - Input: A story, or a few chapters of a story.
+   - Output: A description of the style of the story.
+ - [Story description to chapters](https://dreamgen.com/docs/models/opus/v1#task-story-description-to-chapter-descriptions)
+   - Input: A brief plot description and the desired number of chapters.
+   - Output: A description for each chapter.
+ - And more...
+
+ ### Sampling params
+
+ For story-writing and role-play, I recommend "Min P" based sampling with `min_p` in the range `[0.01, 0.1]` and with `temperature` in the range `[0.5, 1.5]`, depending on your preferences. A good starting point would be `min_p=0.1; temperature=0.8`.
+
+ You may also benefit from setting presence, frequency and repetition penalties, especially at lower temperatures.
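As a rough, backend-agnostic illustration of what Min P filtering does (the function name and values here are illustrative, not tied to any particular inference engine):

```python
import math


def min_p_filter(logits, min_p=0.1, temperature=0.8):
    """Keep only tokens whose probability is at least min_p times the
    probability of the most likely token, then renormalize."""
    # Temperature-scaled softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min P cutoff is relative to the top token, so the filter adapts:
    # confident distributions keep few tokens, flat ones keep many.
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}

    # Renormalize the surviving tokens before sampling.
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}


print(min_p_filter([2.0, 1.0, -3.0]))
```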
+
+ ## Dataset
+
+ The fine-tuning dataset consisted of ~100M tokens of steerable story-writing, role-playing, writing-assistant and general-assistant examples. Each example was up to 31000 tokens long.
+
+ All story-writing and role-playing examples were based on human-written text.
+
+ ![token count distribution](images/token_count_cum__token_bucket.png)
+
+ ## Running the model
+
+ The model should be compatible with any software that supports the base model, but beware of prompting and tokenization.
+
+ I recommend using these model versions:
+
+ - 7B: [no quant (opus-v1.2-7b)](https://huggingface.co/dreamgen/opus-v1.2-7b)
+ - 34B: [no quant (opus-v1-34b)](https://huggingface.co/dreamgen/opus-v1-34b) or [awq (opus-v1-34b-awq)](https://huggingface.co/dreamgen/opus-v1-34b-awq)
+
+ ### Running on DreamGen.com (free)
+
+ You can try the model for free on [dreamgen.com](https://dreamgen.com) — note that an account is required.
+
+ ### Running Locally
+
+ - **Make sure your prompt is as close as possible to the Opus V1 format**
+   - Regardless of which backend you use, it's important that you format your prompt well and that the tokenization works correctly.
+   - [Read the prompt guide](https://dreamgen.com/docs/models/opus/v1)
+   - [Read the prompt formatting code](example/prompt/format.py)
+   - Make sure `<|im_start|>` and `<|im_end|>` are tokenized correctly
+ - **vLLM**
+   - [**Google Colab**](https://colab.research.google.com/drive/1J178fH6IdQOXNi-Njgdacf5QgAxsdT20?usp=sharing): This is a simple interactive Google Colab for role-play with the 7B model; it should fit on the T4 GPU.
+   - [Code](example/interactive.py): This is a simple script for interactive chat with one hard-coded scenario.
+ - **SillyTavern**
+   - [Settings](https://huggingface.co/dreamgen/opus-v1-34b/tree/main/configs/silly_tavern), v2 kindly provided by @MarinaraSpaghetti
+   - [Settings screenshot](configs/silly_tavern/settings_screenshot.webp)
+   - This is just an attempt at approximating the Opus V1 prompt, so it won't be perfect
+ - **LM Studio**
+   - [Config](configs/lmstudio/preset.json)
+   - The same as ChatML, but with the "assistant" role changed to "text".
+ - **HuggingFace**
+   - [Chat template](tokenizer_config.json#L51)
+   - The same as ChatML, but with the "assistant" role changed to "text".
+
+ ## Known Issues
+
+ - **34B tokenization**:
+   - There seems to be a mismatch between the tokenizers of the base and fine-tuned model. It's unclear whether this also affected training, or whether it's just an incorrectly saved tokenizer (you can see `tokenizer.json` was not saved ([bug report](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1322))).
+   - This affects BOS and EOS (which aren't really used by Yi) and the tokenization of the first input token.
+   - Overall impact should be minor.
+ - **34B repetition**:
+   - The 34B sometimes gets stuck repeating the same word, or synonyms. This seems to be a common problem across various Yi 34B fine-tunes.
+ - **GGUF**:
+   - The conversion might be messed up; in my tests even the `Q_8` quant of `opus-v1.2-7b` is much worse than the `fp16` version.
+ - **Ooba**:
+   - The tokenization might be messed up. Some users reported that `<|im_start|>` and `<|im_end|>` are tokenized as multiple tokens.
+
+ ## Community
+
+ Join the DreamGen community on [**Discord**](https://dreamgen.com/discord) to get early access to new models.
+
+ ## License
+
+ - This model is intended for personal use only; other use is not permitted.
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_name_or_path": "dreamgen/opus-v1-34b",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 7168,
+   "initializer_range": 0.02,
+   "intermediate_size": 20480,
+   "max_position_embeddings": 200000,
+   "model_type": "llama",
+   "num_attention_heads": 56,
+   "num_hidden_layers": 60,
+   "num_key_value_heads": 8,
+   "pad_token_id": 0,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 5000000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.37.0",
+   "use_cache": false,
+   "vocab_size": 64000
+ }
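As a sanity check, the parameter count implied by this config (Llama-style architecture with grouped-query attention and untied embeddings) can be computed by hand; the result matches the `total_size` in `pytorch_model.bin.index.json` exactly:

```python
# Rough parameter count for the architecture described in config.json.
cfg = dict(hidden_size=7168, intermediate_size=20480, num_hidden_layers=60,
           num_attention_heads=56, num_key_value_heads=8, vocab_size=64000)

h = cfg["hidden_size"]
head_dim = h // cfg["num_attention_heads"]      # 128
kv_dim = cfg["num_key_value_heads"] * head_dim  # 1024 (grouped-query attention)

per_layer = (
    h * h                               # q_proj
    + 2 * h * kv_dim                    # k_proj and v_proj
    + h * h                             # o_proj
    + 3 * h * cfg["intermediate_size"]  # gate/up/down MLP projections
    + 2 * h                             # the two RMSNorm weights
)
total = (
    cfg["num_hidden_layers"] * per_layer
    + 2 * cfg["vocab_size"] * h  # untied input embeddings + lm_head
    + h                          # final norm
)
print(f"{total:,} params, {2 * total:,} bytes in bfloat16")
```

With these values `total` comes out to 34,388,917,248 parameters, i.e. 68,777,834,496 bytes at 2 bytes per bfloat16 weight, which is exactly the `total_size` recorded in the shard index below.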
configs/lmstudio/preset.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "name": "OpusV1StoryWriting",
+   "inference_params": {
+     "input_prefix": "<|im_end|>\n<|im_start|>user\n",
+     "input_suffix": "<|im_end|>\n<|im_start|>text\n",
+     "antiprompt": ["<|im_start|>", "<|im_end|>"],
+     "pre_prompt_prefix": "<|im_start|>system\n",
+     "pre_prompt_suffix": "",
+     "pre_prompt": "You are an intelligent, skilled, versatile writer.\n\nYour task is to write a story based on the information below.\n\n## Overall plot description:\n\n"
+   }
+ }
configs/silly_tavern/settings_screenshot.webp ADDED
configs/silly_tavern/v1/context_settings.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "story_string": "<|im_start|>system\nYou are an intelligent, skilled, versatile writer.\n\nYour task is to write a story based on the information below.\n\n\n## Overall plot description:\n\n{{scenario}}\n\n{{description}}\n\n\n## Characters:\n\n### {{char}}\n{{personality}}\n{{persona}}<|im_end|>",
+   "example_separator": "",
+   "chat_start": "",
+   "use_stop_strings": false,
+   "always_force_name2": false,
+   "trim_sentences": false,
+   "include_newline": false,
+   "single_line": false,
+   "name": "ChatMLOpusV1_ST2"
+ }
configs/silly_tavern/v1/instruct_mode_settings.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "system_prompt": "",
+   "input_sequence": "<|im_start|>text names= {{user}}\n",
+   "output_sequence": "<|im_end|>\n<|im_start|>text names= {{char}}\n",
+   "first_output_sequence": "",
+   "last_output_sequence": "",
+   "system_sequence_prefix": "",
+   "system_sequence_suffix": "",
+   "stop_sequence": "",
+   "separator_sequence": "<|im_end|>\n",
+   "wrap": false,
+   "macro": true,
+   "names": false,
+   "names_force_groups": false,
+   "activation_regex": "",
+   "name": "ChatMLOpusV1_ST1"
+ }
configs/silly_tavern/v2/context_settings.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "story_string": "<|im_start|>system\n{{#if system}}{{system}}\n\n\n{{/if}}## Overall plot description:\n\n{{#if wiBefore}}{{wiBefore}}\n\n{{/if}}{{#if scenario}}{{scenario}}\n\n\n{{/if}}## Characters:\n\n### {{char}}\n{{#if description}}{{description}}\n{{/if}}{{#if personality}}{{personality}}\n\n{{/if}}### {{user}}\n{{#if persona}}{{persona}}\n\n{{/if}}{{#if wiAfter}}{{wiAfter}}\n\n{{/if}}{{#if mesExamples}}## {{char}}'s example message:\n\n{{mesExamples}}{{/if}}",
+   "example_separator": "",
+   "chat_start": "",
+   "use_stop_strings": false,
+   "always_force_name2": false,
+   "trim_sentences": true,
+   "include_newline": false,
+   "single_line": false,
+   "name": "ChatMLOpusV1_ST2"
+ }
configs/silly_tavern/v2/instruct_mode_settings.json ADDED
@@ -0,0 +1,17 @@
 
+ {
+   "system_prompt": "You are an intelligent, skilled, versatile writer.\n\nYour task is to write a role-play based on the information below.\n\n\n## Style description:\n\nThis role-play is written as a third-person introspective narrative in past tense. Scenes are described vividly, with great detail.",
+   "input_sequence": "<|im_end|>\n<|im_start|>text names= {{user}}\n",
+   "output_sequence": "<|im_end|>\n<|im_start|>text names= {{char}}\n",
+   "first_output_sequence": "",
+   "last_output_sequence": "<|im_end|>\n<|im_start|>user\nLength: 400 words\n{{char}} replies to {{user}} in detailed and elaborate way.<|im_end|>\n<|im_start|>text names= {{char}}\n",
+   "system_sequence_prefix": "",
+   "system_sequence_suffix": "",
+   "stop_sequence": "",
+   "separator_sequence": "",
+   "wrap": false,
+   "macro": true,
+   "names": false,
+   "names_force_groups": false,
+   "activation_regex": "",
+   "name": "ChatMLOpusV1_ST2"
+ }
example/interactive.py ADDED
@@ -0,0 +1,129 @@
+ # python interactive.py
+
+ # %%
+
+ import fileinput
+
+
+ from vllm import LLM, SamplingParams
+
+ from prompt.format import (
+     format_opus_v1_prompt,
+     OpusV1Character,
+     OpusV1Prompt,
+     OpusV1StorySystemPrompt,
+     OpusV1Turn,
+ )
+
+
+ # %%
+
+
+ def main():
+     sampling_params = SamplingParams(
+         # I usually stay between 0.0 and 1.0, especially for the Yi models I found lower tends to be better.
+         # For assistant tasks, I usually use 0.0.
+         temperature=0.8,
+         min_p=0.05,
+         presence_penalty=0.1,
+         frequency_penalty=0.1,
+         repetition_penalty=1.1,
+         max_tokens=200,
+         ignore_eos=True,
+         skip_special_tokens=False,
+         spaces_between_special_tokens=False,
+         stop=["<|im_end|>"],
+         include_stop_str_in_output=False,
+     )
+
+     # Set max_model_len to fit in memory.
+     model = LLM(
+         "dreamgen/opus-v1.2-7b",
+         max_model_len=2000,
+         enforce_eager=True,
+         swap_space=0,
+         gpu_memory_utilization=0.85,
+     )
+
+     plot_description = """
+ This is a fanfiction from the Harry Potter universe. In this alternate reality, Harry Potter is evil and secretly siding with Slytherin.
+ Up until now, Harry was pretending to be friends with Hermione and Ron, that changes when he invites Hermione to his chambers where he tricks her to drink Amorentia, the most powerful love potion.
+ """
+
+     char1 = OpusV1Character(
+         name="Harry Potter",
+         description="""Harry Potter in this fanfiction is secretly a member of Slytherin and is using his powers for evil rather than for good. Up until now, he was pretending to be friends with Hermione and Ron.""",
+     )
+     char2 = OpusV1Character(
+         name="Hermione Granger",
+         description="""Hermione appears just like in the original books.""",
+     )
+
+     story_prompt = OpusV1StorySystemPrompt(
+         plot_description=plot_description,
+         style_description="",
+         characters=[char1, char2],
+     )
+
+     turns = [
+         OpusV1Turn(
+             role="user",
+             content="""Harry invites Hermione into his chamber and offers her water, which Hermione happily accepts and drinks.""".strip(),
+         ),
+         OpusV1Turn(
+             role="text",
+             names=[char1.name],
+             content="""“Come in,” said Harry, waving at the doorway behind Hermione’s back.""".strip(),
+         ),
+     ]
+
+     def run():
+         turns.append(OpusV1Turn(role="text", content="", names=[char2.name], open=True))
+
+         prompt = OpusV1Prompt(story=story_prompt, turns=turns)
+
+         output = model.generate(
+             format_opus_v1_prompt(prompt), sampling_params, use_tqdm=False
+         )
+
+         response = OpusV1Turn(
+             role="text", content=output[0].outputs[0].text.strip(), names=[char2.name]
+         )
+         # Replace the open placeholder turn with the completed response,
+         # so empty turns don't accumulate in the prompt.
+         turns[-1] = response
+         print(pretty_turn(response), flush=True)
+         print(f"[{char1.name}]: ", end="", flush=True)
+
+     print("## Plot description:\n")
+     print(plot_description.strip() + "\n\n")
+
+     for turn in turns:
+         print(pretty_turn(turn))
+
+     run()
+
+     for line in fileinput.input():
+         line = line.strip()
+         if line.startswith("/ins"):
+             content = line[4:].strip()
+             role = "user"
+             names = []
+         else:
+             content = line
+             role = "text"
+             names = [char1.name]
+
+         turns.append(OpusV1Turn(role=role, content=content, names=names))
+         run()
+
+
+ def pretty_turn(turn):
+     if turn.role == "user":
+         return f"/ins {turn.content.strip()}"
+     else:
+         if len(turn.names) > 0:
+             return f"[{turn.names[0]}]: {turn.content.strip()}"
+         else:
+             return turn.content.strip()
+
+
+ main()
example/prompt/__init__.py ADDED
File without changes
example/prompt/format.py ADDED
@@ -0,0 +1,96 @@
+ # %%
+ from typing import Optional, List
+ from dataclasses import field, dataclass
+
+
+ @dataclass
+ class OpusV1Turn:
+     role: str
+     content: str
+     names: List[str] = field(default_factory=list)
+     # If set to true, will not append <|im_end|>, so the model will continue the turn.
+     # In RP you can for example use the following to force a specific character response:
+     # role="text"
+     # names=["Jack"]
+     # open=True
+     open: bool = False
+
+
+ @dataclass
+ class OpusV1Character:
+     name: str
+     description: str
+
+
+ @dataclass
+ class OpusV1StorySystemPrompt:
+     format: str = "prose"
+     plot_description: str = ""
+     style_description: str = ""
+     characters: List[OpusV1Character] = field(default_factory=list)
+
+
+ @dataclass
+ class OpusV1Prompt:
+     story: Optional[OpusV1StorySystemPrompt] = None
+     turns: List[OpusV1Turn] = field(default_factory=list)
+
+
+ def format_opus_v1_prompt(prompt) -> str:
+     turns = prompt.turns
+     if prompt.story is not None:
+         system = format_opus_v1_system_prompt(prompt.story)
+         turns = [OpusV1Turn(role="system", content=system)] + turns
+
+     parts = []
+     for i, turn in enumerate(turns):
+         assert turn.role in ["user", "text", "system", "assistant"]
+         assert turn.role != "system" or i == 0
+
+         is_last = i == len(turns) - 1
+         open = is_last and turn.open
+         parts.append(format_turn(turn.role, turn.content, turn.names, open=open))
+     return "".join(parts)
+
+
+ def format_turn(
+     role: str, content: str, names: List[str] = [], open: bool = False
+ ) -> str:
+     im_start = "<|im_start|>"
+     im_end = "<|im_end|>"
+
+     body = im_start + role
+     if len(names) > 0:
+         body += f" names= {'; '.join(names)}"
+
+     body += "\n"
+     if open:
+         return body + content.lstrip()
+     else:
+         return body + content.strip() + im_end + "\n"
+
+
+ def format_opus_v1_system_prompt(prompt) -> str:
+     format_text = "story" if prompt.format == "prose" else "role-play"
+     system = f"""
+ You are an intelligent, skilled, versatile writer.
+
+ Your task is to write a {format_text} based on the information below.
+
+ Write the {format_text} as if it's a book.
+ """.strip()
+
+     if len(prompt.plot_description) > 0:
+         system += "\n\n\n## Plot description:\n\n"
+         system += prompt.plot_description.strip()
+     if len(prompt.style_description) > 0:
+         system += "\n\n\n## Style description:\n\n"
+         system += prompt.style_description.strip()
+     if len(prompt.characters) > 0:
+         system += "\n\n\n## Characters:\n\n"
+         for character in prompt.characters:
+             system += f"### {character.name}\n\n"
+             system += character.description.strip()
+             system += "\n\n"
+
+     return system.strip()
example/simple.py ADDED
@@ -0,0 +1,132 @@
+ # python simple.py
+
+ # %%
+
+ from vllm import LLM, SamplingParams
+
+ from prompt.format import (
+     format_opus_v1_prompt,
+     OpusV1Character,
+     OpusV1Prompt,
+     OpusV1StorySystemPrompt,
+     OpusV1Turn,
+ )
+
+
+ # %%
+
+
+ def build_story_prompt() -> OpusV1Prompt:
+     plot_description = """
+ This is a fanfiction from the Harry Potter universe. In this alternate reality, Harry Potter is evil and secretly siding with Slytherin.
+ Up until now, Harry was pretending to be friends with Hermione and Ron, that changes when he invites Hermione to his chambers where he tricks her to drink Amorentia, the most powerful love potion.
+ """
+
+     harry_description = """
+ Harry Potter in this fanfiction is secretly a member of Slytherin and is using his powers for evil rather than for good. Up until now, he was pretending to be friends with Hermione and Ron.
+ """
+
+     hermione_description = """
+ Hermione appears just like in the original books.
+ """
+
+     story_prompt = OpusV1StorySystemPrompt(
+         plot_description=plot_description,
+         style_description="",
+         characters=[
+             OpusV1Character(name="Harry Potter", description=harry_description),
+             OpusV1Character(name="Hermione Granger", description=hermione_description),
+         ],
+     )
+
+     return OpusV1Prompt(
+         story=story_prompt,
+         turns=[
+             OpusV1Turn(
+                 role="user",
+                 content="""
+ The story starts with Harry welcoming Hermione into his chambers, who he invited there earlier that day. He offers her water to drink, but it contains a love potion.
+ """.strip(),
+             ),
+             OpusV1Turn(
+                 role="text",
+                 content="""
+ “Come in,” said Harry, waving at the doorway behind Hermione’s back.
+
+ “Hello?” she said, stepping inside, “what did you want me to come up here for?”
+
+ “Well, I thought we could get away from all the noise down there, have a chat about what we plan to do for Christmas…” Harry said, fumbling for words. He had never really been any good with girls. “But anyway, please, take a seat and let me get us some water!” he said, darting over to the sideboard.
+
+ He returned quickly with two glasses of water. Hermione took hers and thanked him, taking in a big gulp. As soon as she swallowed, Harry saw her eyes widen as her heart began beating wildly in her chest.
+
+ It worked! Harry thought, grinning to himself. Amorentia truly was the world’s best love potion, its effects lasting twice as long and being five times stronger.
+ """.strip(),
+                 open=True,
+             ),
+         ],
+     )
+
+
+ def build_assistant_prompt() -> OpusV1Prompt:
+     return OpusV1Prompt(
+         turns=[
+             OpusV1Turn(
+                 role="system",
+                 content="You are an intelligent, knowledgeable, helpful, general-purpose assistant.",
+             ),
+             OpusV1Turn(
+                 role="user",
+                 content="Give me a sentence where every word begins with 'S'",
+             ),
+         ]
+     )
+
+
+ # %%
+
+
+ def main():
+     sampling_params = SamplingParams(
+         # I usually stay between 0.0 and 1.0, especially for the Yi models I found lower tends to be better.
+         # For assistant tasks, I usually use 0.0.
+         temperature=0.0,
+         min_p=0.05,
+         presence_penalty=0.1,
+         frequency_penalty=0.1,
+         repetition_penalty=1.1,
+         max_tokens=200,
+         ignore_eos=True,
+         skip_special_tokens=False,
+         spaces_between_special_tokens=False,
+     )
+
+     # Set max_model_len to fit in memory.
+     model = LLM(
+         "dreamgen/opus-v1.2-7b",
+         max_model_len=2000,
+         enforce_eager=True,
+         swap_space=0,
+         gpu_memory_utilization=0.85,
+     )
+
+     story_prompt = build_story_prompt()
+     print(format_opus_v1_prompt(story_prompt))
+
+     output = model.generate(format_opus_v1_prompt(story_prompt), sampling_params)
+     print(output[0].outputs[0].text)
+
+     # Expected:
+     """
+ It would make her fall deeply in love with him, and then he could use her to get what he wanted.
+
+ “Harry, what’s going on? You look so happy!” Hermione asked, smiling at him.
+
+ “Oh, well, I guess I am,” Harry replied, trying not to laugh. “I mean, I’ve always known that you were the one for me.”
+
+ “Really?” Hermione asked, blushing slightly. “I didn’t know that.”
+
+ “Yeah, I’ve always had feelings for you,” Harry said, leaning forward and placing his hand on top of hers. “And now that I’ve got you alone, I can finally tell you how much I care about you.”
+ """
+
+
+ main()
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "do_sample": true,
+   "eos_token_id": 2,
+   "pad_token_id": 0,
+   "transformers_version": "4.37.0"
+ }
images/logo-1024.png ADDED
images/logo.webp ADDED
images/role_playing.webp ADDED
images/role_playing_long.webp ADDED
images/story_writing.webp ADDED
images/token_count_cum__token_bucket.png ADDED
pytorch_model-00001-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3314f1344b6b88028ff9281ae85c69853ab26dffa7ed043b3f8ad37d9c0d2c93
+ size 4793138432
pytorch_model-00002-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30d551a5a3f311f7e272f3c27223ea499ca760b35f8cf26c4cc0fbceaf8180c7
+ size 4756468272
pytorch_model-00003-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dc2657ca82671e899ed13d2f84ac1e89e7c3be9c6ab63b5b27fcd95569cbe87
+ size 4991380008
pytorch_model-00004-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2bc5b25a194aea40b9b09f7e791031ee25721389a13fb8622f9838ba586b1c75
+ size 4756468372
pytorch_model-00005-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd8adb5cb5554b43e5fa6d139bd2e97309409415e09d8b61d00f592ca4382b2f
+ size 4756468336
pytorch_model-00006-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ec61e59c58022f61bbb49eeae32543e0effd716698a3e67e57befedd73aa4bb
+ size 4991380072
pytorch_model-00007-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da4592fad7f30b77971a658520870c328f55f32d87b96c88d29758056b4a484c
+ size 4756468372
pytorch_model-00008-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed515144ea680f96bcea537f7036bb62fc7624af0a7dc1f974cd4fee5421283e
+ size 4756468336
pytorch_model-00009-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8b0c175eac1e72072060a5c1dc8670d63d6ec1e0d4853fe2c1d8b1fd8489c44
+ size 4991380072
pytorch_model-00010-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb6059590f5070584730aef697726b474e8c90a32d287e698d05564a9aa06329
+ size 4756468372
pytorch_model-00011-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94c4d465e4eeb649f27aa4fe94619f60ca18a2f8c339c4883c03200fbeffe531
+ size 4756468336
pytorch_model-00012-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94aff24aa2b92fc30e9e0b074b69cfe915fc052e82056aad3366ee499718dbd6
+ size 4991380072
pytorch_model-00013-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7dc7e9f18f7597e5630415f240587c31d2c848731f92e61a91a96cd5bc5e452
+ size 4756468372
pytorch_model-00014-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08084d3efc14c2f0d1faabd7ad8f4f45e37a576469c08c310e54d1d94ded63ae
+ size 4756468336
pytorch_model-00015-of-00015.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b74def9921ae4c5fba3264c5527d728f7bca4432c64289e67f1dfad067c1c11d
+ size 1211150921
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,550 @@
+ {
+   "metadata": {
+     "total_size": 68777834496
+   },
+   "weight_map": {
+     "lm_head.weight": "pytorch_model-00015-of-00015.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
26
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
27
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
28
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
29
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
30
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
31
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
32
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
33
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
34
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
35
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
36
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
37
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
38
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
39
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
40
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
41
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
42
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
43
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
44
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
45
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
46
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
47
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
48
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
49
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
50
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
51
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
52
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
53
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
54
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
55
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
56
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
57
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
58
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
59
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
60
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
61
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
62
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
63
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
64
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
65
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
66
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
67
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
68
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
69
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
70
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
71
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
72
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
73
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
74
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
75
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
76
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
77
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
78
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
79
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
80
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
81
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
82
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
83
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
84
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
85
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
86
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
87
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
88
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
89
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
90
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
91
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
92
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
93
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
94
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
95
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
96
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
97
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
98
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
99
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
100
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
101
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
102
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
103
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
104
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
105
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
106
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
107
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
108
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
109
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
110
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
111
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
112
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
113
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
114
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
115
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
116
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
117
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
118
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
119
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
120
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
121
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
122
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
123
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
124
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
125
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
126
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
127
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
128
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
129
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
130
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
131
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
132
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
133
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
134
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
135
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
136
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
137
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
138
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
139
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
140
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
141
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
142
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
143
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
144
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
145
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
146
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
147
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
148
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
149
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
150
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
151
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
152
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
153
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
154
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
155
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
156
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
157
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
158
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
159
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
160
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
161
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
162
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
163
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
164
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
165
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
166
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
167
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
168
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
169
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
170
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
171
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
172
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
173
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
174
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
175
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
176
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
177
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
178
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
179
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
180
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
181
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
182
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
183
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
184
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
185
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
186
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
187
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
188
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
189
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
190
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
191
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
192
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
193
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
194
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
195
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
196
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
197
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
198
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
199
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
200
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
201
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
202
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
203
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
204
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
205
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
206
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
207
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
208
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
209
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
210
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
211
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
212
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
213
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
214
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
215
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
216
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
217
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
218
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
219
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
220
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
221
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
222
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
223
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
224
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
225
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
226
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
227
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
228
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
229
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
230
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
231
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
232
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
233
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
234
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
235
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
236
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
237
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
238
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
239
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
240
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
241
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
242
+ "model.layers.32.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
243
+ "model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
244
+ "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
245
+ "model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
246
+ "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
247
+ "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
248
+ "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
249
+ "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
250
+ "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
251
+ "model.layers.33.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
252
+ "model.layers.33.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
253
+ "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
254
+ "model.layers.33.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
255
+ "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
256
+ "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
257
+ "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
258
+ "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
259
+ "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
260
+ "model.layers.34.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
261
+ "model.layers.34.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
262
+ "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
263
+ "model.layers.34.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
264
+ "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
265
+ "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
266
+ "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
267
+ "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
268
+ "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
269
+ "model.layers.35.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
270
+ "model.layers.35.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
271
+ "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
272
+ "model.layers.35.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
273
+ "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
274
+ "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
275
+ "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
276
+ "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
277
+ "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
278
+ "model.layers.36.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
279
+ "model.layers.36.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
280
+ "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
281
+ "model.layers.36.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
282
+ "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
283
+ "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
284
+ "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
285
+ "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
286
+ "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
287
+ "model.layers.37.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
288
+ "model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
289
+ "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
290
+ "model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
291
+ "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
292
+ "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
293
+ "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
294
+ "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
295
+ "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
296
+ "model.layers.38.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
297
+ "model.layers.38.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
298
+ "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
299
+ "model.layers.38.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
300
+ "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
301
+ "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
302
+ "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
303
+ "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
304
+ "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
305
+ "model.layers.39.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
306
+ "model.layers.39.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
307
+ "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
308
+ "model.layers.39.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
309
+ "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
310
+ "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
311
+ "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
312
+ "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
313
+ "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
314
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
315
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
316
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
317
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
318
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
319
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
320
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
321
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
322
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
323
+ "model.layers.40.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
324
+ "model.layers.40.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
325
+ "model.layers.40.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
326
+ "model.layers.40.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
327
+ "model.layers.40.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
328
+ "model.layers.40.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
329
+ "model.layers.40.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
330
+ "model.layers.40.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
331
+ "model.layers.40.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
332
+ "model.layers.41.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
333
+ "model.layers.41.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
334
+ "model.layers.41.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
335
+ "model.layers.41.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
336
+ "model.layers.41.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
337
+ "model.layers.41.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
338
+ "model.layers.41.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
339
+ "model.layers.41.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
340
+ "model.layers.41.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
341
+ "model.layers.42.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
342
+ "model.layers.42.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
343
+ "model.layers.42.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
344
+ "model.layers.42.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
345
+ "model.layers.42.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
346
+ "model.layers.42.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
347
+ "model.layers.42.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
348
+ "model.layers.42.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
349
+ "model.layers.42.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
350
+ "model.layers.43.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
351
+ "model.layers.43.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
352
+ "model.layers.43.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
353
+ "model.layers.43.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
354
+ "model.layers.43.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
355
+ "model.layers.43.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
356
+ "model.layers.43.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
357
+ "model.layers.43.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
358
+ "model.layers.43.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.44.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.45.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.46.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.46.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.46.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.46.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
+ "model.layers.47.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.47.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.48.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.49.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.50.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.50.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.51.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.51.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.51.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.51.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.51.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.51.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.51.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.51.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.51.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
+ "model.layers.52.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.52.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.53.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.54.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.55.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.55.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.55.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.55.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.55.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.55.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.55.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.55.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.55.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
+ "model.layers.56.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.56.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.57.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.58.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.input_layernorm.weight": "pytorch_model-00015-of-00015.bin",
+ "model.layers.59.mlp.down_proj.weight": "pytorch_model-00015-of-00015.bin",
+ "model.layers.59.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.post_attention_layernorm.weight": "pytorch_model-00015-of-00015.bin",
+ "model.layers.59.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.59.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.norm.weight": "pytorch_model-00015-of-00015.bin"
+ }
+ }
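The `weight_map` in `pytorch_model.bin.index.json` tells loaders which of the 15 shard files holds each tensor. A minimal sketch (using a hypothetical two-entry index in the same shape as the file above) of how a loader groups tensors by shard so each shard file is opened only once:

```python
import json
from collections import defaultdict

# Hypothetical miniature index, same shape as pytorch_model.bin.index.json.
# Note that one layer's tensors can straddle a shard boundary, as layer 46
# does in the real index above.
index_text = """
{
  "weight_map": {
    "model.layers.46.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
    "model.layers.46.input_layernorm.weight": "pytorch_model-00012-of-00015.bin"
  }
}
"""

def tensors_by_shard(index_json: str) -> dict:
    """Invert the weight_map: shard filename -> list of tensor names it stores."""
    weight_map = json.loads(index_json)["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)

shards = tensors_by_shard(index_text)
```

This is only an illustration of the index format; in practice `transformers` consumes the index automatically when you call `from_pretrained` on the repo.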
special_tokens_map.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "additional_special_tokens": [
+ {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ ],
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:386c49cf943d71aa110361135338c50e38beeff0a66593480421f37b319e1a39
+ size 1033105
tokenizer_config.json ADDED
@@ -0,0 +1,61 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "6": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "7": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>"
+ ],
+ "bos_token": "<|startoftext|>",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>'}}{% if message['role']=='assistant' %}{{'text'}}{% else %}{{message['role']}}{% endif %}{{'\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>text\n' }}{% endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "legacy": true,
+ "model_max_length": 200000,
+ "pad_token": "<unk>",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
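The `chat_template` in `tokenizer_config.json` above is a Jinja template with one notable quirk: assistant turns are rendered under the role name `text` rather than `assistant`. A plain-Python sketch that mirrors the template's logic (an illustration only; in practice call `tokenizer.apply_chat_template`, which renders the Jinja template itself):

```python
def render_chat(messages, add_generation_prompt=False):
    """Mirror the chat_template: ChatML-style turns, with the assistant
    role renamed to 'text' as the template above does."""
    out = []
    for message in messages:
        role = "text" if message["role"] == "assistant" else message["role"]
        out.append("<|im_start|>" + role + "\n" + message["content"] + "<|im_end|>" + "\n")
    if add_generation_prompt:
        # Open an assistant ('text') turn for the model to complete.
        out.append("<|im_start|>text\n")
    return "".join(out)

prompt = render_chat(
    [
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Begin."},
    ],
    add_generation_prompt=True,
)
```

Note also that `add_bos_token` is `false`, so the template's output is the full prompt; no `<|startoftext|>` is prepended during tokenization.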