Prakamya committed (verified)
Commit 55a294e · 1 Parent(s): 7bf637a

Upload folder using huggingface_hub

LICENSE ADDED
@@ -0,0 +1,316 @@
Instella-VL-1B Model [RESEARCH-ONLY RAIL-MS]

Licensed Artifact(s):

- Model

- Source Code

Section I: PREAMBLE

BY ACCESSING, DOWNLOADING, INSTALLING, OR USING THE ARTIFACT, YOU AGREE
TO BE BOUND BY THIS LICENSE. IF YOU DO NOT AGREE TO ALL OF THE TERMS AND
CONDITIONS OF THIS LICENSE, DO NOT ACCESS, DOWNLOAD, INSTALL, OR USE THE
ARTIFACT.

1. Definitions

(a) “Application” refers to a sequence of instructions or statements
    written in machine code language, including object code (that is the
    product of a compiler), binary code (data using a two-symbol system)
    or an intermediate language (such as register transfer language).

(b) “Artifact” refers to a software application (in either binary or
    source code format), Model, and/or Source Code, in accordance with
    what is specified above as the “Licensed Artifact”.

(c) “Contribution” means any work, including any modifications or
    additions to an Artifact, that is intentionally submitted to
    Licensor for inclusion or incorporation in the Artifact directly or
    indirectly by the rights owner. For the purposes of this definition,
    “submitted” means any form of electronic, verbal, or written
    communication sent to the Licensor or its representatives, including
    but not limited to communication on electronic mailing lists, source
    code control systems, and issue tracking systems that are managed
    by, or on behalf of, the Licensor for the purpose of discussing,
    sharing and improving the Artifact, but excluding communication that
    is conspicuously marked or otherwise designated in writing by the
    contributor as “Not a Contribution.”

(d) “Contributor” means Licensor or any other individual or legal entity
    that creates or owns a Contribution that is added to or incorporated
    into an Artifact or its Derivative.

(e) “Data” means a collection of information and/or content extracted
    from the dataset used with a given Model, including to train,
    pretrain, or otherwise evaluate the Model. The Data is not licensed
    under this License.

(f) “Derivative” means a work derived from or based upon an Artifact,
    and includes all modified versions of such Artifact.

(g) “Distribution” means any transmission, reproduction, publication or
    other sharing of an Artifact or Derivative to a Third Party,
    including providing a hosted service incorporating the Artifact,
    which is made available by electronic or other remote means -
    e.g. API-based or web access.

(h) “Harm” includes but is not limited to physical, mental,
    psychological, financial and reputational damage, pain, or loss.

(i) “License” means the terms and conditions for use, reproduction, and
    Distribution as defined in this document.

(j) “Licensor” means the rights owner (by virtue of creation or
    documented transfer of ownership) or entity authorized by the rights
    owner (e.g., exclusive licensee) that is granting the rights in this
    License.

(k) “Model” means any machine-learning based assembly or assemblies
    (including checkpoints), consisting of learnt weights, parameters
    (including optimizer states), corresponding to the model
    architecture as embodied in the Source Code.

(l) “Output” means the results of operating a Model as embodied in
    informational content resulting therefrom.

(m) “Permitted Purpose” means for academic or research purposes only.

(n) “Source Code” means any collection of text written using
    human-readable programming language, including the code and scripts
    used to define, run, load, benchmark or evaluate a Model or any
    component thereof, and/or used to prepare data for training or
    evaluation, if any. Source Code includes any accompanying
    documentation, tutorials, examples, etc., if any. For clarity, the
    term “Source Code” as used in this License includes any and all
    Derivatives of such Source Code.

(o) “Third Parties” means individuals or legal entities that are not
    under common control with Licensor or You.

(p) “Use” includes accessing, using, copying, modifying, and/or
    distributing an Artifact; in connection with a Model as Artifact,
    Use also includes creating content, fine-tuning, updating, running,
    training, evaluating and/or re-parametrizing such Model.

(q) “You” (or “Your”) means an individual or legal entity receiving and
    exercising permissions granted by this License and/or making use of
    the Artifact for permitted purposes and in any permitted field of
    use, including usage of the Artifact in an end-use application -
    e.g. chatbot, translator, image generator, etc.

Section II: INTELLECTUAL PROPERTY RIGHTS

Both copyright and patent grants may apply to the Artifact. The Artifact
is subject to additional terms and conditions as described in Section III
below.

2. Grant of Copyright License. Conditioned upon compliance with Section
III below and subject to the terms and conditions of this License, each
Contributor hereby grants to You, only in connection with the Permitted
Purpose, a worldwide, non-exclusive, royalty-free copyright license to
reproduce, use, publicly display, publicly perform, sublicense, and
distribute the Artifact and Derivatives thereof.

3. Grant of Patent License. Conditioned upon compliance with Section III
below and subject to the terms and conditions of this License, and only
where and as applicable, each Contributor hereby grants to You, only in
connection with the Permitted Purpose, a worldwide, non-exclusive,
royalty-free, irrevocable (except as stated in this paragraph) patent
license to make, have made, use, sell, offer to sell, import, and
otherwise transfer the Artifact where such license applies only to those
patent claims licensable by such Contributor that are necessarily
infringed by their Contribution(s) alone or by combination of their
Contribution(s) with the Artifact to which such Contribution(s) was
submitted. If You institute patent litigation against any entity
(including a cross-claim or counterclaim in a lawsuit) alleging that the
Artifact and/or a Contribution incorporated within the Artifact
constitutes direct or contributory patent infringement, then any patent
licenses granted to You under this License in connection with the
Artifact shall terminate as of the date such litigation is asserted or
filed.

Licensor and Contributor each have the right to grant the licenses
above.

Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION

4. Use-based Restrictions. The restrictions contained in the AMD
Responsible AI Use Policy set forth in Attachment A are mandatory Use-
based restrictions. Therefore You may not Use the Artifact in violation
of such restrictions. You may Use the Artifact only subject to this
License; if Section II is held unenforceable or inapplicable, this
Section III will continue to govern any use of the Artifact. You shall
require all of Your users who Use the Artifact or its Derivative
to comply with the terms and conditions of this License, including
those contained in this paragraph, and only for the Permitted Purpose.

5. The Output You Generate with a Model (as Artifact). Except as set
forth herein, Licensor claims no rights in the Output You generate. You
are accountable for the Output You generate and its subsequent uses. No
use of the Output may contravene any provision as stated in this
License.

6. Distribution and Redistribution. You may host for Third Party remote
access purposes (e.g. software-as-a-service), reproduce and distribute
copies of the Artifact or its Derivatives in any medium, with or without
modifications, provided that You meet the following conditions:

6.1. Use-based restrictions in paragraph 4 MUST be included as a
     condition precedent to effect any type of legal agreement (e.g. a
     license) governing the use and/or distribution of the Artifact or
     its Derivatives, and You shall give such notice to any subsequent
     Third Party recipients;
6.2. You shall give any Third Party recipients of the Artifact or its
     Derivatives a copy of this License;
6.3. You shall cause any modified files to carry prominent notices
     stating that You changed the files;
6.4. You shall retain all copyright, patent, trademark, and attribution
     notices, excluding those notices that do not pertain to any part of
     the Artifact or its Derivatives;
6.5. You and any Third Party recipients of the Artifact or its
     Derivative shall adhere to the Permitted Purpose.

You may add Your own copyright statement to Your modifications and may
provide additional or different license terms and conditions with
respect to paragraph 6.1., to govern the use, reproduction, or
Distribution of Your modifications, or for any Derivative, provided that
Your use, reproduction, and Distribution of the Artifact or its
Derivative otherwise complies with the conditions stated in this
License. In other words, the Use-based restrictions in Attachment A form
the minimum set of terms for You to license to Third Parties any
Artifact or its Derivative, but You may add more restrictive terms if
You deem it necessary.

Section IV: OTHER PROVISIONS

7. Updates and Runtime Restrictions. To the maximum extent permitted by
law, Licensor reserves the right to restrict (remotely or otherwise)
usage of the Artifact in violation of this License or update the
Artifact through electronic means.

8. Trademarks and Related. Nothing in this License permits You to make
use of Licensors’ trademarks, trade names, logos or to otherwise suggest
endorsement or misrepresent the relationship between the parties; and
any rights not expressly granted herein are reserved by the Licensors.

9. Disclaimer of Warranty. Unless required by applicable law or agreed
to in writing, Licensor provides the Artifact (and each Contributor
provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied, including, without
limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely
responsible for determining the appropriateness of using the Artifact,
and assume any risks associated with Your exercise of permissions under
this License.

10. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise, unless
required by applicable law (such as deliberate and grossly negligent
acts) or agreed to in writing, shall any Contributor be liable to You
for damages, including any direct, indirect, special, incidental, or
consequential damages of any character arising as a result of this
License or out of the use or inability to use the Artifact (including
but not limited to damages for loss of goodwill, work stoppage, computer
failure or malfunction, or any and all other commercial damages or
losses), even if such Contributor has been advised of the possibility of
such damages.

11. If any provision of this License is held to be invalid, illegal or
unenforceable, the remaining provisions shall be unaffected thereby and
remain valid as if such provision had not been set forth herein.

12. Term and Termination. The term of this License will commence upon
the earlier of Your (a) acceptance of this License or (b) accessing the
Artifact; and will continue in full force and effect until terminated in
accordance with the terms and conditions herein. Licensor may terminate
this License if You are in breach of any term or condition of this
License. Upon termination of this License, all licenses granted to You
will terminate and You must promptly delete and cease use of the
Artifact. Sections 1, 7, 8, 9, 10, 11, and 12 survive termination of
this License.

END OF TERMS AND CONDITIONS

Attachment A

AMD Responsible AI Use Policy

AMD is committed to the responsible use of its Artificial Intelligence
(AI) products and technologies (“AMD AI”). AMD AI may include
artificial intelligence or machine learning technologies that use
algorithms to analyze data and generate output using predictions based
on patterns in data. This policy explains the uses that AMD
specifically prohibits.

If you use any AMD AI, you are agreeing to use the AMD AI in compliance
with applicable laws and not for any of the following prohibited uses.

Prohibited Uses:

1) No Illegal Acts. Do not use AMD AI in violation of any applicable
national, state, local, or other jurisdictional law, rule, regulation,
or sanction.

2) No Explicit Content. Do not use AMD AI to submit (as input),
generate, or disseminate content depicting violent or sexually explicit
content or to create sexual chatbots.

3) No Harm. Do not use AMD AI for any potentially harmful uses,
including fraud, deception, discrimination, abuse, or harassment,
including the following:

   a) Harm or abuse of a minor, including grooming and child sexual
      exploitation.

   b) Impersonation of human beings for purposes of deception.

   c) Generation or dissemination of information you know to be false
      for the purpose of harming others.

   d) Intentionally defaming, disparaging, or otherwise harassing
      others.

   e) Intentionally attempting to materially distort the behavior of a
      person in a manner that causes or is likely to cause that person
      or another person physical or psychological harm.

   f) Providing medical advice or interpretation of medical results that
      is intended to be a substitute for professional medical advice,
      diagnosis, or treatment.

   g) Engaging in the unlawful or unauthorized practice of any
      profession, including financial, legal, medical, health, or
      related professional practices.

   h) Judgment of, discrimination against, or harm to individuals or
      groups based on legally protected characteristics or categories,
      online or offline social behavior, or known or predicted personal
      or personality characteristics, including any of the foregoing
      uses in social credit systems.

4) No High-Risk Activity. Do not use AMD AI in any high-risk activities
or applications that create a risk of personal injury, death, or
severe property or environmental damage, including in weapons or
military applications.

5) No Personal Information. Do not use AMD AI to collect, process, or
disclose personal data, including health or sensitive personal
information, without the necessary rights or consents.

6) No Infringement. Do not use AMD AI to generate or disseminate any
information that infringes upon or misappropriates the intellectual
property rights of others, including copyright, trademark, patent, and
trade secret rights, rights to privacy, and publicity rights.

7) No Malware. Do not use AMD AI to generate or disseminate malware or
any other content to be used for the purpose of facilitating unpermitted
access to, or use of, computer systems or data.

8) No Obfuscation. Do not inappropriately obfuscate or fail to disclose
to end users the presence of AI in any application in which AMD AI is
deployed, along with any known risks or dangers of using AI without
appropriate safeguards, oversight and human control.

9) No Reliance. Do not rely on any information generated using AMD AI
without assessing it for accuracy, potential for harm, or other specific
risks applicable to the use case.
NOTICE ADDED
@@ -0,0 +1,444 @@
1
+ NOTICES Instella_VL_1B
2
+
3
+
4
+ Copyright Statements
5
+
6
+ Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
7
+
8
+ License Text https://spdx.org/licenses/Apache-2.0.html
9
+
10
+ Apache License
11
+ Version 2.0, January 2004
12
+ http://www.apache.org/licenses/
13
+
14
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
15
+
16
+ 1. Definitions.
17
+ "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
18
+
19
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
20
+
21
+ "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
24
+
25
+ "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
26
+ "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
27
+
28
+ "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
29
+
30
+ "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
31
+
32
+ "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
33
+
34
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
35
+
36
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
37
+ 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
38
+ 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
39
+ (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
40
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
41
+ (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
42
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
43
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
44
+
45
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
46
+ 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
47
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
48
+ 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
49
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
50
+ END OF TERMS AND CONDITIONS
51
+
52
+ amd-AMD-OLMo-1B-SFT v-u (Apache-2.0)
53
+
54
+
55
+ Copyright Statements
56
+
57
+ Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
58
+
59
+ License Text https://spdx.org/licenses/Apache-2.0.html
60
+
61
+ Apache License
62
+ Version 2.0, January 2004
63
+ http://www.apache.org/licenses/
64
+
65
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
66
+
67
+ 1. Definitions.
68
+ "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
69
+
70
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
71
+
72
+ "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
73
+
74
+ "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
75
+
76
+ "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
77
+
78
+ "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
79
+
80
+ "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
81
+
82
+ "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
83
+
84
+ "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
85
+
86
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
87
+
88
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
89
+ 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+ 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+ (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
+ (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+ 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+ 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+ END OF TERMS AND CONDITIONS
+
+ Dependencies on FastChat v-u (Apache-2.0)
+
+
+ Copyright Statements
+
+ "Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
+ Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
+
+ License Text https://spdx.org/licenses/Apache-2.0.html
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+ "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+ 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+ 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+ (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
+ (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+ 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+ 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+ END OF TERMS AND CONDITIONS
+
+ Dependencies on LLaVA-NeXT v-u (Apache-2.0)
+
+
+ Copyright Statements
+
+ "Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
+ Copyright 2023 Haotian Liu
+
+ License Text https://spdx.org/licenses/Apache-2.0.html
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+ "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+ 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+ 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+ (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
+ (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+ 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+ 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+ END OF TERMS AND CONDITIONS
+
+ Dependencies on OpenGVLab-InternVL v-u (MIT)
+
+
+ Copyright Statements
+
+ "Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
+ Copyright (c) 2023 OpenGVLab
+
+ License Text https://spdx.org/licenses/MIT.html
+
+ # "Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved."
+
+ # --------------------------------------------------------
+ # InternVL
+ # Copyright (c) 2023 OpenGVLab
+ # Licensed under The MIT License [see LICENSE for details]
+ # --------------------------------------------------------
+
+ Dependencies on LLaVA-NeXT v-u (Apache-2.0)
+
+
+ Copyright Statements
+
+ Copyright 2022 The HuggingFace Team. All rights reserved.
+ Copyright 2023 Haotian Liu
+ Copyright 2024 Duc Q. Nguyen, Haotian Liu and Bo Li
+ Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved
+ Copyright 2023 DDPO-pytorch authors (Kevin Black), The HuggingFace Team, metric-space. All rights reserved.
+
+
+ License Text https://spdx.org/licenses/Apache-2.0.html
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+ "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+ 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+ 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+ (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
+ (b) You must cause any modified files to carry prominent notices stating that You changed the files; and
+ (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+ (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
277
+
278
+ 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
279
+ 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
280
+ 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
281
+ 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
282
+ 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
283
+ END OF TERMS AND CONDITIONS
284
+
285
+ microsoft-unilm v-u (MIT)
286
+
287
+
288
+ Copyright Statements
289
+
290
+ Copyright (c) Microsoft Corporation
291
+
292
+ License Text https://spdx.org/licenses/MIT.html
293
+
294
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
295
+
296
+ The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
297
+
298
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
299
+
300
+ openai-CLIP v-u (MIT)
301
+
302
+
303
+ Copyright Statements
304
+
305
+ Copyright (c) 2021 OpenAI.
306
+
307
+ License Text https://spdx.org/licenses/MIT.html
308
+
309
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
310
+
311
+ The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.
312
+
313
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
314
+
315
+ salesforce-LAVIS v-u (BSD-3-Clause)
316
+
317
+
318
+ Copyright Statements
319
+
320
+ Copyright (c) 2023, salesforce.com, inc.
321
+
322
+ License Text https://spdx.org/licenses/BSD-3-Clause.html
323
+
324
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
325
+
326
+ 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
327
+ 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
328
+ 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
329
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
330
+
331
+
332
+ Copyright Statements
333
+
334
+
335
+ Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
336
+
337
+ Tongyi Qianwen LICENSE AGREEMENT
338
+
339
+ Tongyi Qianwen Release Date: August 23, 2023
340
+
341
+ By clicking to agree or by using or distributing any portion or element of the Tongyi Qianwen Materials, you will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
342
+
343
+ 1. Definitions
344
+ a. This Tongyi Qianwen LICENSE AGREEMENT (this "Agreement") shall mean the terms and conditions for use, reproduction, distribution and modification of the Materials as defined by this Agreement.
345
+ b. "We"(or "Us") shall mean Alibaba Cloud.
346
+ c. "You" (or "Your") shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Materials for any purpose and in any field of use.
347
+ d. "Third Parties" shall mean individuals or legal entities that are not under common control with Us or You.
348
+ e. "Tongyi Qianwen" shall mean the large language models (including Qwen-VL model and Qwen-VL-Chat model), and software and algorithms, consisting of trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing distributed by Us.
349
+ f. "Materials" shall mean, collectively, Alibaba Cloud's proprietary Tongyi Qianwen and Documentation (and any portion thereof) made available under this Agreement.
350
+ g. "Source" form shall mean the preferred form for making modifications, including but not limited to model source code, documentation source, and configuration files.
351
+ h. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation,
352
+ and conversions to other media types.
353
+
354
+ 2. Grant of Rights
355
+ You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Alibaba Cloud's intellectual property or other rights owned by Us embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials.
356
+
357
+ 3. Redistribution
358
+ You may reproduce and distribute copies of the Materials or derivative works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
359
+ a. You shall give any other recipients of the Materials or derivative works a copy of this Agreement;
360
+ b. You shall cause any modified files to carry prominent notices stating that You changed the files;
361
+ c. You shall retain in all copies of the Materials that You distribute the following attribution notices within a "Notice" text file distributed as a part of such copies: "Tongyi Qianwen is licensed under the Tongyi Qianwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved."; and
362
+ d. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such derivative works as a whole, provided Your use, reproduction, and distribution of the work otherwise complies with the terms and conditions of this Agreement.
363
+
364
+ 4. Restrictions
365
+ If you are commercially using the Materials, and your product or service has more than 100 million monthly active users, You shall request a license from Us. You cannot exercise your rights under this Agreement without our express authorization.
366
+
367
+ 5. Rules of use
368
+ a. The Materials may be subject to export controls or restrictions in China, the United States or other countries or regions. You shall comply with applicable laws and regulations in your use of the Materials.
369
+ b. You can not use the Materials or any output therefrom to improve any other large language model (excluding Tongyi Qianwen or derivative works thereof).
370
+
371
+ 6. Intellectual Property
372
+ a. We retain ownership of all intellectual property rights in and to the Materials and derivatives made by or for Us. Conditioned upon compliance with the terms and conditions of this Agreement, with respect to any derivative works and modifications of the Materials that are made by you, you are and will be the owner of such derivative works and modifications.
373
+ b. No trademark license is granted to use the trade names, trademarks, service marks, or product names of Us, except as required to fulfill notice requirements under this Agreement or as required for reasonable and customary use in describing and redistributing the Materials.
374
+ c. If you commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any entity alleging that the Materials or any output therefrom, or any part of the foregoing, infringe any intellectual property or other right owned or licensable by you, then all licences granted to you under this Agreement shall terminate as of the date such lawsuit or other proceeding is commenced or brought.
375
+
376
+ 7. Disclaimer of Warranty and Limitation of Liability
377
+
378
+ a. We are not obligated to support, update, provide training for, or develop any further version of the Tongyi Qianwen Materials or to grant any license thereto.
379
+ b. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. WE MAKE NO WARRANTY AND ASSUME NO RESPONSIBILITY FOR THE SAFETY OR STABILITY OF THE MATERIALS AND ANY OUTPUT THEREFROM.
380
+ c. IN NO EVENT SHALL WE BE LIABLE TO YOU FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO ANY DIRECT, OR INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING FROM YOUR USE OR INABILITY TO USE THE MATERIALS OR ANY OUTPUT OF IT, NO MATTER HOW IT'S CAUSED.
381
+ d. You will defend, indemnify and hold harmless Us from and against any claim by any third party arising out of or related to your use or distribution of the Materials.
382
+
383
+ 8. Survival and Termination.
384
+ a. The term of this Agreement shall commence upon your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
385
+ b. We may terminate this Agreement if you breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, you must delete and cease use of the Materials. Sections 7 and 9 shall survive the termination of this Agreement.
386
+
387
+ 9. Governing Law and Jurisdiction.
388
+ a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
389
+ b. The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.
390
+
391
+
392
+
393
+ ------------- LICENSE FOR NVIDIA Megatron-LM code --------------
394
+
395
+ Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
396
+
397
+ Redistribution and use in source and binary forms, with or without
398
+ modification, are permitted provided that the following conditions
399
+ are met:
400
+ * Redistributions of source code must retain the above copyright
401
+ notice, this list of conditions and the following disclaimer.
402
+ * Redistributions in binary form must reproduce the above copyright
403
+ notice, this list of conditions and the following disclaimer in the
404
+ documentation and/or other materials provided with the distribution.
405
+ * Neither the name of NVIDIA CORPORATION nor the names of its
406
+ contributors may be used to endorse or promote products derived
407
+ from this software without specific prior written permission.
408
+
409
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
410
+ EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
411
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
412
+ PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
413
+ CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
414
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
415
+ PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
416
+ PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
417
+ OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
418
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
419
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
420
+
421
+
422
+ ------------- LICENSE FOR OpenAI tiktoken code --------------
423
+
424
+ MIT License
425
+
426
+ Copyright (c) 2022 OpenAI, Shantanu Jain
427
+
428
+ Permission is hereby granted, free of charge, to any person obtaining a copy
429
+ of this software and associated documentation files (the "Software"), to deal
430
+ in the Software without restriction, including without limitation the rights
431
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
432
+ copies of the Software, and to permit persons to whom the Software is
433
+ furnished to do so, subject to the following conditions:
434
+
435
+ The above copyright notice and this permission notice shall be included in all
436
+ copies or substantial portions of the Software.
437
+
438
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
439
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
440
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
441
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
442
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
443
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
444
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,433 @@
1
+ ---
2
+ {}
3
+ ---
4
+ # Instella-VL-1B ✨
5
+ Welcome to the official repository for **Instella-VL-1B**, AMD's first ever Vision-Language Model (VLM). This repository provides a detailed guide for training and inference with **Instella-VL-1B**. Developed from AMD's **Instella-1B** (previously known as [AMD OLMo 1B SFT](https://www.amd.com/en/developer/resources/technical-articles/introducing-the-first-amd-1b-language-model.html) LLM), this model is fully open-source, with both model weights and training code available for AMD GPUs (MI300). Its compact size aims to make it accessible to a broad spectrum of researchers, developers, and enthusiasts, enabling them to build upon, modify, and integrate it into their own projects.
6
+
7
+ [[GitHub](https://github.com/AMD-AIG-AIMA/InstellaVL)][[Blog](https://github.com/AMD-AIG-AIMA/Instella-VL/blog/blog-final.md)]
8
+
9
+ ## Main Results
10
+ We compare our model with models that release only the model weights (marked with * in the table below) as well as models that release the weights, the data curation, and all training details.
11
+
12
+ <table class="tg"><thead>
13
+ <tr>
14
+ <td class="tg-0pky"></td>
15
+ <td class="tg-c3ow">DeepSeek-VL-1.3B *</td>
16
+ <td class="tg-c3ow">InternVL2-1B *</td>
17
+ <td class="tg-c3ow">InternVL2.5-1B *</td>
18
+ <td class="tg-c3ow">TinyLLaVA-2.4B</td>
19
+ <td class="tg-c3ow">TinyLLaVA-1.5B</td>
20
+ <td class="tg-c3ow">llava-onevision-1b</td>
21
+ <td class="tg-c3ow">MiniCPM-V-2</td>
22
+ <td class="tg-c3ow">Instella-VL-1B</td>
23
+ </tr></thead>
24
+ <tbody>
25
+ <tr>
26
+ <td class="tg-c3ow">GQA</td>
27
+ <td class="tg-c3ow">--</td>
28
+ <td class="tg-c3ow">55.06</td>
29
+ <td class="tg-c3ow">56.66</td>
30
+ <td class="tg-c3ow">61.58</td>
31
+ <td class="tg-c3ow">60.28</td>
32
+ <td class="tg-c3ow">57.95</td>
33
+ <td class="tg-c3ow">--</td>
34
+ <td class="tg-c3ow">61.52</td>
35
+ </tr>
36
+ <tr>
37
+ <td class="tg-c3ow">SQA</td>
38
+ <td class="tg-c3ow">64.52</td>
39
+ <td class="tg-c3ow">89.54</td>
40
+ <td class="tg-c3ow">93.90</td>
41
+ <td class="tg-c3ow">64.30</td>
42
+ <td class="tg-c3ow">59.69</td>
43
+ <td class="tg-c3ow">59.25</td>
44
+ <td class="tg-c3ow">76.10</td>
45
+ <td class="tg-c3ow">83.74</td>
46
+ </tr>
47
+ <tr>
48
+ <td class="tg-c3ow">POPE</td>
49
+ <td class="tg-c3ow">85.80</td>
50
+ <td class="tg-c3ow">87.40</td>
51
+ <td class="tg-c3ow">89.95</td>
52
+ <td class="tg-c3ow">85.66</td>
53
+ <td class="tg-c3ow">84.77</td>
54
+ <td class="tg-c3ow">87.17</td>
55
+ <td class="tg-c3ow">86.56</td>
56
+ <td class="tg-c3ow">86.73</td>
57
+ </tr>
58
+ <tr>
59
+ <td class="tg-c3ow">MM-Bench</td>
60
+ <td class="tg-c3ow">64.34</td>
61
+ <td class="tg-c3ow">61.70</td>
62
+ <td class="tg-c3ow">68.40</td>
63
+ <td class="tg-c3ow">58.16</td>
64
+ <td class="tg-c3ow">51.28</td>
65
+ <td class="tg-c3ow">44.60</td>
66
+ <td class="tg-c3ow">70.44</td>
67
+ <td class="tg-c3ow">69.17</td>
68
+ </tr>
69
+ <tr>
70
+ <td class="tg-c3ow">SEEDBench</td>
71
+ <td class="tg-c3ow">65.94</td>
72
+ <td class="tg-c3ow">65.90</td>
73
+ <td class="tg-c3ow">71.30</td>
74
+ <td class="tg-c3ow">63.30</td>
75
+ <td class="tg-c3ow">60.04</td>
76
+ <td class="tg-c3ow">65.43</td>
77
+ <td class="tg-c3ow">66.90</td>
78
+ <td class="tg-c3ow">68.47</td>
79
+ </tr>
80
+ <tr>
81
+ <td class="tg-c3ow">MMMU</td>
82
+ <td class="tg-c3ow">28.67</td>
83
+ <td class="tg-c3ow">32.40</td>
84
+ <td class="tg-c3ow">35.60</td>
85
+ <td class="tg-c3ow">32.11</td>
86
+ <td class="tg-c3ow">29.89</td>
87
+ <td class="tg-c3ow">30.90</td>
88
+ <td class="tg-c3ow">38.55</td>
89
+ <td class="tg-c3ow">29.30</td>
90
+ </tr>
91
+ <tr>
92
+ <td class="tg-c3ow">RealWorldQA</td>
93
+ <td class="tg-c3ow">50.20</td>
94
+ <td class="tg-c3ow">51.90</td>
95
+ <td class="tg-c3ow">58.30</td>
96
+ <td class="tg-c3ow">52.42</td>
97
+ <td class="tg-c3ow">46.67</td>
98
+ <td class="tg-c3ow">51.63</td>
99
+ <td class="tg-c3ow">55.03</td>
100
+ <td class="tg-c3ow">58.82</td>
101
+ </tr>
102
+ <tr>
103
+ <td class="tg-c3ow">MMStar</td>
104
+ <td class="tg-c3ow">38.30</td>
105
+ <td class="tg-c3ow">46.18</td>
106
+ <td class="tg-c3ow">47.93</td>
107
+ <td class="tg-c3ow">37.17</td>
108
+ <td class="tg-c3ow">31.87</td>
109
+ <td class="tg-c3ow">37.38</td>
110
+ <td class="tg-c3ow">40.93</td>
111
+ <td class="tg-c3ow">43.21</td>
112
+ </tr>
113
+ <tr>
114
+ <td class="tg-c3ow"><span style="font-weight:bold">Average</span></td>
115
+ <td class="tg-c3ow">-</td>
116
+ <td class="tg-c3ow">61.26</td>
117
+ <td class="tg-c3ow">65.26</td>
118
+ <td class="tg-c3ow">56.84</td>
119
+ <td class="tg-c3ow">53.06</td>
120
+ <td class="tg-c3ow">54.29</td>
121
+ <td class="tg-c3ow">-</td>
122
+ <td class="tg-c3ow">62.62</td>
123
+ </tr>
124
+ <tr>
125
+ <td class="tg-c3ow">OCRBench</td>
126
+ <td class="tg-c3ow">41.40</td>
127
+ <td class="tg-c3ow">74.40</td>
128
+ <td class="tg-c3ow">74.20</td>
129
+ <td class="tg-c3ow">28.90</td>
130
+ <td class="tg-c3ow">34.40</td>
131
+ <td class="tg-c3ow">43.00</td>
132
+ <td class="tg-c3ow">60.00</td>
133
+ <td class="tg-c3ow">67.90</td>
134
+ </tr>
135
+ <tr>
136
+ <td class="tg-c3ow">TextVQA</td>
137
+ <td class="tg-c3ow">57.54</td>
138
+ <td class="tg-c3ow">69.60</td>
139
+ <td class="tg-c3ow">72.96</td>
140
+ <td class="tg-c3ow">47.05</td>
141
+ <td class="tg-c3ow">49.54</td>
142
+ <td class="tg-c3ow">49.54</td>
143
+ <td class="tg-c3ow">74.23</td>
144
+ <td class="tg-c3ow">71.23</td>
145
+ </tr>
146
+ <tr>
147
+ <td class="tg-c3ow">AI2D</td>
148
+ <td class="tg-c3ow">51.13</td>
149
+ <td class="tg-c3ow">62.40</td>
150
+ <td class="tg-c3ow">67.58</td>
151
+ <td class="tg-c3ow">49.58</td>
152
+ <td class="tg-c3ow">43.10</td>
153
+ <td class="tg-c3ow">57.35</td>
154
+ <td class="tg-c3ow">64.40</td>
155
+ <td class="tg-c3ow">66.65</td>
156
+ </tr>
157
+ <tr>
158
+ <td class="tg-c3ow">ChartQA</td>
159
+ <td class="tg-c3ow">47.40</td>
160
+ <td class="tg-c3ow">71.52</td>
161
+ <td class="tg-c3ow">75.76</td>
162
+ <td class="tg-c3ow">12.96</td>
163
+ <td class="tg-c3ow">15.24</td>
164
+ <td class="tg-c3ow">61.24</td>
165
+ <td class="tg-c3ow">59.80</td>
166
+ <td class="tg-c3ow">72.52</td>
167
+ </tr>
168
+ <tr>
169
+ <td class="tg-c3ow">DocVQA</td>
170
+ <td class="tg-c3ow">35.70</td>
171
+ <td class="tg-c3ow">80.94</td>
172
+ <td class="tg-c3ow">82.76</td>
173
+ <td class="tg-c3ow">25.82</td>
174
+ <td class="tg-c3ow">30.38</td>
175
+ <td class="tg-c3ow">71.22</td>
176
+ <td class="tg-c3ow">69.54</td>
177
+ <td class="tg-c3ow">80.30</td>
178
+ </tr>
179
+ <tr>
180
+ <td class="tg-c3ow">InfoVQA</td>
181
+ <td class="tg-c3ow">20.52</td>
182
+ <td class="tg-c3ow">46.30</td>
183
+ <td class="tg-c3ow">53.62</td>
184
+ <td class="tg-c3ow">21.35</td>
185
+ <td class="tg-c3ow">24.46</td>
186
+ <td class="tg-c3ow">41.18</td>
187
+ <td class="tg-c3ow">38.24</td>
188
+ <td class="tg-c3ow">46.40</td>
189
+ </tr>
190
+ <tr>
191
+ <td class="tg-c3ow">OCR Average</td>
192
+ <td class="tg-c3ow">42.28</td>
193
+ <td class="tg-c3ow">67.53</td>
194
+ <td class="tg-c3ow">71.15</td>
195
+ <td class="tg-c3ow">30.94</td>
196
+ <td class="tg-c3ow">32.85</td>
197
+ <td class="tg-c3ow">53.92</td>
198
+ <td class="tg-c3ow">61.04</td>
199
+ <td class="tg-c3ow">67.50</td>
200
+ </tr>
201
+ </tbody></table>
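
The "Average" and "OCR Average" rows are plain means over the benchmark rows listed above them. A quick sketch for the Instella-VL-1B column (values copied from the table; grouping is our reading of the table layout):

```python
# Instella-VL-1B scores from the general-benchmark rows
# (GQA, SQA, POPE, MM-Bench, SEEDBench, MMMU, RealWorldQA, MMStar).
general = [61.52, 83.74, 86.73, 69.17, 68.47, 29.30, 58.82, 43.21]

# OCR-related rows (OCRBench, TextVQA, AI2D, ChartQA, DocVQA, InfoVQA).
ocr = [67.90, 71.23, 66.65, 72.52, 80.30, 46.40]

avg_general = round(sum(general) / len(general), 2)
avg_ocr = round(sum(ocr) / len(ocr), 2)
print(avg_general, avg_ocr)  # 62.62 67.5
```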
202
+
203
+ ### Quick Start
204
+ > [!NOTE]
205
+ > Install the following packages to set up the inference environment.
206
+ > ```bash
207
+ > pip==25.0
208
+ > wheel==0.45.1
209
+ > setuptools==75.8.0
210
+ > torch==2.6.0
211
+ > torchvision==0.21.0
212
+ > transformers==4.49.0
213
+ > einops==0.8.0
214
+ > ```
215
+
216
+ ```python
217
+ import torch
218
+ from transformers import AutoTokenizer, AutoProcessor, AutoConfig, AutoModelForCausalLM
219
+
220
+ from PIL import Image
221
+ import requests
222
+ from io import BytesIO
223
+
224
+ def load_image(image_file):
225
+ if image_file.startswith("http") or image_file.startswith("https"):
226
+ response = requests.get(image_file)
227
+ image = Image.open(BytesIO(response.content)).convert("RGB")
228
+ else:
229
+ image = Image.open(image_file).convert("RGB")
230
+ return image
231
+
232
+
233
+ config = AutoConfig.from_pretrained("AIG-GenAI/Instella-VL-1B", trust_remote_code=True)
234
+ tokenizer = AutoTokenizer.from_pretrained("AIG-GenAI/Instella-VL-1B", config=config, trust_remote_code=True)
235
+ processor = AutoProcessor.from_pretrained("AIG-GenAI/Instella-VL-1B", trust_remote_code=True)
236
+ model = AutoModelForCausalLM.from_pretrained("AIG-GenAI/Instella-VL-1B", trust_remote_code=True).to('cuda') # or 'cpu'
237
+ model.eval()
238
+
239
+ # For single image and text
240
+ query="Describe the image."
241
+ image=load_image("path/to/your_image") # can be a https:// url
242
+ out = processor.encode(query, image, model.get_vision_tower().image_processor, tokenizer, config)
243
+ inputs = {k: v.to(model.device) for k, v in out.items() if isinstance(v, torch.Tensor)}
244
+ with torch.inference_mode():
245
+ output_ids = model.generate(inputs["input_ids"], images=inputs['image_tensor'], image_sizes=out['image_sizes'], do_sample=True, num_beams=1, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=out['stopping_criteria'], eos_token_id=out['eos_token_id'])
246
+ outputs = processor.decode(output_ids)
247
+ print("InstellaVL: ", outputs)
248
+
249
+ # For batch of images and text.
250
+ query=["Describe the image.", "What is the color of the dog?"]
251
+ image=[load_image("../assets/images/instellavl.png"), load_image("../assets/images/example2_dog.jpg")]
252
+ outs = processor.batch_encode(query, image, model.get_vision_tower().image_processor, tokenizer, config)
253
+
254
+ for idx, o in enumerate(outs):
255
+ ins = {k: v.to(model.device) for k, v in o.items() if isinstance(v, torch.Tensor)}
256
+ with torch.inference_mode():
257
+ output_ids = model.generate(ins["input_ids"],
258
+ images=ins['image_tensor'],
259
+ image_sizes=o['image_sizes'],
260
+ do_sample=True,
261
+ num_beams=1,
262
+ temperature=0.2,
263
+ max_new_tokens=1024,
264
+ use_cache=True,
265
+ stopping_criteria=o['stopping_criteria'],
266
+ eos_token_id=o['eos_token_id'])
267
+ outputs = processor.decode(output_ids)
268
+ print("Query: ", query[idx])
269
+ print("InstellaVL: ", outputs)
270
+ ```
271
+
272
+ <details>
273
+ <summary><b>TL;DR</b>: Loading from locally saved checkpoint</summary>
274
+ <p><strong>Note:</strong> Run <code>pip install -e . --no-deps</code> to register the InstellaVL repo as the <code>instellavl</code> package in your Python environment.</p>
275
+
276
+ ``` python
277
+ import torch
278
+
279
+ # Import essential modules
280
+ from instellavl.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
281
+ from instellavl.conversation import conv_templates, SeparatorStyle
282
+ from instellavl.model.builder import load_pretrained_model
283
+ from instellavl.utils import disable_torch_init
284
+ from instellavl.mm_utils import process_images, tokenizer_image_token, get_model_name_from_path, KeywordsStoppingCriteria
285
+
286
+ from PIL import Image
287
+
288
+ import requests
289
+ from io import BytesIO
290
+
291
+ # Login into HF Hub
292
+ from huggingface_hub import login
293
+ login(token = "<Your HFtoken id>") # Enter your token
294
+
295
+ def load_image(image_file):
296
+ if image_file.startswith("http") or image_file.startswith("https"):
297
+ response = requests.get(image_file)
298
+ image = Image.open(BytesIO(response.content)).convert("RGB")
299
+ else:
300
+ image = Image.open(image_file).convert("RGB")
301
+ return image
302
+
303
+ #
304
+ # ========= CHANGE IMAGE and Query only HERE ============
305
+ image_file = '/path/to/Instella-VL-repo/assets/images/example2_dog.jpg' # Enter the test image path
306
+ query = 'Describe this image.'
307
+ # =======================================================
308
+
309
+ disable_torch_init()
310
+ conv_mode = 'instella'
311
+
312
+ # Model loading
313
+ model_path = '<path/to/model-checkpoint-saved-locally>' # Enter your model path, should contain instellavl substring in the name.
314
+ model_name = get_model_name_from_path(model_path)
315
+ tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, False, False)
316
+ model.eval()
317
+ model = model.to('cuda') # change to 'cpu' if not 'cuda'
318
+
319
+ # Image pre-processing
320
+ image = load_image(image_file)
321
+ image_tensor = image_processor.preprocess(image, return_tensors="pt")["pixel_values"].to(model.dtype)
323
+
324
+ # Text pre-processing - follow the below logic too when there is no Image:
325
+ # if images is not None and len(image_tensor) != 0 and DEFAULT_IMAGE_TOKEN not in text:
326
+ # question = DEFAULT_IMAGE_TOKEN + "\n" + text
327
+ # else:
328
+ # question = text
329
+ query = query.replace(DEFAULT_IMAGE_TOKEN, "").strip()
330
+ question = DEFAULT_IMAGE_TOKEN + "\n" + query
331
+ conv = conv_templates[conv_mode].copy()
332
+ conv.append_message(conv.roles[0], question)
333
+ conv.append_message(conv.roles[1], None)
334
+ prompt_question = conv.get_prompt()
335
+
336
+ # Final arrangements required
337
+ input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0)
338
+ keywords = [conv.sep]
339
+ image_sizes = [image.size]
340
+ stopping_criteria = [KeywordsStoppingCriteria(keywords, tokenizer, input_ids)]
341
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
342
+
343
+ with torch.inference_mode():
344
+ output_ids = model.generate(input_ids.to(model.device), images=image_tensor.to(model.device), image_sizes=image_sizes, do_sample=True, num_beams=1, temperature=0.2, max_new_tokens=1024, use_cache=True, stopping_criteria=stopping_criteria, eos_token_id=terminators)
345
+
346
+ outputs = tokenizer.decode(output_ids[0, input_ids.shape[1] :]).strip()
347
+ print("InstellaVL: ", outputs)
348
+ ```
349
+ </details>
350
+
351
+ ## Model Architecture
352
+
353
+ | Parts | Parameter size | Number of layers | Number of heads | Hidden size | Patch Size |
354
+ | ------------- |:-------------:|:-----:|:-----:|:-----:|:-----:|
355
+ | Vision Encoder | 300M | 24| 16 | 1024 | 14 |
356
+ | MLP | 6.3M | 2 | - | 2048 | - |
357
+ | LM | 1.2B | 16 | 16 | 2048 | - |
358
+
359
+ We initialize the vision encoder from [CLIP-ViT-L/14@336](https://huggingface.co/openai/clip-vit-large-patch14-336) and the LM from [AMD OLMo 1B SFT](https://huggingface.co/AIG-GenAI/AMD-OLMo-1B-SFT).
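
The 6.3M figure for the MLP adapter in the table is consistent with a LLaVA-style two-layer projector mapping the vision hidden size (1024) to the LM hidden size (2048); the exact layer layout is our assumption, not stated in the table:

```python
# Rough parameter count for a two-layer MLP projector,
# assumed layout: Linear(1024 -> 2048) -> activation -> Linear(2048 -> 2048).
vision_hidden, lm_hidden = 1024, 2048  # from the Vision Encoder / LM rows above

# weights + biases for each linear layer
params = (vision_hidden * lm_hidden + lm_hidden) + (lm_hidden * lm_hidden + lm_hidden)
print(f"{params / 1e6:.1f}M")  # 6.3M
```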
360
+
361
+ ## Training Stages
362
+
363
+ | Stages | MLP Warmup | Pretraining | Instruction Tuning |
364
+ | ------------- |:-------------:|:-----:|:-----:|
365
+ | Tunable Parts | Adapter | Entire Model | Entire Model |
366
+
367
+ ## Hardware
368
+ Training was conducted on up to 4 nodes, totaling 32 GPUs. Each node comprises [8 AMD Instinct™ MI300X GPUs](https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html).
369
+
370
+ - **MLP warmup**: 1 node
371
+ - **Pretraining**: 2 nodes
372
+ - **Instruction tuning**: 4 nodes
373
+
374
+ ## Datasets
375
+
376
+ ### MLP Warmup
377
+ [BLIP558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)
378
+
379
+ <h3 align="center">Pretraining Stage</h3>
380
+
381
+ | **Domain** | **Datasets** | **Num of Examples** | **Licenses** |
382
+ |---|:---:|---:|:---|
383
+ | Image Captions | [BLIP150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain), [COCO118K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain), [CC3M-Recap](https://huggingface.co/datasets/lmms-lab/LLaVA-ReCap-CC3M), [Pixmo_Cap](https://huggingface.co/datasets/allenai/pixmo-cap) | 3.52M | BSD 3-Clause for BLIP150K, COCO118K; Apache 2 for CC3M-Recap; ODC-BY-1.0 for Pixmo_Cap; see source materials for the original CC3M data underlying CC3M-Recap |
384
+ | OCR | [SynthDog_EN](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [SynthDog_ZH](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [UReader](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data), [ART](https://rrc.cvc.uab.es/?ch=14&com=downloads), [COCO-Text](https://bgshih.github.io/cocotext/), [HierText](https://github.com/google-research-datasets/hiertext), [Uber-Text](https://s3-us-west-2.amazonaws.com/uber-common-public/ubertext/index.html), [TextOCR](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [OpenVINO](https://github.com/openvinotoolkit/cvat), [MLT-17](https://rrc.cvc.uab.es/?ch=8&com=downloads) | 913K | Apache 2 for SynthDog_EN, SynthDog_ZH, UReader, TextOCR, OpenVINO; CC By 4.0 for COCO-Text; CC BY-SA 4.0 for HierText, Uber-Text; See source materials for ART, MLT-17 |
385
+ | Doc | [DocVQA](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [DocStruct4M](https://huggingface.co/datasets/mPLUG/DocStruct4M) | 410K | Apache 2 |
386
+ | Table & Chart & Plot | [Chart2Text](https://github.com/vis-nlp/Chart-to-text/tree/main/pew_dataset/dataset/imgs), [UniChart](https://huggingface.co/datasets/ahmed-masry/unichart-pretrain-data), [PlotQA](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [WidgetCaption](https://huggingface.co/datasets/rootsautomation/RICO-WidgetCaptioning?row=0), [Screen2Words](https://huggingface.co/datasets/rootsautomation/RICO-Screen2Words), [SciGraphQA-295K](https://huggingface.co/datasets/alexshengzhili/SciGraphQA-295K-train), [Paper2Fig100K](https://zenodo.org/records/7299423#.Y2lzonbMKUl), [MMC Instruction](https://huggingface.co/datasets/xywang1/MMC/viewer/MMC-Instruction), [M-Paper](https://huggingface.co/datasets/mPLUG/M-Paper) | 1.97M | GPL-3.0 for Chart2Text; MIT for UniChart, SciGraphQA-295K; Apache 2 for PlotQA, M-Paper; CC By 4.0 for WidgetCaption, Screen2Words, Paper2Fig100K; CC BY-SA 4.0 for MMC Instruction |
387
+ | Text Only | [Evol-Instruct-GPT-4](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data/tree/main/evol_instruct) | 70K | Apache 2 |
388
+
389
+ <h3 align="center">Instruction-tuning Stage</h3>
390
+
391
+ | **Domain** | **Datasets** | **Num of Examples** | **Licenses** |
392
+ |---|:---:|---:|:---|
393
+ | General | [AOKVQA, CLEVR, Hateful Memes, Image Textualization, OKVQA, ScienceQA, ShareGPT-4V, TallyQA, Visual7W, VizWiz, VQAv2, WebSight, ALLaVA Instruct, Cambrian, COCO Caption, IconQA, LLaVA-158K, LLaVAR, RefCOCO, ShareGPT-4O, Vision FLAN, VisText, VQARAD, VSR, InterGPS](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [Image-Paragraph-Captioning, ImageNet, COCO-GOI, COCO-ITM, Visual Dialog, SNLI-VE](https://huggingface.co/datasets/MMInstruction/M3IT), [Web-Landmark, Web-Celebrity, SAM, LAION-GPT-4V-Dataset, OODVQA]( https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/tree/main), [Pixmo_Cap](https://huggingface.co/datasets/allenai/pixmo-cap), [Pixmo_Count](https://huggingface.co/datasets/allenai/pixmo-count), [Pixmo_Points](https://huggingface.co/datasets/allenai/pixmo-points), [Pixmo_Ask_Model_Anything](https://huggingface.co/datasets/allenai/pixmo-ask-model-anything), [SVIT_Core_150K](https://huggingface.co/datasets/BAAI/SVIT), [Localized Narratives](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) | 2.66M | see source materials for Image-Paragraph-Captioning, ImageNet, COCO-GOI, COCO-ITM, Visual Dialog, SNLI-VE; ODC-BY-1.0 for Pixmo_Cap, Pixmo_Count, Pixmo_Points, Pixmo_Ask_Model_Anything; CC By 4.0 for SVIT_Core_150K, Localized Narratives; Apache 2 for rest of the datasets; |
394
+ | Table & Chart & Screen | [AI2D, ChartQA, DocVQA, FigureQA, InfographicVQA, RoBUT-SQA, RoBUT-WTQ, TQA, UReader IE, UReader QA, Chart2Text, Diagram Image2Text, DVQA, HiTab, LRV Chart, RoBUT WikiSQL, Screen2Words, UReader Caption, UReader KG, VisualMRC](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [TinyChartData](https://huggingface.co/datasets/mPLUG/TinyChartData) | 866K | Apache 2 |
395
+ | Doc | [ArxivQA](https://huggingface.co/datasets/MMInstruction/ArxivQA), [DocDownstream-1.0](https://huggingface.co/datasets/mPLUG/DocDownstream-1.0), [DocReason25K](https://huggingface.co/datasets/mPLUG/DocReason25K), [DocStruct4M](https://huggingface.co/datasets/mPLUG/DocStruct4M), [Pixmo_Docs](https://huggingface.co/datasets/allenai/pixmo-docs) | 522K | CC BY-SA 4.0 for ArxivQA; Apache 2 for DocDownstream-1.0, DocReason25K, DocStruct4M; ODC-BY-1.0 for Pixmo_Docs |
396
+ | General OCR | [ChromeWriting, IIIT5K, K12 Printing, Rendered Text, TextCaps, HME100K, IAM, TextOCR-GPT-4V](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [SynthDog-EN](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Mid-Data) | 84K | Apache 2 |
397
+ | Math & Reasoning | [MAVIS Manual Collection, CLEVR-Math, Geo170K QA, GEOS, GeoMVerse, MapQA, Super-CLEVR, UniGeo, LRV Normal, Visual Genome, MAVIS Data Engine, Geo170K Align, Geometry3K, GeoQA+, TabMWP, GQA, RAVEN, MathVision, KVQA, VCR](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [FinQA](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron), [Design2Code, IDK](https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/) | 460K | CC By 4.0 for FinQA; Apache 2 for rest of the datasets |
398
+ | Others | [IQA, MOCHEG, Shapes](https://huggingface.co/datasets/MMInstruction/M3IT), [ALFWorld, Q-Instruct-DB](https://huggingface.co/datasets/nyu-visionx/Cambrian-10M/) | 479K | see source materials for IQA, MOCHEG, Shapes; Apache 2 for ALFWorld, Q-Instruct-DB |
399
+ | Text Only | [MathQA, Magpie Pro (L3 MT), Magpie Pro (Qwen2 ST), Magpie Pro (L3 ST)](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data) | 480K | Apache 2 |
400
+
401
+ > [!NOTE]
402
+ > Further, to strengthen the model’s understanding of science-based and general reasoning questions, as identified through error analysis, we oversampled (roughly doubling the volume of) specific datasets from the SFT dataset pool, as detailed below.
403
+ >
404
+ > Oversampled (~2x sampling rate): ScienceQA, AI2D, PMC-VQA, Cambrian, and TQA
405
+ >
406
+ > Further information concerning the training datasets, including applicable licensing terms and use restrictions, can be found at the linked source locations.
407
+
408
+
409
+ For the details of training hyperparameters, please check [our github repo](https://github.com/AMD-AIG-AIMA/Instella-VL)
410
+
411
+ ## Contributors
412
+ **Core contributors:** [Ximeng Sun](https://sunxm2357.github.io/), [Aditya Kumar Singh](https://rodosingh.github.io), [Gowtham Ramesh](https://www.linkedin.com/in/gowtham1/), [Zicheng Liu](https://zicliu.wixsite.com/mysite)
413
+
414
+ **Contributors:** [Pratik Prabhanjan Brahma](https://www.linkedin.com/in/pratik-p-brahma/), [Ze Wang](https://www.linkedin.com/in/ze-wang-1379601a5/), [Jiang Liu](https://joellliu.github.io/), [Jialian Wu](https://jialianwu.com/), [Prakamya Mishra](https://prakamya-mishra.github.io/), [Xiaodong Yu](https://www.xiaodongyu.me/), [Yusheng Su](https://yushengsu-thu.github.io/), [Sudhanshu Ranjan](https://www.linkedin.com/in/sudhanshu-ranjan-33a216124), [Emad Barsoum](https://www.linkedin.com/in/ebarsoum/)
415
+
416
+
417
+ ## Bias, Risks, and Limitations
418
+ This model is made accessible without any safety guarantees. Users should be aware that the model may generate outputs that are sensitive, inaccurate, harmful, biased, or otherwise objectionable based on user prompts. It is crucial for users to conduct comprehensive safety evaluations, implement safety filtering, and verify the model's outputs to mitigate these risks.
419
+
420
+ ## License
421
+ See the Files tab of this repository for the license and any notices.
422
+
423
+ ## Citing
424
+
425
+ ```bibtex
426
+ @misc{Instella-VL-1B,
427
+ title = {Instella-VL-1B-1.0: AMD's first vision-language model},
428
+ url = {https://huggingface.co/AIG-GenAI/Instella-VL-1B},
429
+ author = {Ximeng Sun and Aditya Singh and Gowtham Ramesh and Jiang Liu and Ze Wang and Sudhanshu Ranjan and Pratik Prabhanjan Brahma and Prakamya Mishra and Jialian Wu and Xiaodong Yu and Yusheng Su and Emad Barsoum and Zicheng Liu},
430
+ month = {February},
431
+ year = {2025}
432
+ }
433
+ ```
chat_template.json ADDED
@@ -0,0 +1,2 @@
1
+ {"chat_template": "|||IP_ADDRESS|||\n{% for message in messages -%}{{ message['role'] + message['content']}}{%- if not loop.last -%}{{ '\\n' if loop.index % 2 == 1 else '|||IP_ADDRESS|||\\n'}}{%- endif %}{%- endfor -%}"
2
+ }
config.json ADDED
@@ -0,0 +1,100 @@
1
+ {
2
+ "_name_or_path": "/home/goramesh/local/gramesh/Instella-VL-1B/",
3
+ "architectures": [
4
+ "InstellaVLForCausalLM"
5
+ ],
6
+ "auto_map": {
7
+ "AutoConfig": "modeling_instellavl.InstellaVLConfig",
8
+ "AutoModelForCausalLM": "modeling_instellavl.InstellaVLForCausalLM"
9
+ },
10
+ "attention_bias": false,
11
+ "attention_dropout": 0.0,
12
+ "clip_qkv": null,
13
+ "eos_token_id": 50279,
14
+ "hidden_act": "silu",
15
+ "hidden_size": 2048,
16
+ "image_aspect_ratio": "anyres",
17
+ "image_crop_resolution": null,
18
+ "image_grid_pinpoints": [
19
+ [
20
+ 336,
21
+ 336
22
+ ],
23
+ [
24
+ 336,
25
+ 672
26
+ ],
27
+ [
28
+ 336,
29
+ 1008
30
+ ],
31
+ [
32
+ 336,
33
+ 1344
34
+ ],
35
+ [
36
+ 336,
37
+ 1680
38
+ ],
39
+ [
40
+ 672,
41
+ 336
42
+ ],
43
+ [
44
+ 672,
45
+ 672
46
+ ],
47
+ [
48
+ 1008,
49
+ 336
50
+ ],
51
+ [
52
+ 1344,
53
+ 336
54
+ ],
55
+ [
56
+ 1680,
57
+ 336
58
+ ]
59
+ ],
60
+ "image_split_resolution": null,
61
+ "initializer_range": 0.02,
62
+ "intermediate_size": 8192,
63
+ "max_position_embeddings": 2048,
64
+ "mm_anyres_choose_method": "best_fit",
65
+ "mm_compact_visual_tokens": false,
66
+ "mm_downsample_ratio": 1,
67
+ "mm_hidden_size": 1024,
68
+ "mm_newline_position": "one_token",
69
+ "mm_patch_merge_type": "spatial_unpad",
70
+ "mm_projector_lr": null,
71
+ "mm_projector_type": "mlp2x_gelu",
72
+ "mm_resampler_type": null,
73
+ "mm_spatial_pool_mode": "bilinear",
74
+ "mm_tunable_parts": "mm_vision_tower,mm_mlp_adapter,mm_language_model",
75
+ "mm_use_im_patch_token": false,
76
+ "mm_use_im_start_end": false,
77
+ "mm_vision_select_feature": "patch",
78
+ "mm_vision_select_layer": -2,
79
+ "mm_vision_tower": "openai/clip-vit-large-patch14-336",
80
+ "mm_vision_tower_lr": null,
81
+ "model_type": "instellavl",
82
+ "num_attention_heads": 16,
83
+ "num_hidden_layers": 16,
84
+ "num_key_value_heads": 16,
85
+ "online_training": true,
86
+ "pad_token_id": 1,
87
+ "pos_skipping_range": 4096,
88
+ "rope_scaling": null,
89
+ "rope_theta": 10000.0,
90
+ "tie_word_embeddings": true,
91
+ "tokenizer_model_max_length": 32768,
92
+ "tokenizer_padding_side": "right",
93
+ "torch_dtype": "float16",
94
+ "transformers_version": "4.45.1",
95
+ "use_cache": true,
96
+ "use_mm_proj": true,
97
+ "use_pos_skipping": false,
98
+ "vision_tower_pretrained": null,
99
+ "vocab_size": 50282
100
+ }
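The `image_grid_pinpoints` above enumerate the candidate canvases used by `image_aspect_ratio: "anyres"`: each input image is matched to one of these resolutions before being tiled into 336×336 patches. As a rough illustration, here is a simplified LLaVA-style best-fit heuristic (a sketch; the exact tie-breaking in `mm_utils.py` may differ):

```python
# Sketch of "anyres" best-fit resolution selection over the grid pinpoints above.
# Simplified LLaVA-style heuristic; not necessarily the exact logic in mm_utils.py.

def select_best_resolution(orig_size, grid_pinpoints):
    """Pick the candidate (w, h) that keeps the most image pixels with the least padding."""
    ow, oh = orig_size
    best, best_fit = None, None
    for w, h in grid_pinpoints:
        scale = min(w / ow, h / oh)                # scale to fit inside the candidate
        dw, dh = int(ow * scale), int(oh * scale)  # downscaled image size
        effective = min(dw * dh, ow * oh)          # pixels actually used
        wasted = w * h - effective                 # padding area
        fit = (effective, -wasted)
        if best_fit is None or fit > best_fit:
            best_fit, best = fit, (w, h)
    return best

pinpoints = [(336, 336), (336, 672), (336, 1008), (336, 1344), (336, 1680),
             (672, 336), (672, 672), (1008, 336), (1344, 336), (1680, 336)]
print(select_best_resolution((1000, 500), pinpoints))  # -> (672, 336)
```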
conversation.py ADDED
@@ -0,0 +1,334 @@
1
+ # Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved.
2
+
3
+ import re
4
+ import base64
5
+ import dataclasses
6
+
7
+ from PIL import Image
8
+ from io import BytesIO
9
+ from enum import auto, Enum
10
+ from typing import List, Any, Dict, Union, Tuple
11
+
12
+ from transformers import AutoTokenizer
13
+
14
+
15
+ class SeparatorStyle(Enum):
16
+ """Different separator style."""
17
+
18
+ SINGLE = auto()
19
+ MPT = auto()
20
+ INSTELLA = auto()
21
+
22
+
23
+ @dataclasses.dataclass
24
+ class Conversation:
25
+ r"""A class that keeps all conversation history."""
26
+
27
+ system: str
28
+ roles: List[str]
29
+ messages: List[List[str]]
30
+ offset: int
31
+ sep_style: SeparatorStyle = SeparatorStyle.SINGLE
32
+ sep: str = "###"
33
+ sep2: str = None
34
+ version: str = "Unknown"
35
+
36
+ tokenizer_id: str = ""
37
+ tokenizer: Any = None
38
+ # Stop criteria (the default one is EOS token)
39
+ stop_str: Union[str, List[str]] = None
40
+ # Stops generation if meeting any token in this list
41
+ stop_token_ids: List[int] = None
42
+
43
+ skip_next: bool = False
44
+
45
+ def get_prompt(self):
46
+ """
47
+ Generates a formatted prompt string based on the messages and separator style.
48
+ The function processes the messages stored in the instance, applies specific formatting rules
49
+ based on the separator style, and returns the resulting prompt string.
50
+
51
+ Returns:
52
+ `str`: The formatted prompt string.
53
+
54
+ Raises:
55
+ `ValueError`: If an invalid separator style is specified.
56
+ """
57
+
58
+ messages = self.messages
59
+ if len(messages) > 0 and type(messages[0][1]) is tuple:
60
+ messages = self.messages.copy()
61
+ init_role, init_msg = messages[0].copy()
62
+ init_msg = init_msg[0]
63
+ if "mmtag" in self.version:
64
+ init_msg = init_msg.replace("<image>", "").strip()
65
+ messages[0] = (init_role, init_msg)
66
+ messages.insert(0, (self.roles[0], "<Image><image></Image>"))
67
+ messages.insert(1, (self.roles[1], "Received."))
68
+ elif not init_msg.startswith("<image>"):
69
+ init_msg = init_msg.replace("<image>", "").strip()
70
+ messages[0] = (init_role, "<image>\n" + init_msg)
71
+ else:
72
+ messages[0] = (init_role, init_msg)
73
+
74
+ if self.sep_style == SeparatorStyle.SINGLE:
75
+ ret = self.system + self.sep
76
+ for role, message in messages:
77
+ if message:
78
+ if type(message) is tuple:
79
+ message, _, _ = message
80
+ ret += role + ": " + message + self.sep
81
+ else:
82
+ ret += role + ":"
83
+
84
+ elif self.sep_style == SeparatorStyle.MPT:
85
+ ret = self.system + self.sep
86
+ for role, message in messages:
87
+ if message:
88
+ if type(message) is tuple:
89
+ message, _, _ = message
90
+ ret += role + message + self.sep
91
+ else:
92
+ ret += role
93
+
94
+ elif self.sep_style == SeparatorStyle.INSTELLA:
95
+ seps = [self.sep, self.sep2]
96
+ ret = "|||IP_ADDRESS|||"
97
+ for i, (role, message) in enumerate(messages):
98
+ if message:
99
+ if type(message) is tuple:
100
+ message, _, _ = message
101
+ if i % 2 == 1:
102
+ message = message.strip()
103
+ ret += role + message + seps[i % 2]
104
+ else:
105
+ ret += role
106
+ else:
107
+ raise ValueError(f"Invalid style: {self.sep_style}")
108
+
109
+ return ret
110
+
111
+ def append_message(self, role, message):
112
+ self.messages.append([role, message])
113
+
114
+ def process_image(self, image: Union[str, Image.Image], image_process_mode: str, return_pil: bool=False, image_format: str="PNG")->Union[str, Image.Image]:
115
+ r"""
116
+ Processes an image according to the specified mode and returns either a PIL image or a base64 encoded string.
117
+
118
+ Args:
119
+ - image (Union[str, Image.Image]): The image to be processed. Can be a file path or a PIL Image object.
120
+ - image_process_mode (str): The mode of image processing. Options are "Pad", "Default", "Crop", or "Resize".
121
+ - return_pil (bool, optional): If True, returns a PIL Image object. If False, returns a base64 encoded string. Defaults to False.
122
+ - image_format (str, optional): The format to save the image in if returning a base64 encoded string. Defaults to "PNG".
123
+
124
+ Returns:
125
+ Union[str, Image.Image]: The processed image, either as a PIL Image object or a base64 encoded string.
126
+
127
+ Raises:
128
+ ValueError: If an invalid image_process_mode is provided.
129
+ """
130
+
131
+ if image_process_mode == "Pad":
132
+
133
+ def expand2square(pil_img, background_color=(122, 116, 104)):
134
+ width, height = pil_img.size
135
+ if width == height:
136
+ return pil_img
137
+ elif width > height:
138
+ result = Image.new(pil_img.mode, (width, width), background_color)
139
+ result.paste(pil_img, (0, (width - height) // 2))
140
+ return result
141
+ else:
142
+ result = Image.new(pil_img.mode, (height, height), background_color)
143
+ result.paste(pil_img, ((height - width) // 2, 0))
144
+ return result
145
+
146
+ image = expand2square(image)
147
+ elif image_process_mode in ["Default", "Crop"]:
148
+ pass
149
+ elif image_process_mode == "Resize":
150
+ image = image.resize((336, 336))
151
+ else:
152
+ raise ValueError(f"Invalid image_process_mode: {image_process_mode}")
153
+
154
+ if type(image) is not Image.Image:
155
+ image = Image.open(image).convert("RGB")
156
+
157
+ max_hw, min_hw = max(image.size), min(image.size)
158
+ aspect_ratio = max_hw / min_hw
159
+ max_len, min_len = 672, 448
160
+ shortest_edge = int(min(max_len / aspect_ratio, min_len, min_hw))
161
+ longest_edge = int(shortest_edge * aspect_ratio)
162
+ W, H = image.size
163
+ if H > W:
164
+ H, W = longest_edge, shortest_edge
165
+ else:
166
+ H, W = shortest_edge, longest_edge
167
+ image = image.resize((W, H))
168
+ if return_pil:
169
+ return image
170
+ else:
171
+ buffered = BytesIO()
172
+ image.save(buffered, format=image_format)
173
+ img_b64_str = base64.b64encode(buffered.getvalue()).decode()
174
+ return img_b64_str
175
+
176
+ def get_images(self, return_pil: bool=False, return_path: bool=False) -> List[Union[str, Image.Image]]:
177
+ """
178
+ Retrieve images from the conversation messages.
179
+
180
+ Args:
181
+ return_pil (bool): If True, return images as PIL objects. Defaults to False.
182
+ return_path (bool): If True, return the image file paths instead of processing them. Defaults to False.
183
+
184
+ Returns:
185
+ list: A list of images or image paths depending on the arguments.
186
+ """
187
+ images = []
188
+ for i, (role, msg) in enumerate(self.messages[self.offset :]):
189
+ if i % 2 == 0:
190
+ if type(msg) is tuple:
191
+ msg, image, image_process_mode = msg
192
+ if type(image) != list:
193
+ image = [image]
194
+ for img in image:
195
+ if not return_path and self.is_image_file(img):
196
+ img = self.process_image(img, image_process_mode, return_pil=return_pil)
197
+ images.append(img)
199
+ return images
200
+
201
+ def is_image_file(self, filename: str)->bool:
202
+ image_extensions = [".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"]
203
+ return any(filename.lower().endswith(ext) for ext in image_extensions)
204
+
205
+ def is_video_file(self, filename: str)->bool:
206
+ video_extensions = [".mp4", ".mov", ".avi", ".mkv", ".wmv", ".flv", ".mpeg", ".mpg"]
207
+ return any(filename.lower().endswith(ext) for ext in video_extensions)
208
+
209
+ def to_gradio_chatbot(self)->list:
210
+ ret = []
211
+ for i, (role, msg) in enumerate(self.messages[self.offset :]):
212
+ if i % 2 == 0:
213
+ if type(msg) is tuple:
214
+ msg, image, image_process_mode = msg
215
+ if type(image) != list:
216
+ image = [image]
217
+ if len(image) == 1:
218
+ msg = "<image>\n" + msg.replace("<image>", "").strip()
219
+ else:
220
+ msg = re.sub(r"(<image>)\n(?=<image>)", r"\1 ", msg)
221
+
222
+ img_str_list = []
223
+ for img in image:
224
+ if self.is_image_file(img):
225
+ img_b64_str = self.process_image(img, "Default", return_pil=False, image_format="JPEG")
226
+ img_str = f'<img src="data:image/jpeg;base64,{img_b64_str}" style="max-width: 256px; max-height: 256px; width: auto; height: auto; object-fit: contain;"/>'
227
+ img_str_list.append(img_str)
228
+ elif self.is_video_file(img):
229
+ ret.append(((img,), None))
230
+
231
+ msg = msg.strip()
232
+ img_place_holder = ""
233
+ for img_str in img_str_list:
234
+ img_place_holder += f"{img_str}\n\n"
235
+
236
+ if len(img_str_list) > 0:
237
+ msg = f"{img_place_holder}\n\n{msg}"
238
+
239
+ if len(msg) > 0:
240
+ ret.append([msg, None])
241
+ else:
242
+ ret.append([msg, None])
243
+ else:
244
+ ret[-1][-1] = msg
245
+ return ret
246
+
247
+ def copy(self)->"Conversation":
248
+ return Conversation(system=self.system, roles=self.roles, messages=[[x, y] for x, y in self.messages], offset=self.offset, sep_style=self.sep_style, sep=self.sep, sep2=self.sep2, version=self.version)
249
+
250
+ def dict(self)->Dict[str, Any]:
251
+ if len(self.get_images()) > 0:
252
+ return {
253
+ "system": self.system,
254
+ "roles": self.roles,
255
+ "messages": [[x, y[0] if type(y) is tuple else y] for x, y in self.messages],
256
+ "offset": self.offset,
257
+ "sep": self.sep,
258
+ "sep2": self.sep2,
259
+ }
260
+ return {
261
+ "system": self.system,
262
+ "roles": self.roles,
263
+ "messages": self.messages,
264
+ "offset": self.offset,
265
+ "sep": self.sep,
266
+ "sep2": self.sep2,
267
+ }
268
+
269
+
270
+ conv_vicuna_v0 = Conversation(
271
+ system="A chat between a curious human and an artificial intelligence assistant. " "The assistant gives helpful, detailed, and polite answers to the human's questions.",
272
+ roles=("Human", "Assistant"),
273
+ messages=[
274
+ ["Human", "What are the key differences between renewable and non-renewable energy sources?"],
275
+ [
276
+ "Assistant",
277
+ "Renewable energy sources are those that can be replenished naturally in a relatively "
278
+ "short amount of time, such as solar, wind, hydro, geothermal, and biomass. "
279
+ "Non-renewable energy sources, on the other hand, are finite and will eventually be "
280
+ "depleted, such as coal, oil, and natural gas. Here are some key differences between "
281
+ "renewable and non-renewable energy sources:\n"
282
+ "1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable "
283
+ "energy sources are finite and will eventually run out.\n"
284
+ "2. Environmental impact: Renewable energy sources have a much lower environmental impact "
285
+ "than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, "
286
+ "and other negative effects.\n"
287
+ "3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically "
288
+ "have lower operational costs than non-renewable sources.\n"
289
+ "4. Reliability: Renewable energy sources are often more reliable and can be used in more remote "
290
+ "locations than non-renewable sources.\n"
291
+ "5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different "
292
+ "situations and needs, while non-renewable sources are more rigid and inflexible.\n"
293
+ "6. Sustainability: Renewable energy sources are more sustainable over the long term, while "
294
+ "non-renewable sources are not, and their depletion can lead to economic and social instability.\n",
295
+ ],
296
+ ],
297
+ offset=2,
298
+ sep_style=SeparatorStyle.SINGLE,
299
+ sep="###",
300
+ )
301
+
302
+ conv_mpt = Conversation(
303
+ system="""<|im_start|>system
304
+ A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.""",
305
+ roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),
306
+ version="mpt",
307
+ messages=[],
308
+ offset=0,
309
+ sep_style=SeparatorStyle.MPT,
310
+ sep="<|im_end|>",
311
+ )
312
+
313
+ conv_instella = Conversation(
314
+ system="",
315
+ roles=("<|user|>\n", "<|assistant|>\n"),
316
+ version="instella",
317
+ messages=(),
318
+ offset=0,
319
+ sep_style=SeparatorStyle.INSTELLA,
320
+ sep="\n",
321
+ sep2='|||IP_ADDRESS|||\n'
322
+ )
323
+
324
+
325
+ default_conversation = conv_instella
326
+ conv_templates = {
327
+ "default": conv_instella,
328
+ "mpt": conv_mpt,
329
+ "instella": conv_instella,
330
+ }
331
+
332
+
333
+ if __name__ == "__main__":
334
+ print(default_conversation.get_prompt())
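For reference, the `INSTELLA` separator style assembles prompts as in this standalone sketch of `get_prompt()` (the `|||IP_ADDRESS|||` string is copied verbatim from this file and appears to be a scrubbed special-token placeholder, not a literal value to rely on):

```python
# Standalone sketch of SeparatorStyle.INSTELLA prompt assembly, mirroring get_prompt().
# NOTE: "|||IP_ADDRESS|||" is reproduced verbatim from this file; it looks like a
# scrubbed special-token string, so treat it as a stand-in for the real token.
BOS_LIKE = "|||IP_ADDRESS|||"

roles = ("<|user|>\n", "<|assistant|>\n")
sep, sep2 = "\n", BOS_LIKE + "\n"
messages = [
    (roles[0], "<image>\nDescribe this image."),
    (roles[1], None),  # None marks the slot where the model will generate
]

ret = BOS_LIKE
for i, (role, message) in enumerate(messages):
    if message:
        if i % 2 == 1:
            message = message.strip()
        ret += role + message + (sep if i % 2 == 0 else sep2)
    else:
        ret += role
print(repr(ret))
```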
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "eos_token_id": 50279,
4
+ "pad_token_id": 1,
5
+ "transformers_version": "4.45.1"
6
+ }
image_processing_instellavl.py ADDED
@@ -0,0 +1,30 @@
1
+ from typing import List
2
+ from PIL.Image import Image
3
+ from transformers import CLIPImageProcessor
4
+ from transformers.image_processing_utils import BaseImageProcessor
5
+ from .mm_utils import process_images
6
+
7
+ # TODO can inherit from CLIPImageProcessor instead and use the process function directly.
8
+ class InstellaVLImageProcessor(BaseImageProcessor):
9
+ r"""
10
+ Pre-process images
11
+ """
12
+ def __init__(self, **kwargs):
13
+ super().__init__(**kwargs)
14
+
15
+ def process(self,
16
+ images: List[Image],
17
+ processor: CLIPImageProcessor,
18
+ model_cfg: dict
19
+ ):
20
+ if images is None:
21
+ return {
22
+ "pixel_values": None,
23
+ }
24
+ image_tensors = process_images(images, processor, model_cfg)
25
+ return {
26
+ "pixel_values": image_tensors,
27
+ }
29
+
30
+ InstellaVLImageProcessor.register_for_auto_class()
mm_utils.py ADDED
@@ -0,0 +1,519 @@
1
+ # Modification Copyright© 2025 Advanced Micro Devices, Inc. All rights reserved.
2
+
3
+ r"""This module provides various utility functions for processing images, including resizing, cropping, padding,
4
+ and extracting patches. It also includes functions for processing images with different resolutions and
5
+ tokenizing image prompts."""
6
+
7
+ import re
8
+ import ast
9
+ import math
10
+ import torch
11
+ import base64
12
+ import torch.distributed as dist
13
+
14
+ from PIL import Image
15
+ from io import BytesIO
16
+ from typing import List, Tuple, Union, Any
17
+ from transformers import StoppingCriteria, PreTrainedTokenizer
18
+
19
+ IGNORE_INDEX = -100
20
+ IMAGE_TOKEN_INDEX = -200
21
+ DEFAULT_IMAGE_TOKEN = "<image>"
22
+ DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
23
+ DEFAULT_IM_START_TOKEN = "<im_start>"
24
+ DEFAULT_IM_END_TOKEN = "<im_end>"
25
+
26
+ def resize_and_center_crop(image: Image.Image, shortest_edge_length: int) -> Image.Image:
27
+ r"""
28
+ Resize the given image such that its shortest edge matches the specified length,
29
+ and then center crop it to a square of the same size.
30
+
31
+ Args:
32
+ - image (`Image.Image`): The input image to be resized and cropped.
33
+ - shortest_edge_length (`int`): The length of the shortest edge after resizing.
34
+
35
+ Returns:
36
+ `Image.Image`: The resized and center-cropped image.
37
+ """
38
+
39
+ # Calculate new dimensions and resize
40
+ aspect_ratio = float(image.width) / float(image.height)
41
+ if (aspect_ratio > 1):
42
+ new_width = int(shortest_edge_length * aspect_ratio)
43
+ new_height = shortest_edge_length
44
+ else:
45
+ new_width = shortest_edge_length
46
+ new_height = int(shortest_edge_length / aspect_ratio)
47
+ resized_image = image.resize((new_width, new_height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
48
+
49
+ # Calculate the position and perform the center crop
50
+ left = (new_width - shortest_edge_length) / 2
51
+ top = (new_height - shortest_edge_length) / 2
52
+ right = (new_width + shortest_edge_length) / 2
53
+ bottom = (new_height + shortest_edge_length) / 2
54
+ cropped_image = resized_image.crop((left, top, right, bottom))
55
+
56
+ return cropped_image
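The resize-then-crop geometry above can be sanity-checked without Pillow; this sketch reproduces only the dimension arithmetic of `resize_and_center_crop`:

```python
# Dimension arithmetic of resize_and_center_crop, reproduced without Pillow.
def crop_box(width, height, edge):
    """Return the resized size and the centered square crop box."""
    aspect = width / height
    if aspect > 1:                                # landscape: height becomes `edge`
        nw, nh = int(edge * aspect), edge
    else:                                         # portrait/square: width becomes `edge`
        nw, nh = edge, int(edge / aspect)
    left, top = (nw - edge) / 2, (nh - edge) / 2  # center the square crop
    return (nw, nh), (left, top, left + edge, top + edge)

size, box = crop_box(900, 600, 336)
print(size, box)  # -> (504, 336) (84.0, 0.0, 420.0, 336.0)
```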
57
+
58
+
59
+ def auto_pad_images(image: Image.Image, grid_params: list) -> Image.Image:
60
+ r"""
61
+ Automatically pads an input image to match the closest aspect ratio from a list of grid parameters.
62
+
63
+ Args:
64
+ - image (`Image.Image`): The input image to be padded. Must be a Pillow Image object.
65
+ - grid_params (`list`): A list of integers representing the grid parameters to determine the target aspect ratio.
66
+
67
+ Returns:
68
+ `Image.Image`: The padded image with the closest aspect ratio from the grid parameters.
69
+
70
+ Raises:
71
+ `AssertionError`: If the input is not a Pillow Image object or if the grid parameters list is empty.
72
+ """
73
+
74
+ assert isinstance(image, Image.Image), "Input should be a Pillow Image"
75
+ assert len(grid_params) > 0, "Grid parameters should not be empty"
76
+
77
+ # Find the closest candidate aspect ratio
78
+ input_width, input_height = image.size
79
+ input_aspect_ratio = input_width / input_height
80
+ candidate_resolutions = [(w / h, w, h) for w in grid_params for h in grid_params]
81
+ closest_aspect_ratio = min(candidate_resolutions, key=lambda x: abs(input_aspect_ratio - x[0]))
82
+
83
+ candidate_resolutions = [(x[1], x[2]) for x in candidate_resolutions if abs(x[0] - closest_aspect_ratio[0]) < 1e-3]
84
+
85
+ target_resolution = min(candidate_resolutions, key=lambda res: abs(max(input_width, input_height) / max(res) - 1))
86
+
87
+ resize_width, resize_height = target_resolution
88
+ if input_width > input_height:
89
+ resize_height = int(resize_width / input_aspect_ratio)
90
+ else:
91
+ resize_width = int(resize_height * input_aspect_ratio)
92
+ resized_image = image.resize((resize_width, resize_height), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
93
+
94
+ # Step 5: Pad the resized image if necessary to match the target resolution
95
+ pad_width = target_resolution[0] - resize_width
96
+ pad_height = target_resolution[1] - resize_height
97
+ padded_image = Image.new("RGB", target_resolution, color=(0, 0, 0))
98
+ padded_image.paste(resized_image, (pad_width // 2, pad_height // 2))
99
+
100
+ return padded_image
101
+
102
+
103
+ def extract_patches(image: Image.Image, patch_size: int, overlap_ratio: float) -> List[Image.Image]:
104
+ r"""
105
+ Extracts patches from a given image with specified patch size and overlap ratio.
106
+
107
+ Args:
108
+ - image (`Image.Image`): The input image from which patches are to be extracted. Must be a Pillow Image.
109
+ - patch_size (`int`): The size of each patch (both width and height). Must be greater than 0.
110
+ - overlap_ratio (`float`): The ratio of overlap between adjacent patches. Must be between 0 and 1 (exclusive).
111
+
112
+ Returns:
113
+ `List[Image.Image]`: A list of extracted patches as Pillow Images.
114
+
115
+ Raises:
116
+ `AssertionError`: If the input image is not a Pillow Image.
117
+ `AssertionError`: If the patch size is not greater than 0.
118
+ `AssertionError`: If the overlap ratio is not between 0 and 1.
119
+ """
120
+
121
+ assert isinstance(image, Image.Image), "Input should be a Pillow Image"
122
+ assert patch_size > 0, "Patch size should be greater than 0"
123
+ assert 0 <= overlap_ratio < 1, "Overlap ratio should be between 0 and 1"
124
+
125
+ W, H = image.size
126
+ patches = []
127
+
128
+ stride = int(patch_size * (1 - overlap_ratio))
129
+
130
+ num_patches_y = (H - patch_size) // stride + 1
131
+ num_patches_x = (W - patch_size) // stride + 1
132
+
133
+ y_start = (H - (num_patches_y - 1) * stride - patch_size) // 2
134
+ x_start = (W - (num_patches_x - 1) * stride - patch_size) // 2
135
+
136
+ for y in range(y_start, y_start + num_patches_y * stride, stride):
137
+ for x in range(x_start, x_start + num_patches_x * stride, stride):
138
+ patch = image.crop((x, y, x + patch_size, y + patch_size))
139
+ patches.append(patch)
140
+
141
+ return patches
142
+
143
+
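The patch-grid arithmetic in `extract_patches` above can be checked standalone (no Pillow needed); the sizes below are illustrative.

```python
# Sketch of the stride/count computation in extract_patches above:
# patches advance by `stride` pixels, and overlap shrinks the stride.
def patch_grid(width, height, patch_size, overlap_ratio):
    stride = int(patch_size * (1 - overlap_ratio))
    num_y = (height - patch_size) // stride + 1
    num_x = (width - patch_size) // stride + 1
    return num_x, num_y

print(patch_grid(672, 672, 336, 0.0))  # (2, 2): a 2x2 grid with no overlap
print(patch_grid(672, 672, 336, 0.5))  # (3, 3): 50% overlap adds a row and column
```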
144
+ def process_highres_image_crop_split(image: Image.Image, data_args, processor=None) -> torch.Tensor:
145
+ """
146
+ Process a high-resolution image by cropping and splitting it into patches.
147
+
148
+ Args:
149
+ - image (`PIL.Image.Image`): The input image to be processed.
150
+ - data_args: The data arguments containing crop and split resolutions.
151
+ - processor: The image processor object. If None, it will be taken from data_args.
152
+
153
+ Returns:
154
+ `torch.Tensor`: A tensor containing the processed image patches.
155
+ """
156
+ crop_resolution = data_args.image_crop_resolution
157
+ split_resolution = data_args.image_split_resolution
158
+ if processor is None:
159
+ processor = data_args.image_processor
160
+ image_crop = resize_and_center_crop(image, crop_resolution)
161
+ image_patches = extract_patches(image_crop, patch_size=split_resolution, overlap_ratio=0)
162
+ image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
163
+ return torch.stack(image_patches, dim=0)
164
+
165
+
166
+ def process_highres_image(image: Image.Image, processor, grid_pinpoints: str) -> torch.Tensor:
167
+ r"""
168
+ Processes a high-resolution image by resizing, padding, and extracting patches.
169
+
170
+ Args:
171
+ - image (`Image.Image`): The input image to be processed.
172
+ - processor: An object that contains image processing parameters and methods.
173
+ - grid_pinpoints (`str`): A comma-separated string of grid sizes to consider for resizing.
174
+
175
+ Returns:
176
+ `torch.Tensor`: A tensor containing the processed image patches.
177
+ """
178
+
179
+ grid_params = [int(x) for x in grid_pinpoints.split(",")]
180
+ width_height = max(image.size)
181
+ fit_grid_params = [x for x in grid_params if x >= width_height]
182
+ if len(fit_grid_params) == 0:
183
+ select_size = max(grid_params)
184
+ else:
185
+ select_size = min(fit_grid_params)
186
+ # FIXME: always select the 448
187
+ select_size = max(grid_params)
188
+ image_padded = expand2square(image, tuple(int(x * 255) for x in processor.image_mean))
189
+
190
+ # FIXME: this seems to be a bug that it always resizes instead of padding
191
+ image_original_resize = image.resize((processor.size["shortest_edge"], processor.size["shortest_edge"]))
192
+ image_padded = image_padded.resize((select_size, select_size))
193
+ image_patches = extract_patches(image_padded, patch_size=processor.size["shortest_edge"], overlap_ratio=0)
194
+ image_patches = [image_original_resize] + image_patches
195
+ image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
196
+ return torch.stack(image_patches, dim=0)
197
+
198
+
199
+ def select_best_resolution(original_size: tuple, possible_resolutions: List[Tuple[int, int]]) -> tuple:
200
+ """
201
+ Selects the best resolution from a list of possible resolutions based on the original size.
202
+
203
+ Args:
204
+ - original_size (`tuple`): The original size of the image in the format (width, height).
205
+ - possible_resolutions (`List[Tuple[int, int]]`): A list of possible resolutions in the format [(width1, height1), (width2, height2), ...].
206
+
207
+ Returns:
208
+ `tuple`: The best fit resolution in the format (width, height).
209
+ """
210
+ original_width, original_height = original_size
211
+ best_fit = None
212
+ max_effective_resolution = 0
213
+ min_wasted_resolution = float("inf")
214
+
215
+ for width, height in possible_resolutions:
216
+ # Calculate the downscaled size to keep the aspect ratio
217
+ scale = min(width / original_width, height / original_height)
218
+ downscaled_width, downscaled_height = int(original_width * scale), int(original_height * scale)
219
+
220
+ # Calculate effective and wasted resolutions
221
+ effective_resolution = min(downscaled_width * downscaled_height, original_width * original_height)
222
+ wasted_resolution = (width * height) - effective_resolution
223
+
224
+ if effective_resolution > max_effective_resolution or (effective_resolution == max_effective_resolution and wasted_resolution < min_wasted_resolution):
225
+ max_effective_resolution = effective_resolution
226
+ min_wasted_resolution = wasted_resolution
227
+ best_fit = (width, height)
228
+
229
+ return best_fit
230
+
231
+
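The selection rule above can be exercised standalone on a toy input to see the effective/wasted-resolution trade-off in action (the candidate resolutions are illustrative).

```python
# Standalone copy of select_best_resolution above: maximize the effective
# (downscaled) pixel count, then minimize wasted canvas area on ties.
def select_best_resolution(original_size, possible_resolutions):
    ow, oh = original_size
    best_fit, max_eff, min_waste = None, 0, float("inf")
    for w, h in possible_resolutions:
        scale = min(w / ow, h / oh)
        dw, dh = int(ow * scale), int(oh * scale)
        eff = min(dw * dh, ow * oh)
        waste = w * h - eff
        if eff > max_eff or (eff == max_eff and waste < min_waste):
            max_eff, min_waste, best_fit = eff, waste, (w, h)
    return best_fit

print(select_best_resolution((800, 600), [(336, 336), (672, 336), (672, 672)]))  # (672, 672)
```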
232
+ def resize_and_pad_image(image: Image.Image, target_resolution: tuple) -> Image.Image:
233
+ r"""
234
+ Resize and pad an image to a target resolution while maintaining aspect ratio.
235
+
236
+ Args:
237
+ - image (`Image.Image`): The input image.
238
+ - target_resolution (`tuple`): The target resolution (width, height) of the image.
239
+
240
+ Returns:
241
+ `Image.Image`: The resized and padded image.
242
+ """
243
+ original_width, original_height = image.size
244
+ target_width, target_height = target_resolution
245
+
246
+ # Determine which dimension (width or height) to fill
247
+ scale_w = target_width / original_width
248
+ scale_h = target_height / original_height
249
+
250
+ if scale_w < scale_h:
251
+ # Width will be filled completely
252
+ new_width = target_width
253
+ new_height = min(math.ceil(original_height * scale_w), target_height)
254
+ else:
255
+ # Height will be filled completely
256
+ new_height = target_height
257
+ new_width = min(math.ceil(original_width * scale_h), target_width)
258
+
259
+ # Resize the image
260
+ resized_image = image.resize((new_width, new_height))
261
+
262
+ # Create a new image with the target size and paste the resized image onto it
263
+ new_image = Image.new("RGB", (target_width, target_height), (0, 0, 0))
264
+ paste_x = (target_width - new_width) // 2
265
+ paste_y = (target_height - new_height) // 2
266
+ new_image.paste(resized_image, (paste_x, paste_y))
267
+
268
+ return new_image
269
+
270
+
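The resize-and-center-pad arithmetic above reduces to computing the fitted size and paste offsets; a standalone sketch with illustrative sizes:

```python
import math

# Sketch of resize_and_pad_image above: scale to fill the tighter dimension,
# then center the result on the target canvas.
def fit_and_offsets(original, target):
    ow, oh = original
    tw, th = target
    scale_w, scale_h = tw / ow, th / oh
    if scale_w < scale_h:
        nw, nh = tw, min(math.ceil(oh * scale_w), th)
    else:
        nh, nw = th, min(math.ceil(ow * scale_h), tw)
    return (nw, nh), ((tw - nw) // 2, (th - nh) // 2)

print(fit_and_offsets((800, 600), (672, 672)))  # ((672, 504), (0, 84))
```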
271
+ def divide_to_patches(image: Image.Image, patch_size: int) -> list:
272
+ """
273
+ Divides an image into patches of a specified size.
274
+
275
+ Args:
276
+ - image (`Image.Image`): The input image.
277
+ - patch_size (`int`): The size of each patch.
278
+
279
+ Returns:
280
+ `list`: A list of Image.Image objects representing the patches.
281
+ """
282
+ patches = []
283
+ width, height = image.size
284
+ for i in range(0, height, patch_size):
285
+ for j in range(0, width, patch_size):
286
+ box = (j, i, j + patch_size, i + patch_size)
287
+ patch = image.crop(box)
288
+ patches.append(patch)
289
+
290
+ return patches
291
+
292
+
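The crop-box enumeration in `divide_to_patches` above can be verified standalone; for a 672x672 image and 336px patches it yields exactly four boxes.

```python
# Sketch of the box enumeration in divide_to_patches above (row-major order).
def patch_boxes(width, height, patch_size):
    return [(j, i, j + patch_size, i + patch_size)
            for i in range(0, height, patch_size)
            for j in range(0, width, patch_size)]

boxes = patch_boxes(672, 672, 336)
print(len(boxes), boxes[0])  # 4 (0, 0, 336, 336)
```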
293
+ def get_anyres_image_grid_shape(image_size: Tuple[int, int], grid_pinpoints: Union[str, list], patch_size: int) -> Tuple[int, int]:
294
+ r"""
295
+ Calculate the shape of the image patch grid after the preprocessing for images of any resolution.
296
+
297
+ Args:
298
+ - image_size (`tuple`): The size of the input image in the format (width, height).
299
+ - grid_pinpoints (`str` or `list`): A string representation of a list of possible resolutions.
300
+ - patch_size (`int`): The size of each image patch.
301
+
302
+ Returns:
303
+ `tuple`: The shape of the image patch grid in the format (width, height).
304
+ """
305
+ if isinstance(grid_pinpoints, str) and "x" in grid_pinpoints:
306
+ assert patch_size in [224, 336, 384, 448, 512], "patch_size should be in [224, 336, 384, 448, 512]"
307
+ # Use regex to extract the range from the input string
308
+ matches = re.findall(r"\((\d+)x(\d+)\)", grid_pinpoints)
309
+ range_start = tuple(map(int, matches[0]))
310
+ range_end = tuple(map(int, matches[-1]))
311
+ # Generate a matrix of tuples from (range_start[0], range_start[1]) to (range_end[0], range_end[1])
312
+ grid_pinpoints = [(i, j) for i in range(range_start[0], range_end[0] + 1) for j in range(range_start[1], range_end[1] + 1)]
313
+ # Multiply all elements by patch_size
314
+ grid_pinpoints = [[dim * patch_size for dim in pair] for pair in grid_pinpoints]
315
+ if type(grid_pinpoints) is list:
316
+ possible_resolutions = grid_pinpoints
317
+ else:
318
+ possible_resolutions = ast.literal_eval(grid_pinpoints)
319
+ width, height = select_best_resolution(image_size, possible_resolutions)
320
+ return width // patch_size, height // patch_size
321
+
322
+
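The `"(AxB)"` range parsing above can be isolated as follows; the pinpoint string and patch size are illustrative.

```python
import re

# Sketch of the grid_pinpoints range expansion above: "(1x1),(2x2)" expands
# to every (i, j) patch-count pair in the range, scaled by patch_size.
def parse_grid_pinpoints(grid_pinpoints, patch_size):
    matches = re.findall(r"\((\d+)x(\d+)\)", grid_pinpoints)
    start, end = tuple(map(int, matches[0])), tuple(map(int, matches[-1]))
    pairs = [(i, j) for i in range(start[0], end[0] + 1)
                    for j in range(start[1], end[1] + 1)]
    return [[d * patch_size for d in pair] for pair in pairs]

print(parse_grid_pinpoints("(1x1),(2x2)", 336))
# [[336, 336], [336, 672], [672, 336], [672, 672]]
```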
323
+ def process_anyres_image(image: Image.Image, processor: Any, grid_pinpoints: Union[str, List[Tuple[int, int]]]) -> torch.Tensor:
324
+ r"""
325
+ Process an image with variable resolutions.
326
+
327
+ Args:
328
+ - image (`Image.Image`): The input image to be processed.
329
+ - processor: The image processor object.
330
+ - grid_pinpoints (`str`): A string representation of a list of possible resolutions.
331
+
332
+ Returns:
333
+ `torch.Tensor`: A tensor containing the processed image patches.
334
+ """
335
+ # Convert grid_pinpoints from string to list
336
+ if isinstance(grid_pinpoints, str) and "x" in grid_pinpoints:
337
+ try:
338
+ patch_size = processor.size[0]
339
+ except Exception:
340
+ patch_size = processor.size["shortest_edge"]
341
+ assert patch_size in [224, 336, 384, 448, 512], "patch_size should be in [224, 336, 384, 448, 512]"
342
+ # Use regex to extract the range from the input string
343
+ matches = re.findall(r"\((\d+)x(\d+)\)", grid_pinpoints)
344
+ range_start = tuple(map(int, matches[0]))
345
+ range_end = tuple(map(int, matches[-1]))
346
+ # Generate a matrix of tuples from (range_start[0], range_start[1]) to (range_end[0], range_end[1])
347
+ grid_pinpoints = [(i, j) for i in range(range_start[0], range_end[0] + 1) for j in range(range_start[1], range_end[1] + 1)]
348
+ # Multiply all elements by patch_size
349
+ grid_pinpoints = [[dim * patch_size for dim in pair] for pair in grid_pinpoints]
350
+
351
+ if type(grid_pinpoints) is list:
352
+ possible_resolutions = grid_pinpoints
353
+ else:
354
+ possible_resolutions = ast.literal_eval(grid_pinpoints)
355
+ best_resolution = select_best_resolution(image.size, possible_resolutions)
356
+ image_padded = resize_and_pad_image(image, best_resolution)
357
+
358
+ patches = divide_to_patches(image_padded, processor.crop_size["height"])
359
+
360
+ # FIXME: this seems to be a bug: it resizes instead of padding,
+ # but to keep consistent with previous behavior, it is left as is.
362
+ # TODO: uncomment below to ablate with the padding
363
+ if isinstance(processor.size, dict):
364
+ shortest_edge = processor.size["shortest_edge"]
365
+ else:
366
+ shortest_edge = min(processor.size)
367
+ image_original_resize = image.resize((shortest_edge, shortest_edge))
368
+ # image_padded_square = expand2square(image, tuple(int(x*255) for x in processor.image_mean))
369
+ # image_original_resize = image_padded_square.resize((processor.size['shortest_edge'], processor.size['shortest_edge']))
370
+
371
+ image_patches = [image_original_resize] + patches
372
+ image_patches = [processor.preprocess(image_patch, return_tensors="pt")["pixel_values"][0] for image_patch in image_patches]
373
+ image_patches = torch.stack(image_patches, dim=0)
374
+ return image_patches
375
+
376
+
377
+ def load_image_from_base64(image):
378
+ return Image.open(BytesIO(base64.b64decode(image)))
379
+
380
+
381
+ def expand2square(pil_img: Image.Image, background_color: tuple) -> Image.Image:
382
+ r"""
383
+ Expands a given PIL image to a square by adding a background color.
384
+
385
+ Args:
386
+ - pil_img (`Image.Image`): The input PIL image to be expanded.
387
+ - background_color (`tuple`): The background color to use for expansion, specified as an RGB tuple.
388
+
389
+ Returns:
390
+ `Image.Image`: The expanded square PIL image.
391
+ """
392
+ width, height = pil_img.size
393
+ if width == height:
394
+ return pil_img
395
+ elif width > height:
396
+ result = Image.new(pil_img.mode, (width, width), background_color)
397
+ result.paste(pil_img, (0, (width - height) // 2))
398
+ return result
399
+ else:
400
+ result = Image.new(pil_img.mode, (height, height), background_color)
401
+ result.paste(pil_img, ((height - width) // 2, 0))
402
+ return result
403
+
404
+
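The paste-offset rule in `expand2square` above is simple centering arithmetic; a standalone sketch with an illustrative landscape input:

```python
# Sketch of expand2square above: the square canvas takes the longer side,
# and the original is pasted centered along the shorter one.
def square_paste_offset(width, height):
    if width == height:
        return (0, 0), (width, height)
    if width > height:
        return (0, (width - height) // 2), (width, width)
    return ((height - width) // 2, 0), (height, height)

print(square_paste_offset(640, 480))  # ((0, 80), (640, 640))
```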
405
+ def process_images(images: List[Image.Image], image_processor: Any, model_cfg: Any) -> Union[torch.Tensor, List[torch.Tensor]]:
406
+ r"""
407
+ Processes a list of images based on the specified model configuration.
408
+
409
+ Args:
410
+ - images (`list`): A list of images to be processed.
411
+ - image_processor (`ImageProcessor`): An instance of the image processor to be used.
412
+ - model_cfg (`ModelConfig`): Configuration object containing model settings.
413
+
414
+ Returns:
415
+ `torch.Tensor` or list: Processed images as a tensor if all images have the same shape,
416
+ otherwise a list of processed images.
417
+ """
418
+ image_aspect_ratio = getattr(model_cfg, "image_aspect_ratio", None)
419
+ new_images = []
420
+ if image_aspect_ratio == "highres":
421
+ for image in images:
422
+ image = process_highres_image(image, image_processor, model_cfg.image_grid_pinpoints)
423
+ new_images.append(image)
424
+ elif image_aspect_ratio == "anyres" or "anyres_max" in image_aspect_ratio:
425
+ for image in images:
426
+ image = process_anyres_image(image, image_processor, model_cfg.image_grid_pinpoints)
427
+ new_images.append(image)
428
+ elif image_aspect_ratio == "crop_split":
429
+ for image in images:
430
+ image = process_highres_image_crop_split(image, model_cfg, image_processor)
431
+ new_images.append(image)
432
+ elif image_aspect_ratio == "pad":
433
+ for image in images:
434
+ image = expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
435
+ image = image_processor.preprocess(image, return_tensors="pt")["pixel_values"][0]
436
+ new_images.append(image)
437
+ else:
438
+ return image_processor.preprocess(images, return_tensors="pt")["pixel_values"]
439
+ if all(x.shape == new_images[0].shape for x in new_images):
440
+ new_images = torch.stack(new_images, dim=0)
441
+ return new_images
442
+
443
+
444
+ def tokenizer_image_token(prompt: str, tokenizer: PreTrainedTokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)->Union[torch.Tensor, List[torch.Tensor]]:
445
+ r"""
446
+ Tokenizes a prompt containing image tokens and inserts the specified image token index at the appropriate positions.
447
+
448
+ Args:
449
+ - prompt (str): The input prompt string containing text and "<image>" placeholders.
450
+ - tokenizer (PreTrainedTokenizer): The tokenizer to use for tokenizing the text chunks.
451
+ - image_token_index (int): The token index to use for the image placeholders. Default is IMAGE_TOKEN_INDEX.
452
+ - return_tensors (str, optional): The type of tensor to return. If "pt", returns a PyTorch tensor. Default is None.
453
+
454
+ Returns:
455
+ list or torch.Tensor: The tokenized input IDs as a list or a PyTorch tensor if return_tensors is specified.
456
+ """
457
+ prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]
458
+ # FIXME: prompt_chunks = [tokenizer(chunk, return_tensors="pt", padding="longest", max_length=tokenizer.model_max_length, truncation=True).input_ids for chunk in prompt.split("<image>")]
459
+
460
+ def insert_separator(X, sep):
461
+ return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]
462
+
463
+ input_ids = []
464
+ offset = 0
465
+ if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
466
+ offset = 1
467
+ input_ids.append(prompt_chunks[0][0])
468
+
469
+ for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
470
+ input_ids.extend(x[offset:])
471
+
472
+ if return_tensors is not None:
473
+ if return_tensors == "pt":
474
+ return torch.tensor(input_ids, dtype=torch.long)
475
+ raise ValueError(f"Unsupported tensor type: {return_tensors}")
476
+ return input_ids
477
+
478
+
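The `<image>` interleaving above can be sketched with a toy character-level tokenizer (real use passes a HF `PreTrainedTokenizer`); the `-200` sentinel mirrors LLaVA-style models and BOS handling is omitted for brevity, so this is an assumption-laden simplification, not the exact function.

```python
# Simplified sketch of tokenizer_image_token above: split on "<image>",
# tokenize each text chunk, and splice the image sentinel between chunks.
IMAGE_TOKEN_INDEX = -200  # illustrative sentinel value

def interleave_image_tokens(prompt, tokenize):
    chunks = [tokenize(chunk) for chunk in prompt.split("<image>")]

    def insert_separator(X, sep):
        return [ele for sub in zip(X, [sep] * len(X)) for ele in sub][:-1]

    input_ids = []
    for x in insert_separator(chunks, [IMAGE_TOKEN_INDEX]):
        input_ids.extend(x)
    return input_ids

toy = lambda s: [ord(c) for c in s]  # toy "tokenizer": one id per character
print(interleave_image_tokens("hi<image>ok", toy))  # [104, 105, -200, 111, 107]
```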
479
+ def get_model_name_from_path(model_path: str)->str:
480
+ model_path = model_path.strip("/")
481
+ model_paths = model_path.split("/")
482
+ if model_paths[-1].startswith("checkpoint-"):
483
+ return model_paths[-2] + "_" + model_paths[-1]
484
+ else:
485
+ return model_paths[-1]
486
+
487
+
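The path-to-name rule above prefixes checkpoint directories with their parent folder; a standalone sketch with hypothetical paths:

```python
# Standalone copy of get_model_name_from_path above.
def get_model_name_from_path(model_path):
    parts = model_path.strip("/").split("/")
    if parts[-1].startswith("checkpoint-"):
        return parts[-2] + "_" + parts[-1]
    return parts[-1]

print(get_model_name_from_path("/models/instella-vl/checkpoint-2000/"))  # instella-vl_checkpoint-2000
print(get_model_name_from_path("/models/instella-vl"))                   # instella-vl
```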
488
+ class KeywordsStoppingCriteria(StoppingCriteria):
489
+ def __init__(self, keywords, tokenizer, input_ids):
490
+ self.keywords = keywords
491
+ self.keyword_ids = []
492
+ for keyword in keywords:
493
+ cur_keyword_ids = tokenizer(keyword).input_ids
494
+ if len(cur_keyword_ids) > 1 and cur_keyword_ids[0] == tokenizer.bos_token_id:
495
+ cur_keyword_ids = cur_keyword_ids[1:]
496
+ self.keyword_ids.append(torch.tensor(cur_keyword_ids))
497
+ self.tokenizer = tokenizer
498
+ self.start_len = input_ids.shape[1]
499
+
500
+ def __call__(self, output_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
501
+ assert output_ids.shape[0] == 1, "Only support batch size 1 (yet)" # TODO
502
+ offset = min(output_ids.shape[1] - self.start_len, 3)
503
+ self.keyword_ids = [keyword_id.to(output_ids.device) for keyword_id in self.keyword_ids]
504
+ for keyword_id in self.keyword_ids:
505
+ if torch.equal(output_ids[0, -keyword_id.shape[0]:], keyword_id):  # elementwise == yields a tensor, not a bool
506
+ return True
507
+ outputs = self.tokenizer.batch_decode(output_ids[:, -offset:], skip_special_tokens=True)[0]
508
+ for keyword in self.keywords:
509
+ if keyword in outputs:
510
+ return True
511
+ return False
512
+
513
+
514
+ def rank0_print(*args):
515
+ if dist.is_initialized():
516
+ if dist.get_rank() == 0:
517
+ print(f"Rank {dist.get_rank()}: ", *args)
518
+ else:
519
+ print(*args)
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff379663469ab8fce65791d825a0e76227e5ba1904044a2cd0bf289e8fe88cc4
3
+ size 2973123232
modeling_instellavl.py ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoImageProcessor": "image_processing_instellavl.InstellaVLImageProcessor",
4
+ "AutoProcessor": "processing_instellavl.InstellaVLProcessor"
5
+ },
6
+ "processor_class": "InstellaVLProcessor"
7
+ }
processing_instellavl.py ADDED
@@ -0,0 +1,212 @@
1
+ from PIL import ImageOps
2
+ from PIL.Image import Image
3
+
4
+ import torch
5
+
6
+ from typing import Union, List
7
+ from tqdm import tqdm
8
+
9
+ from transformers.image_utils import ImageInput
10
+ from transformers.tokenization_utils_base import TextInput
11
+ from transformers import CLIPImageProcessor
12
+ from transformers.processing_utils import (
13
+ ProcessorMixin,
14
+ )
15
+ from transformers import AutoTokenizer, PreTrainedTokenizer
16
+
17
+ from .image_processing_instellavl import InstellaVLImageProcessor
18
+ from .mm_utils import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX, KeywordsStoppingCriteria
19
+ from .conversation import conv_templates
20
+
21
+ def tokenizer_image_token(prompt: str, tokenizer: PreTrainedTokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)->Union[torch.Tensor, List[torch.Tensor]]:
22
+ r"""
23
+ Tokenizes a prompt containing image tokens and inserts the specified image token index at the appropriate positions.
24
+
25
+ Args:
26
+ - prompt (str): The input prompt string containing text and DEFAULT_IMAGE_TOKEN="<image>" placeholders.
27
+ - tokenizer (PreTrainedTokenizer): The tokenizer to use for tokenizing the text chunks.
28
+ - image_token_index (int): The token index to use for the image placeholders. Default is IMAGE_TOKEN_INDEX.
29
+ - return_tensors (str, optional): The type of tensor to return. If "pt", returns a PyTorch tensor. Default is None.
30
+
31
+ Returns:
32
+ list or torch.Tensor: The tokenized input IDs as a list or a PyTorch tensor if return_tensors is specified.
33
+ """
34
+ prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split(DEFAULT_IMAGE_TOKEN)]
35
+
36
+ def insert_separator(X, sep):
37
+ return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]
38
+
39
+ input_ids = []
40
+ offset = 0
41
+ if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
42
+ offset = 1
43
+ input_ids.append(prompt_chunks[0][0])
44
+
45
+ for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
46
+ input_ids.extend(x[offset:])
47
+
48
+ if return_tensors is not None:
49
+ if return_tensors == "pt":
50
+ return torch.tensor(input_ids, dtype=torch.long)
51
+ raise ValueError(f"Unsupported tensor type: {return_tensors}")
52
+ return input_ids
53
+
54
+
55
+ class InstellaVLProcessor(ProcessorMixin):
56
+ attributes = ["image_processor", "tokenizer"]
57
+ image_processor_class = "AutoImageProcessor"
58
+ tokenizer_class = ("GPTNeoXTokenizerFast")
59
+
60
+ def __init__(self, image_processor: InstellaVLImageProcessor = None, tokenizer: AutoTokenizer = None, **kwargs):
61
+ super().__init__(image_processor, tokenizer, **kwargs)
62
+
63
+ def pad_sequence(self, input_ids: Union[List[torch.Tensor], List[List[torch.Tensor]]], batch_first: bool, padding_value: int, tokenizer: AutoTokenizer):
64
+ if tokenizer.padding_side == "left":
65
+ input_ids = [torch.flip(_input_ids, [0]) for _input_ids in input_ids]
66
+ input_ids = torch.nn.utils.rnn.pad_sequence(input_ids, batch_first=batch_first, padding_value=padding_value)
67
+ if tokenizer.padding_side == "left":
68
+ input_ids = torch.flip(input_ids, [1])
69
+ return input_ids
70
+
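`pad_sequence` above produces left padding via a flip/pad/flip trick (PyTorch's `pad_sequence` only right-pads); a pure-Python sketch of the equivalent result, with illustrative sequences:

```python
# Pure-Python sketch of the left-padding result produced by pad_sequence
# above (torch.flip before and after right-padding plays this role there).
def pad_batch(seqs, pad_value, side="left"):
    max_len = max(len(s) for s in seqs)
    if side == "left":
        return [[pad_value] * (max_len - len(s)) + list(s) for s in seqs]
    return [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]

print(pad_batch([[1, 2, 3], [4]], 0, side="left"))   # [[1, 2, 3], [0, 0, 4]]
print(pad_batch([[1, 2, 3], [4]], 0, side="right"))  # [[1, 2, 3], [4, 0, 0]]
```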
71
+ def encode(self,
72
+ text: TextInput = None,
73
+ images: ImageInput = None,
74
+ image_processor: CLIPImageProcessor = None,
75
+ tokenizer: AutoTokenizer = None,
76
+ model_cfg: dict = None,
77
+ ) -> dict:
78
+
79
+ if images is not None:
80
+ if isinstance(images, Image):
81
+ # Handle images with EXIF orientation tags, which PIL will ignore by default
82
+ # https://github.com/python-pillow/Pillow/issues/4703
83
+ ImageOps.exif_transpose(images, in_place=True)
84
+ image_sizes = [images.size]
85
+ images = [images]
86
+ elif isinstance(images, list):
87
+ image_sizes = []
88
+ for i in images:
89
+ ImageOps.exif_transpose(i, in_place=True)
90
+ image_sizes.append(i.size)
91
+ image_tensor = self.image_processor.process(images, image_processor, model_cfg)['pixel_values']
92
+
93
+ text = text.replace(DEFAULT_IMAGE_TOKEN, "").strip()
94
+ if images is not None and len(image_tensor) != 0 and DEFAULT_IMAGE_TOKEN not in text:
95
+ question = DEFAULT_IMAGE_TOKEN + "\n" + text
96
+ else:
97
+ question = text
98
+ conv = conv_templates["instella"].copy()
99
+ conv.append_message(conv.roles[0], question)
100
+ conv.append_message(conv.roles[1], None)
101
+ prompt_question = conv.get_prompt()
102
+
103
+
104
+ input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0)
105
+ keywords = [conv.sep]
106
+ stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
107
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
108
+
109
+ out = {
110
+ "input_ids": input_ids,
111
+ "stopping_criteria": [stopping_criteria],
112
+ "eos_token_id": terminators,
113
+ }
114
+ if images is not None:
115
+ out = {
116
+ "image_tensor": image_tensor,
117
+ "image_sizes": image_sizes,
118
+ **out,
119
+ }
120
+ self.tokenizer = tokenizer
121
+ return out
122
+
123
+ def batch_encode(self,
124
+ texts: List[TextInput] = None,
125
+ images: List[ImageInput] = None,
126
+ image_processor: CLIPImageProcessor = None,
127
+ tokenizer: AutoTokenizer = None,
128
+ model_cfg: dict = None,
129
+ ):
130
+
131
+ if texts is None:
132
+ raise ValueError("Text must be provided for batch encoding.")
133
+
134
+ if images is None:
135
+ images = [None] * len(texts)
136
+
137
+ assert isinstance(texts, list), "For batch encoding, provide the texts as a list."
138
+
139
+ assert len(texts) == len(images), "The number of texts and images must be equal."
140
+
141
+ batch_outs = []
142
+ for txt, img in tqdm(zip(texts, images), total=len(texts), desc="Total Samples to encode"):
143
+ batch_outs.append(self.encode(txt, img, image_processor, tokenizer, model_cfg))
144
+
145
+ return batch_outs
146
+ # batched_image_tensors = []
147
+ # batched_text_tokens = []
148
+ # stopping_criterias = []
149
+ # image_sizes = []
150
+ # for t, img in tqdm(zip(text, images), desc="Total Samples to encode"):
151
+ # if img is not None:
152
+ # if isinstance(img, Image):
153
+ # ImageOps.exif_transpose(img, in_place=True)
154
+ # image_sizes.append(img.size)
155
+ # img = [img]
156
+
157
+ # elif isinstance(img, list):
158
+ # tmp_img_sizes = []
159
+ # for i in img:
160
+ # ImageOps.exif_transpose(i, in_place=True)
161
+ # tmp_img_sizes.append(i.size)
162
+ # image_sizes.append(tmp_img_sizes)
163
+ # batched_image_tensors.append(self.image_processor.process(img, image_processor, model_cfg)['pixel_values'].squeeze(0))
164
+
165
+ # t = t.replace(DEFAULT_IMAGE_TOKEN, "").strip()
166
+ # if img is not None and len(batched_image_tensors[-1]) != 0 and DEFAULT_IMAGE_TOKEN not in t:
167
+ # question = DEFAULT_IMAGE_TOKEN + "\n" + t
168
+ # else:
169
+ # question = t
170
+ # conv = conv_templates["instella"].copy()
171
+ # conv.append_message(conv.roles[0], question)
172
+ # conv.append_message(conv.roles[1], None)
173
+ # prompt_question = conv.get_prompt()
174
+
175
+ # input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
176
+ # stopping_criterias.append(KeywordsStoppingCriteria([conv.sep], tokenizer, input_ids.unsqueeze(0)))
177
+ # batched_text_tokens.append(input_ids)
178
+ # terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("|||IP_ADDRESS|||")]
179
+
180
+ # # Pad the text tokens.
181
+ # pad_token_ids = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
182
+ # input_ids = self.pad_sequence(batched_text_tokens, batch_first=True, padding_value=pad_token_ids, tokenizer=tokenizer)
183
+ # attention_masks = input_ids.ne(pad_token_ids)
184
+ # batch_outs = {
185
+ # "input_ids": input_ids,
186
+ # "attention_mask": attention_masks,
187
+ # "pad_token_id": pad_token_ids,
188
+ # "stopping_criteria": stopping_criterias,
189
+ # "eos_token_id": terminators,
190
+ # }
191
+ # if images is not None:
192
+ # batch_outs = {
193
+ # "image_tensor": batched_image_tensors,
194
+ # "image_sizes": image_sizes,
195
+ # **batch_outs
196
+ # }
197
+ # self.tokenizer = tokenizer
198
+ # return batch_outs
199
+
200
+ def decode(self, output_ids: torch.Tensor)->str:
201
+ return self.tokenizer.decode(output_ids[0, :], skip_special_tokens=True).strip()
202
+
203
+ def batch_decode(self, output_ids_lst: List[torch.Tensor])->List[str]:
204
+ raise NotImplementedError("Batch decode is not implemented for InstellaVLProcessor")
205
+ # text_decoded_outs = []
206
+ # for out_ids in output_ids_lst:
207
+ # text_decoded_outs.append(self.decode(out_ids))
208
+ # return text_decoded_outs
209
+
210
+
211
+
212
+ InstellaVLProcessor.register_for_auto_class()
processor_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing_instellavl.InstellaVLProcessor"
4
+ },
5
+ "processor_class": "InstellaVLProcessor"
6
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "eos_token": {
3
+ "content": "|||IP_ADDRESS|||",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "pad_token": {
10
+ "content": "<|padding|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ }
16
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,255 @@
+ {
+   "add_bos_token": false,
+   "add_eos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<|padding|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50254": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50255": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50256": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50257": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50258": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50259": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50260": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50261": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50262": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50263": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50264": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50265": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50266": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50267": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50268": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50269": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50270": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50271": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50272": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50273": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50274": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50275": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50276": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50277": {
+       "content": "|||EMAIL_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50278": {
+       "content": "|||PHONE_NUMBER|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50279": {
+       "content": "|||IP_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50280": {
+       "content": "<point>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50281": {
+       "content": "</point>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": null,
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "|||IP_ADDRESS|||",
+   "chat_template": "|||IP_ADDRESS|||\n{% for message in messages -%}{{ message['role'] + message['content']}}{%- if not loop.last -%}{{ '\\n' if loop.index % 2 == 1 else '|||IP_ADDRESS|||\\n'}}{%- endif %}{%- endfor -%}",
+   "model_max_length": 32768,
+   "pad_token": "<|padding|>",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": null
+ }