peterroh committed
Commit 4ce7387 · verified · 1 Parent(s): d82e323

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ examples/example1.png filter=lfs diff=lfs merge=lfs -text
+ examples/waybill.png filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,73 @@
1
+ KANANA LICENSE AGREEMENT
2
+
3
+ Kanana Release Date: July 17, 2025
4
+
5
+ This KANANA LICENSE AGREEMENT (this “Agreement”) is made by and between you and Kakao Corp. (“KAKAO”) and governs your use of Kanana Materials that KAKAO provides to you.
6
+ By using, copying, modifying, distributing, performing, or displaying all or part of Kanana Materials, or otherwise accepting the terms and conditions of this Agreement, you agree to be bound by this Agreement. You hereby represent and warrant that (i) you are legally authorized to enter into this Agreement, and (ii) if you are entering into this Agreement on behalf of a legal entity, you have the authority to legally and validly bind such entity.
7
+
8
+ 1. Definition
9
+ 1.1 “Agreement” means the terms and conditions for use, copying, distribution and modification of Kanana Materials as set forth herein.
10
+ 1.2 “KAKAO” means Kakao Corp.
11
+ 1.3 “You” means an individual or legal entity that enters into this Agreement with KAKAO and exercises its rights hereunder or uses Kanana Materials for any purpose. If you enter into this Agreement on behalf of a legal entity, “you” shall include such entity.
12
+ 1.4 “Kanana” means the basic large-scale language model, software, and algorithms distributed by KAKAO under this Agreement, including parameters (such as Model Weights and optimizer status), machine learning model codes, inference/learning/fine-tuning codes, and other related elements.
13
+ 1.5 “Documentation” means the specifications, manuals, and other documentation accompanying Kanana distributed by KAKAO.
14
+ 1.6 “Kanana Materials” means, collectively, Kanana and Documentation, including any portions or components thereof.
15
+ 1.7 “Outputs” means information content generated by operating or otherwise using Kanana Materials.
16
+ 1.8 “Derivative Works” means (i) any modifications to Kanana, (ii) any work of authorship based on Kanana, or (iii) any other designed machine learning models that either directly use the patterns of Model Weights, parameters, operations, and/or outputs or incorporate a substantial part of Kanana’s performance or functional characteristics through methods including, but not limited to, transfer learning, fine-tuning, or knowledge distillation. This includes distillation methods using Kanana’s intermediate data representations or a method based on the synthetic data outputs generated by Kanana; provided, however, that Outputs shall not be deemed to be Derivative Works.
17
+ 1.9 “Model Weights” means a set of numerical parameter values generated during Kanana’s learning process, representing the result of substantial investment and effort by KAKAO.
18
+
19
+ 2. Grant of License and Use Policy
20
+ 2.1 Grant of License. Subject to the terms and conditions of this Agreement, you are granted a non-exclusive, worldwide, non-transferrable, royalty-free limited license under KAKAO’s intellectual property or other rights owned by KAKAO that enables you to access, download, install, copy, use, reproduce, distribute, create Derivative Works of, and make modifications to Kanana Materials.
21
+ 2.2 Policy on Prohibited Use. Your use of Kanana Materials and Derivative Works must comply with applicable laws and regulations and adhere to KAKAO’s Guidelines For Responsible AI (https://www.kakaocorp.com/page/responsible/detail/guidelinesForResponsibleAI), which is hereby incorporated into this Agreement.
22
+ 2.3 This Agreement applies solely to Kanana-*** and shall not apply to any other models distributed by KAKAO under separate licenses. Licenses applicable to such other models shall not apply to Kanana-***.
23
+ 2.4 The license terms applicable to a specific version of Kanana apply exclusively to that version and shall not extend to any other versions. Each version shall be deemed an independent and separate work of authorship.
24
+ 2.5 You may use each version of Kanana only in accordance with the license terms expressly specified for that version, and you shall not claim that the license terms applicable to one version apply to any other version.
25
+ 2.6 You shall not combine different versions of Kanana that are subject to different license terms in order to circumvent any applicable license terms.
26
+
27
+ 3. Redistribution
28
+ 3.1 You may copy, distribute or disclose Kanana, Derivative Works, or any products or services that contain Kanana or Derivative Works; provided, however, that you shall:
29
+ (i) incorporate the compliance obligation set forth in the Policy on Prohibited Use provision of Section 2.2 in any agreement for use and distribution and notify subsequent users that such use restrictions apply;
30
+ (ii) provide any recipients of Kanana Materials or Derivative Works a copy of this Agreement;
31
+ (iii) expressly indicate in any files you have modified that they have been modified by you;
32
+ (iv) include a “Notice” text file that includes the following notice:
33
+ “Kanana is licensed in accordance with the Kanana License Agreement. Copyright © KAKAO Corp. All Rights Reserved.”; and
34
+ (v) clearly display the phrase “Powered by Kanana” on related websites, user interfaces, blog posts, introduction pages, or product documentation in a manner that is easily recognizable to users. In addition, if you use Kanana Materials or their outputs to create, train, improve, or enhance other AI models and distribute them, you must include ‘Kanana’ as a prefix to the name of such AI models.
35
+ 3.2 You may add your own copyright statement to your modifications of Kanana Materials and may provide additional or different license terms and conditions; provided, however, that such additional or different license terms and conditions shall not violate or conflict with any provisions of this Agreement.
36
+
37
+ 4. Additional Commercial Terms
38
+ 4.1 If you wish to engage in any of the following activities using Kanana Materials or any Derivative Works, you must obtain a separate commercial license expressly granted by KAKAO:
39
+ (i) Offering or (re)selling to third parties access to Kanana Materials or any Derivative Works through API, cloud platforms, or other remote access services;
40
+ (ii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works in whole or in part, as part of a system integration (SI) or on-premise deployment solution; or
41
+ (iii) Offering or (re)selling to third parties Kanana Materials or any Derivative Works embedded in on-device domains.
42
+ 4.2 If, as of the Kanana Release Date, the number of monthly active users of the products or services provided by you and/or your affiliates is greater than 10 million in the preceding calendar month, you must obtain a separate commercial license expressly granted by KAKAO.
43
+ 4.3 For clarity, unless your activities or conditions fall within those specified in Sections 4.1 and 4.2 above, you may use Kanana Materials or any Derivative Works for the development and operation of your own services without obtaining a commercial license from KAKAO.
44
+ 4.4 The grant of any commercial license under Sections 4.1 and 4.2 shall be at KAKAO’s sole discretion.
45
+
46
+ 5. Outputs
47
+ KAKAO will not claim any rights to Outputs you generate using Kanana Materials. You shall be solely responsible for Outputs and the use thereof.
48
+
49
+ 6. Disclaimer of Warranty
50
+ Unless required by law, Kanana Materials are provided on an “AS IS” basis, and KAKAO disclaims all warranties of any kind, both express and implied, including, without limitation, any warranties of title, non-infringement, merchantability, or fitness for a particular purpose.
51
+
52
+ 7. Limitation on Liability
53
+ Unless required by law, in no event shall KAKAO be liable to you for damages, including any direct, indirect, special, consequential, incidental, and punitive damages of any character arising out of the use or inability to use Kanana Materials, Derivative Works, or Outputs, even if KAKAO has been advised of the possibility of such damages.
54
+
55
+ 8. Indemnification
56
+ You shall indemnify and hold KAKAO harmless from and against any and all claims that may be filed by a third party as a result of your infringement of any third party’s rights or violation of any applicable law, to the extent caused by your use or distribution of Kanana Materials, Derivative Works, or Outputs; provided, however, that the foregoing shall not apply to claims resulting from KAKAO’s willful or gross negligence.
57
+
58
+ 9. Intellectual Property
59
+ 9.1 This Agreement does not grant you any rights to use KAKAO’s trademarks, service marks, or product names. However, on a limited basis and solely for the purpose of complying with Section 3.1(v), KAKAO authorizes you to use the Kanana trademark, provided that KAKAO may require you to discontinue such use at any time if you impair the value of the Kanana trademark.
60
+ 9.2 KAKAO retains ownership of Kanana Materials and Derivative Works created by KAKAO, but you will retain ownership of any Derivative Works and modifications made by you.
61
+ 9.3 If you bring any legal action or proceeding against KAKAO or a third party alleging that the Kanana Materials, Derivative Works, or Outputs infringe your intellectual property rights, your rights under this Agreement shall automatically terminate as of the date such action is filed.
62
+ 9.4 You acknowledge that Model Weights are a valuable asset of KAKAO. You shall not extract, copy, distribute, modify Model Weights or use them to train new models, except as expressly permitted under this Agreement.
63
+ 9.5 The protections under this Agreement apply to all components of Kanana Materials (irrespective of whether it is recognized as a work of authorship), including, but not limited to, Model Weights, parameters, algorithms, or structures. You may exercise your rights in these components only to the extent expressly permitted under this Agreement.
64
+
65
+ 10. Term and Termination
66
+ The term of this Agreement will commence upon your acceptance of this Agreement or access to Kanana Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. KAKAO may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of Kanana Materials and Derivative Works. Sections 5, 6, 7, 8, 10 and 11 shall survive the termination of this Agreement.
67
+
68
+ 11. Governing Law and Arbitration
69
+ 11.1 This Agreement will be governed and construed under the laws of the Republic of Korea, without regard to its conflicts of laws principles.
70
+ 11.2 Any disputes arising out of or in connection with this Agreement shall be finally settled by arbitration in accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board. The number of arbitrators shall be one. The seat, or legal place, of arbitral proceedings shall be Seoul, Republic of Korea. The language to be used in the arbitral proceedings shall be English. Either party may seek interim or provisional relief from a court of competent jurisdiction, which shall not be considered a waiver of any provision in this Section. The arbitral tribunal also has the authority to issue orders for interim or provisional relief.
71
+
72
+ 12. No Waiver
73
+ KAKAO’s failure or delay in exercising any of its rights under this Agreement shall not constitute a waiver of such rights.
README.md ADDED
@@ -0,0 +1,216 @@
1
+ ---
2
+ license: other
3
+ license_name: "kanana"
4
+ license_link: LICENSE
5
+ language:
6
+ - ko
7
+ - en
8
+ base_model:
9
+ - kakaocorp/kanana-1.5-v-3b-instruct
10
+ pipeline_tag: image-text-to-text
11
+ ---
12
+
13
+
14
+
15
+ # kanana-1.5-v-3b-instruct
16
+
17
+ The Unified Foundation Model (UFO) task force of Kanana at Kakao developed and released the Kanana-V family of multimodal large language models (MLLMs), a collection of pretrained text/image-to-text (TI2T) models.
18
+
19
+
20
+
21
+ ## Intended Use
22
+
23
+ kanana-1.5-v-3b-instruct is intended for research and application development in multimodal understanding and text generation tasks. Typical use cases include image captioning, document understanding, OCR-based reasoning, and multimodal instruction following in both English and Korean. The model is optimized for both general-purpose and Korea-specific benchmarks, making it suitable for bilingual environments.
24
+
25
+
26
+
27
+
28
+ ## Model Details
29
+
30
+ - **Developed by:** Unified Foundation Model (UFO) TF at Kakao
31
+ - **Language(s):** ['en', 'ko']
32
+ - **Model Architecture:** kanana-1.5-v-3b-instruct has 3.6B parameters and contains image encoder, C-abstractor, and kanana-1.5-3b-instruct language model.
33
+ - **Input:** The models accept text and image inputs.
34
+ - **Output:** The models generate text only.
35
+ - **Context Length:** 32k
36
+ - **Knowledge Cutoff Date:** June 30, 2024
37
+ - **Model Release Date:** Jul 24, 2025.
38
+ - **License:** kanana-license
39
+
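The composition described above is mirrored in this repository's `config.json`, which nests a `vision_config` (visual encoder), a `projector_config` (the C-Abstractor), and a `text_config` (the kanana-1.5-3b-instruct language model). A minimal sketch for inspecting that structure, assuming the repository's custom code may be trusted in your environment:

```python
from transformers import AutoConfig

# Resolves to configuration.KananaVConfig via the auto_map entry in config.json
config = AutoConfig.from_pretrained(
    "kakaocorp/kanana-1.5-v-3b-instruct", trust_remote_code=True
)

print(config.vision_config.model_type)         # kanana-1.5-v-visual-encoder
print(config.projector_config.projector_type)  # dynamic-c-abs
print(config.text_config.model_type)           # kanana-1.5-3b-instruct
print(config.hidden_size)                      # 2048 (taken from the text config)
```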
40
+
41
+
42
+
43
+ ## Evaluation
44
+
45
+ ### Model Configuration Summary
46
+
47
+ | Model | LLM | Total Parameters |
48
+ |----------------------------|----------------------------------|-----------|
49
+ | **kanana-1.5-v-3b-instruct** | kanana-1.5-3b-instruct | 3.67B |
50
+ | HCX-SEED-Vision-3B | HyperCLOVAX-SEED-Text-Base-3B | 3.72B |
51
+ | Phi-3-Vision | Phi-3-Mini | 4.15B |
52
+ | Qwen2.5-VL-3B-Instruct | Qwen2.5-3B | 3.75B |
53
+ | InternVL2.5-4B | Qwen2.5-3B-Instruct | 3.94B |
54
+
55
+ ### Overview
56
+
57
+ | Model | All | Image (EN) | Image (KO) | IF (EN, KO) |
58
+ |----------------------------|--------|------------|------------|-------------|
59
+ | **kanana-1.5-v-3b-instruct** | 73.22 | 74.00 | 68.27 | 77.39 |
60
+ | HCX-SEED-Vision-3B | 59.00 | 64.81 | 51.96 | 60.23 |
61
+ | Phi-3-Vision | 48.84 | 65.41 | 36.40 | 44.71 |
62
+ | Qwen2.5-VL-3B-Instruct | 63.54 | 73.97 | 60.60 | 56.04 |
63
+ | InternVL2.5-4B | 61.35 | 74.73 | 54.68 | 54.63 |
64
+
65
+ ### Image Benchmarks (EN)
66
+
67
+ | Model | average | MMMU (Val) | MathVista | DocVQA | ChartQA | OCRBench | InfoVQA | TextVQA | RealWorldQA | MMStar | MMB | SEED-image | MMVet | LLaVA-Wild | ScienceQA | AI2D |
68
+ |----------------------------|--------------|------------|-----------|--------|---------|----------|---------|---------|-------------|--------|-------|------------|-------|------------|-----------|-------|
69
+ | **kanana-1.5-v-3b-instruct** | 74.00 | 43.89 | 56.00 | 93.06 | 81.20 | 82.50 | 73.62 | 78.62 | 65.36 | 56.32 | 78.44 | 75.17 | 65.87 | 89.60 | 95.61 | 74.81 |
70
+ | HCX-SEED-Vision-3B | 64.81 | 38.89 | 47.40 | 79.87 | 71.88 | 62.90 | 55.59 | 73.51 | 62.48 | 46.66 | 72.42 | 74.84 | 47.27 | 79.30 | 86.84 | 72.31 |
71
+ | Phi-3-Vision | 65.41 | 45.33 | 43.60 | 87.04 | 81.40 | 63.60 | 54.80 | 69.61 | 59.08 | 47.47 | 73.37 | 71.69 | 45.96 | 70.40 | 90.84 | 76.98 |
72
+ | Qwen2.5-VL-3B-Instruct | 73.97 | 50.67 | 62.00 | 94.19 | 83.60 | 79.10 | 77.22 | 77.77 | 59.74 | 56.26 | 77.75 | 74.83 | 61.06 | 96.90 | 79.69 | 78.79 |
73
+ | InternVL2.5-4B | 74.73 | 52.33 | 61.80 | 92.13 | 82.76 | 79.20 | 69.73 | 78.24 | 62.88 | 59.72 | 81.96 | 75.59 | 61.38 | 86.30 | 97.14 | 79.83 |
74
+
75
+
76
+ ### Image Benchmarks (KO)
77
+
78
+ | Model | average | KoOCRBench | KoMMDBench | KoChartTask | KoMathSolution | KoCosMed | KoFoodMenu | KoEntity | KoExam | KoCelebV2 |
79
+ |----------------------------|--------------|----------------------|------------|-------------|----------------|----------|------------|----------|--------|-----------|
80
+ | **kanana-1.5-v-3b-instruct** | 68.27 | 85.93 | 74.00 | 84.96 | 36.88 | 87.58 | 70.84 | 72.04 | 58.99 | 43.24 |
81
+ | HCX-SEED-Vision-3B | 51.96 | 32.91 | 64.57 | 73.55 | 27.88 | 78.16 | 57.08 | 64.12 | 31.82 | 37.58 |
82
+ | Phi-3-Vision | 36.40 | 25.13 | 37.93 | 52.36 | 38.75 | 56.75 | 34.70 | 31.71 | 24.05 | 26.25 |
83
+ | Qwen2.5-VL-3B-Instruct | 60.60 | 50.67 | 61.75 | 84.96 | 47.13 | 82.01 | 66.32 | 58.15 | 60.68 | 33.72 |
84
+ | InternVL2.5-4B | 54.68 | 20.52 | 62.65 | 82.61 | 46.50 | 82.66 | 65.09 | 50.42 | 47.43 | 34.23 |
85
+
86
+ ### Multimodal Instruction Following Benchmarks (EN, KO)
87
+
88
+ | Model | average | MIABench | MIABench-Ko | MM-IFEval | MM-OmniAlign |
89
+ |----------------------------|--------------|----------|-------------|-----------|--------------|
90
+ | **kanana-1.5-v-3b-instruct** | 77.39 | 90.28 | 91.17 | 56.67 | 71.43 |
91
+ | HCX-SEED-Vision-3B | 60.23 | 85.81 | 81.80 | 47.91 | 25.40 |
92
+ | Phi-3-Vision | 44.71 | 85.78 | 38.35 | 44.37 | 10.32 |
93
+ | Qwen2.5-VL-3B-Instruct | 56.04 | 82.55 | 59.61 | 39.14 | 42.86 |
94
+ | InternVL2.5-4B | 54.63 | 85.68 | 68.35 | 43.06 | 21.43 |
95
+
96
+
97
+
98
+ ### Note on Benchmarking Methodology
99
+
100
+ All benchmarks were re-measured under identical software conditions to ensure fair comparison.
101
+
102
+ - **[VLMEvalKit](https://github.com/open-compass/VLMEvalKit)** was used for MMMU, MathVista, ScienceQA, MIA-Bench, MM-IFEval and MM-OmniAlign.
103
+
104
+ - **[lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval)** was employed for DocVQA, ChartQA, OCRBench, InfoVQA, TextVQA, RealWorldQA, MMStar, MMB, and SEED-image.
105
+
106
+ - HCX-SEED-Vision-3B was evaluated without the use of any auxiliary tools (e.g., external OCR engines or Lens features), as such tools are not publicly available and therefore not included in our evaluation setup.
107
+
108
+ - **Important note for ChartQA**: It was identified that the original rule-based parser used by lmms-eval marked answers ending with a period (".") as incorrect due to parsing issues. To address this, the parser logic was modified to remove any trailing period before parsing the response. All ChartQA evaluations presented here reflect results obtained after applying this parser adjustment (sketched below).
109
+
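The exact lmms-eval parser code is not reproduced here; the following is a minimal sketch of the adjustment described above, namely stripping a trailing period from a response before the rule-based answer comparison. The function name is illustrative and not part of lmms-eval's API.

```python
def normalize_chartqa_answer(response: str) -> str:
    # Drop a single trailing period so that answers like "12." compare as "12".
    response = response.strip()
    if response.endswith("."):
        response = response[:-1]
    return response

assert normalize_chartqa_answer("12.") == "12"
assert normalize_chartqa_answer("Increasing.") == "Increasing"
```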
110
+
111
+ The following in-house benchmarks evaluate Korean-language tasks and Korea-specific knowledge:
112
+
113
+ | Benchmark | Purpose |
114
+ |-----------|---------|
115
+ | **KoOCRBench** | Korean character recognition (OCR) |
116
+ | **KoMMDBench**, **KoEntity**, **KoCelebV2** | Korean knowledge & cultural visual QA |
117
+ | **KoFoodMenu**, **KoCosMed** | Korean text-based visual QA |
118
+ | **KoChartTask** | Chart understanding in Korean |
119
+ | **KoExam**, **KoMathSolution** | Multimodal Problem-solving in Korean (general exams & mathematics) |
120
+ | **MIABench-Ko** | Korean multimodal instruction-following benchmark (derived from MIABench) |
121
+
122
+
123
+
124
+ ## Usage
125
+
126
+ ### Requirements
127
+
128
+ ```
129
+ pip install transformers accelerate timm omegaconf
130
+ ```
131
+ `transformers>=4.45.0` (or the latest version) is recommended.
132
+
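As a quick environment check before running the quickstart, the installed versions of the packages above can be printed; this is a minimal sketch and not part of the official setup:

```python
import importlib.metadata as md

# Packages from the pip install line above
for pkg in ("transformers", "accelerate", "timm", "omegaconf"):
    print(pkg, md.version(pkg))
```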
133
+ ### Quickstart
134
+
135
+ The following is a code snippet that briefly demonstrates how to load a model and process input data using the `AutoClass` from `transformers`.
```python
from PIL import Image
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL = "kakaocorp/kanana-1.5-v-3b-instruct"

# Load the model on the available device(s)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model.eval()

# Load processor
processor = AutoProcessor.from_pretrained(MODEL, trust_remote_code=True)

# Prepare input batch
batch = []
for _ in range(1):  # dummy loop to demonstrate batch processing
    image_files = [
        "./examples/waybill.png"
    ]

    sample = {
        "image": [Image.open(image_file_path).convert("RGB") for image_file_path in image_files],
        "conv": [
            {"role": "system", "content": "The following is a conversation between a curious human and AI assistant."},
            {"role": "user", "content": " ".join(["<image>"] * len(image_files))},
            # "Organize the sender and recipient information in the photo as JSON."
            {"role": "user", "content": "사진에서 보내는 사람과 받는 사람 정보를 json 형태로 정리해줘."},
        ]
    }

    batch.append(sample)

# Encode and collate the batch into model inputs
inputs = processor.batch_encode_collate(
    batch, padding_side="left", add_generation_prompt=True, max_length=8192
)
inputs = {k: v.to(model.device) if isinstance(v, torch.Tensor) else v for k, v in inputs.items()}

# Set the generation config
gen_kwargs = {
    "max_new_tokens": 2048,
    "temperature": 0,
    "top_p": 1.0,
    "num_beams": 1,
    "do_sample": False,
}

# Generate text
gens = model.generate(
    **inputs,
    **gen_kwargs,
)
text_outputs = processor.tokenizer.batch_decode(gens, skip_special_tokens=True)
print(text_outputs)  # ['```json\n{\n "보내는분": {\n "성명": "카카오",\n "주소": "경기도 성남시 판교역로 166"\n },\n "받는분": {\n "성명": "카나나",\n "주소": "제주도 제주시 첨단로 242"\n }\n}\n```']
```
195
+
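Because the prompt asks for JSON, the decoded output arrives wrapped in a Markdown code fence labeled `json`, as shown in the comment on the final line. A small follow-up sketch for turning that string into a Python dict; it assumes the model actually emits a fenced JSON block, which is not guaranteed for every input:

```python
import json
import re

def extract_json(text: str):
    # Pull the first fenced JSON block out of a response and parse it;
    # fall back to parsing the whole string if no fence is present.
    match = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

# text_outputs comes from the quickstart snippet above
parsed = extract_json(text_outputs[0])
print(parsed["보내는분"]["성명"])  # 카카오
```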
196
+
197
+
198
+ ## Limitations
199
+
200
+ - The model may generate inaccurate or misleading content, especially in scenarios requiring precise factual understanding (e.g., scientific diagrams or mathematical reasoning).
201
+ - Performance on languages other than Korean and English has not been evaluated and may be poor.
202
+ - The model is not designed for medical, legal, or other high-stakes domains.
203
+ - The model may reflect social biases present in the pretraining data.
204
+
205
+
206
+
207
+ ## Contributors
208
+ - Beomhee Park, Byeonguk Bae, Byungseok Roh, Daejin Jo, Donghee Son, Dongjin Lee, Hyunwoong Ko, Jaemyung Lee, Jeehye Lee, Sunghun Kang, Wooyoung Kang
209
+ - Listed in alphabetical order (first name)
210
+
211
+
212
+
213
+ ## Contact
214
+ - Kanana MLLM Core Team Technical Support: [email protected]
215
+ - Business & Partnership Contact: [email protected]
216
+
config.json ADDED
@@ -0,0 +1,262 @@
1
+ {
2
+ "architectures": [
3
+ "KananaVForConditionalGeneration"
4
+ ],
5
+ "auto_map": {
6
+ "AutoConfig": "configuration.KananaVConfig",
7
+ "AutoModelForVision2Seq": "modeling.KananaVForConditionalGeneration",
8
+ "AutoImageProcessor": "processing_image.KananaVImageProcessor",
9
+ "AutoProcessor": "processing.KananaVProcessor"
10
+ },
11
+ "model_type": "kanana-1.5-v",
12
+ "plora_config": null,
13
+ "projector_config": {
14
+ "_attn_implementation_autoset": false,
15
+ "add_cross_attention": false,
16
+ "architectures": null,
17
+ "bad_words_ids": null,
18
+ "begin_suppress_tokens": null,
19
+ "bos_token_id": null,
20
+ "chunk_size_feed_forward": 0,
21
+ "cross_attention_hidden_size": null,
22
+ "decoder_start_token_id": null,
23
+ "depth": 2,
24
+ "diversity_penalty": 0.0,
25
+ "do_sample": false,
26
+ "early_stopping": false,
27
+ "encoder_hidden_size": 1280,
28
+ "encoder_no_repeat_ngram_size": 0,
29
+ "eos_token_id": null,
30
+ "exponential_decay_length_penalty": null,
31
+ "feature_layer_index": -1,
32
+ "finetuning_task": null,
33
+ "forced_bos_token_id": null,
34
+ "forced_eos_token_id": null,
35
+ "hidden_size": 1024,
36
+ "id2label": {
37
+ "0": "LABEL_0",
38
+ "1": "LABEL_1"
39
+ },
40
+ "is_decoder": false,
41
+ "is_encoder_decoder": false,
42
+ "label2id": {
43
+ "LABEL_0": 0,
44
+ "LABEL_1": 1
45
+ },
46
+ "length_penalty": 1.0,
47
+ "max_length": 20,
48
+ "merge_size": 2,
49
+ "min_length": 0,
50
+ "mlp_depth": 2,
51
+ "model_type": "kanana-1.5-v-visual_projector",
52
+ "no_repeat_ngram_size": 0,
53
+ "num_beam_groups": 1,
54
+ "num_beams": 1,
55
+ "num_eos_tokens": 0,
56
+ "num_return_sequences": 1,
57
+ "output_attentions": false,
58
+ "output_hidden_size": 2048,
59
+ "output_hidden_states": false,
60
+ "output_scores": false,
61
+ "pad_token_id": null,
62
+ "pos_emb": true,
63
+ "pos_emb_size": 576,
64
+ "prefix": null,
65
+ "prenorm": false,
66
+ "problem_type": null,
67
+ "projector_type": "dynamic-c-abs",
68
+ "pruned_heads": {},
69
+ "remove_invalid_values": false,
70
+ "repetition_penalty": 1.0,
71
+ "return_dict": true,
72
+ "return_dict_in_generate": false,
73
+ "sep_token_id": null,
74
+ "suppress_tokens": null,
75
+ "task_specific_params": null,
76
+ "temperature": 1.0,
77
+ "tf_legacy_loss": false,
78
+ "tie_encoder_decoder": false,
79
+ "tie_word_embeddings": true,
80
+ "tokenizer_class": null,
81
+ "top_k": 50,
82
+ "top_p": 1.0,
83
+ "torch_dtype": null,
84
+ "torchscript": false,
85
+ "typical_p": 1.0,
86
+ "use_bfloat16": false
87
+ },
88
+ "text_config": {
89
+ "_name_or_path": "kakaocorp/kanana-1.5-3b-instruct",
90
+ "_attn_implementation_autoset": false,
91
+ "add_cross_attention": false,
92
+ "architectures": [
93
+ "LlamaForCausalLM"
94
+ ],
95
+ "attention_bias": false,
96
+ "attention_dropout": 0.0,
97
+ "bad_words_ids": null,
98
+ "begin_suppress_tokens": null,
99
+ "bos_token_id": 128000,
100
+ "chunk_size_feed_forward": 0,
101
+ "cross_attention_hidden_size": null,
102
+ "decoder_start_token_id": null,
103
+ "diversity_penalty": 0.0,
104
+ "do_sample": false,
105
+ "early_stopping": false,
106
+ "encoder_no_repeat_ngram_size": 0,
107
+ "eos_token_id": 128009,
108
+ "exponential_decay_length_penalty": null,
109
+ "finetuning_task": null,
110
+ "forced_bos_token_id": null,
111
+ "forced_eos_token_id": null,
112
+ "head_dim": 128,
113
+ "hidden_act": "silu",
114
+ "hidden_size": 2048,
115
+ "id2label": {
116
+ "0": "LABEL_0",
117
+ "1": "LABEL_1"
118
+ },
119
+ "initializer_range": 0.02,
120
+ "intermediate_size": 9216,
121
+ "is_decoder": false,
122
+ "is_encoder_decoder": false,
123
+ "label2id": {
124
+ "LABEL_0": 0,
125
+ "LABEL_1": 1
126
+ },
127
+ "length_penalty": 1.0,
128
+ "max_length": 20,
129
+ "max_position_embeddings": 32768,
130
+ "min_length": 0,
131
+ "mlp_bias": false,
132
+ "model_type": "kanana-1.5-3b-instruct",
133
+ "no_repeat_ngram_size": 0,
134
+ "num_attention_heads": 32,
135
+ "num_beam_groups": 1,
136
+ "num_beams": 1,
137
+ "num_hidden_layers": 32,
138
+ "num_key_value_heads": 8,
139
+ "num_return_sequences": 1,
140
+ "output_attentions": false,
141
+ "output_hidden_states": false,
142
+ "output_scores": false,
143
+ "pad_token_id": 128001,
144
+ "prefix": null,
145
+ "pretraining_tp": 1,
146
+ "problem_type": null,
147
+ "pruned_heads": {},
148
+ "remove_invalid_values": false,
149
+ "repetition_penalty": 1.0,
150
+ "return_dict": true,
151
+ "return_dict_in_generate": false,
152
+ "rms_norm_eps": 1e-05,
153
+ "rope_scaling": null,
154
+ "rope_theta": 8000000.0,
155
+ "sep_token_id": null,
156
+ "suppress_tokens": null,
157
+ "task_specific_params": null,
158
+ "temperature": 1.0,
159
+ "tf_legacy_loss": false,
160
+ "tie_encoder_decoder": false,
161
+ "tie_word_embeddings": false,
162
+ "tokenizer_class": null,
163
+ "top_k": 50,
164
+ "top_p": 1.0,
165
+ "torch_dtype": "bfloat16",
166
+ "torchscript": false,
167
+ "typical_p": 1.0,
168
+ "use_bfloat16": false,
169
+ "use_cache": false,
170
+ "vocab_size": 128259
171
+ },
172
+ "torch_dtype": "bfloat16",
173
+ "transformers_version": "4.51.3",
174
+ "vision_config": {
175
+ "_attn_implementation_autoset": false,
176
+ "add_cross_attention": false,
177
+ "architectures": null,
178
+ "bad_words_ids": null,
179
+ "begin_suppress_tokens": null,
180
+ "bos_token_id": null,
181
+ "chunk_size_feed_forward": 0,
182
+ "cross_attention_hidden_size": null,
183
+ "decoder_start_token_id": null,
184
+ "depth": 32,
185
+ "diversity_penalty": 0.0,
186
+ "do_sample": false,
187
+ "early_stopping": false,
188
+ "embed_dim": 1280,
189
+ "encoder_no_repeat_ngram_size": 0,
190
+ "encoder_type": "qwen2-vl-ve",
191
+ "eos_token_id": null,
192
+ "exponential_decay_length_penalty": null,
193
+ "finetuning_task": null,
194
+ "forced_bos_token_id": null,
195
+ "forced_eos_token_id": null,
196
+ "hidden_act": "quick_gelu",
197
+ "hidden_size": 1280,
198
+ "id2label": {
199
+ "0": "LABEL_0",
200
+ "1": "LABEL_1"
201
+ },
202
+ "image_mean": [
203
+ 0.48145466,
204
+ 0.4578275,
205
+ 0.40821073
206
+ ],
207
+ "image_size": "dynamic",
208
+ "image_std": [
209
+ 0.26862954,
210
+ 0.26130258,
211
+ 0.27577711
212
+ ],
213
+ "in_channels": 3,
214
+ "in_chans": 3,
215
+ "initializer_range": 0.02,
216
+ "is_decoder": false,
217
+ "is_encoder_decoder": false,
218
+ "label2id": {
219
+ "LABEL_0": 0,
220
+ "LABEL_1": 1
221
+ },
222
+ "length_penalty": 1.0,
223
+ "max_length": 20,
224
+ "min_length": 0,
225
+ "mlp_ratio": 4,
226
+ "model_type": "kanana-1.5-v-visual-encoder",
227
+ "no_repeat_ngram_size": 0,
228
+ "num_beam_groups": 1,
229
+ "num_beams": 1,
230
+ "num_heads": 16,
231
+ "num_return_sequences": 1,
232
+ "output_attentions": false,
233
+ "output_hidden_states": false,
234
+ "output_scores": false,
235
+ "pad_token_id": null,
236
+ "patch_size": 14,
237
+ "prefix": null,
238
+ "problem_type": null,
239
+ "pruned_heads": {},
240
+ "remove_invalid_values": false,
241
+ "repetition_penalty": 1.0,
242
+ "return_dict": true,
243
+ "return_dict_in_generate": false,
244
+ "sep_token_id": null,
245
+ "spatial_merge_size": 2,
246
+ "spatial_patch_size": 14,
247
+ "suppress_tokens": null,
248
+ "task_specific_params": null,
249
+ "temperature": 1.0,
250
+ "temporal_patch_size": 2,
251
+ "tf_legacy_loss": false,
252
+ "tie_encoder_decoder": false,
253
+ "tie_word_embeddings": true,
254
+ "tokenizer_class": null,
255
+ "top_k": 50,
256
+ "top_p": 1.0,
257
+ "torch_dtype": "bfloat16",
258
+ "torchscript": false,
259
+ "typical_p": 1.0,
260
+ "use_bfloat16": false
261
+ }
262
+ }
configuration.py ADDED
@@ -0,0 +1,125 @@
import logging

from transformers.configuration_utils import PretrainedConfig
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.utils.constants import OPENAI_CLIP_MEAN, OPENAI_CLIP_STD

logger = logging.getLogger("kanana-1.5-v")


class KananaVVisionConfig(PretrainedConfig):
    model_type = "kanana-1.5-v-visual-encoder"
    base_config_key = "vision_config"

    def __init__(
        self,
        depth=32,
        embed_dim=1280,
        mlp_ratio=4,
        num_heads=16,
        in_chans=3,
        hidden_size=1280,
        patch_size=14,
        spatial_merge_size=2,
        spatial_patch_size=14,
        temporal_patch_size=2,
        initializer_range=0.02,
        image_size="dynamic",
        image_mean=OPENAI_CLIP_MEAN,
        image_std=OPENAI_CLIP_STD,
        **kwargs,
    ):
        super().__init__(**kwargs)

        self.depth = depth
        self.embed_dim = embed_dim
        self.mlp_ratio = mlp_ratio
        self.num_heads = num_heads
        self.in_chans = in_chans
        self.hidden_size = hidden_size
        self.patch_size = patch_size
        self.spatial_merge_size = spatial_merge_size
        self.spatial_patch_size = spatial_patch_size
        self.temporal_patch_size = temporal_patch_size
        self.initializer_range = initializer_range
        self.image_size = image_size
        self.image_mean = image_mean
        self.image_std = image_std


class KananaVVisualProjectorConfig(PretrainedConfig):
    model_type = "kanana-1.5-v-visual_projector"
    base_config_key = "projector_config"

    def __init__(
        self,
        depth=2,
        encoder_hidden_size=1280,
        feature_layer_index=-1,
        hidden_size=1024,
        merge_size=2,
        mlp_depth=2,
        num_eos_tokens=0,
        output_hidden_size=2048,
        pos_emb=True,
        pos_emb_size=576,
        prenorm=False,
        projector_type="dynamic-c-abs",
        **kwargs,
    ):
        super().__init__(**kwargs)

        self.depth = depth
        self.encoder_hidden_size = encoder_hidden_size
        self.feature_layer_index = feature_layer_index
        self.hidden_size = hidden_size
        self.merge_size = merge_size
        self.mlp_depth = mlp_depth
        self.num_eos_tokens = num_eos_tokens
        self.output_hidden_size = output_hidden_size
        self.pos_emb = pos_emb
        self.pos_emb_size = pos_emb_size
        self.prenorm = prenorm
        self.projector_type = projector_type


class KananaLanguageConfig(LlamaConfig):
    model_type = "kanana-1.5-3b-instruct"
    base_config_key = "text_config"

    def __init__(
        self,
        **kwargs,
    ):
        super().__init__(**kwargs)


class KananaVConfig(PretrainedConfig):
    model_type = "kanana-1.5-v"
    is_composition = True

    def __init__(
        self,
        vision_config: dict = {},
        projector_config: dict = {},
        text_config: dict = {},
        **kwargs,
    ):
        super().__init__(**kwargs)

        # Vision config
        self.vision_config = KananaVVisionConfig(**vision_config)

        # Visual projector config
        self.projector_config = KananaVVisualProjectorConfig(**projector_config)

        # Language model config
        self.text_config = KananaLanguageConfig(**text_config)

    @property
    def num_visual_tokens(self):
        return "dynamic"

    @property
    def hidden_size(self):
        return self.text_config.hidden_size
examples/waybill.png ADDED

Git LFS Details

  • SHA256: e9a1e1d9ac471583e1c4734787c558b4f36b3816e7c88ccda4952484032eb35d
  • Pointer size: 132 Bytes
  • Size of remote file: 1.43 MB
generation_config.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 128000,
4
+ "eos_token_id": 128009,
5
+ "pad_token_id": 128001,
6
+ "transformers_version": "4.51.3",
7
+ "use_cache": false
8
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:94272a33a98c25bdd9d646f82c13d1cfeae654e7dd9780cef9ff259799621577
3
+ size 4990094968
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:90eac19b6578027e8271c0712f7c14a9b2ee0705633a022aa22d31b7b02746cb
3
+ size 2345793064
model.safetensors.index.json ADDED
@@ -0,0 +1,746 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 7335800448
4
+ },
5
+ "weight_map": {
6
+ "abstractor.net.0.b1.conv1.bn.bias": "model-00001-of-00002.safetensors",
7
+ "abstractor.net.0.b1.conv1.bn.weight": "model-00001-of-00002.safetensors",
8
+ "abstractor.net.0.b1.conv1.conv.weight": "model-00001-of-00002.safetensors",
9
+ "abstractor.net.0.b1.conv2.bn.bias": "model-00001-of-00002.safetensors",
10
+ "abstractor.net.0.b1.conv2.bn.weight": "model-00001-of-00002.safetensors",
11
+ "abstractor.net.0.b1.conv2.conv.weight": "model-00001-of-00002.safetensors",
12
+ "abstractor.net.0.b1.conv3.bn.bias": "model-00001-of-00002.safetensors",
13
+ "abstractor.net.0.b1.conv3.bn.weight": "model-00001-of-00002.safetensors",
14
+ "abstractor.net.0.b1.conv3.conv.weight": "model-00001-of-00002.safetensors",
15
+ "abstractor.net.0.b1.downsample.bn.bias": "model-00001-of-00002.safetensors",
16
+ "abstractor.net.0.b1.downsample.bn.weight": "model-00001-of-00002.safetensors",
17
+ "abstractor.net.0.b1.downsample.conv.weight": "model-00001-of-00002.safetensors",
18
+ "abstractor.net.0.b1.se.fc1.bias": "model-00001-of-00002.safetensors",
19
+ "abstractor.net.0.b1.se.fc1.weight": "model-00001-of-00002.safetensors",
20
+ "abstractor.net.0.b1.se.fc2.bias": "model-00001-of-00002.safetensors",
21
+ "abstractor.net.0.b1.se.fc2.weight": "model-00001-of-00002.safetensors",
22
+ "abstractor.net.0.b2.conv1.bn.bias": "model-00001-of-00002.safetensors",
23
+ "abstractor.net.0.b2.conv1.bn.weight": "model-00001-of-00002.safetensors",
24
+ "abstractor.net.0.b2.conv1.conv.weight": "model-00001-of-00002.safetensors",
25
+ "abstractor.net.0.b2.conv2.bn.bias": "model-00001-of-00002.safetensors",
26
+ "abstractor.net.0.b2.conv2.bn.weight": "model-00001-of-00002.safetensors",
27
+ "abstractor.net.0.b2.conv2.conv.weight": "model-00001-of-00002.safetensors",
28
+ "abstractor.net.0.b2.conv3.bn.bias": "model-00001-of-00002.safetensors",
29
+ "abstractor.net.0.b2.conv3.bn.weight": "model-00001-of-00002.safetensors",
30
+ "abstractor.net.0.b2.conv3.conv.weight": "model-00001-of-00002.safetensors",
31
+ "abstractor.net.0.b2.se.fc1.bias": "model-00001-of-00002.safetensors",
32
+ "abstractor.net.0.b2.se.fc1.weight": "model-00001-of-00002.safetensors",
33
+ "abstractor.net.0.b2.se.fc2.bias": "model-00001-of-00002.safetensors",
34
+ "abstractor.net.0.b2.se.fc2.weight": "model-00001-of-00002.safetensors",
35
+ "abstractor.net.2.b1.conv1.bn.bias": "model-00001-of-00002.safetensors",
36
+ "abstractor.net.2.b1.conv1.bn.weight": "model-00001-of-00002.safetensors",
37
+ "abstractor.net.2.b1.conv1.conv.weight": "model-00001-of-00002.safetensors",
38
+ "abstractor.net.2.b1.conv2.bn.bias": "model-00001-of-00002.safetensors",
39
+ "abstractor.net.2.b1.conv2.bn.weight": "model-00001-of-00002.safetensors",
40
+ "abstractor.net.2.b1.conv2.conv.weight": "model-00001-of-00002.safetensors",
41
+ "abstractor.net.2.b1.conv3.bn.bias": "model-00001-of-00002.safetensors",
42
+ "abstractor.net.2.b1.conv3.bn.weight": "model-00001-of-00002.safetensors",
43
+ "abstractor.net.2.b1.conv3.conv.weight": "model-00001-of-00002.safetensors",
44
+ "abstractor.net.2.b1.downsample.bn.bias": "model-00001-of-00002.safetensors",
45
+ "abstractor.net.2.b1.downsample.bn.weight": "model-00001-of-00002.safetensors",
46
+ "abstractor.net.2.b1.downsample.conv.weight": "model-00001-of-00002.safetensors",
47
+ "abstractor.net.2.b1.se.fc1.bias": "model-00001-of-00002.safetensors",
48
+ "abstractor.net.2.b1.se.fc1.weight": "model-00001-of-00002.safetensors",
49
+ "abstractor.net.2.b1.se.fc2.bias": "model-00001-of-00002.safetensors",
50
+ "abstractor.net.2.b1.se.fc2.weight": "model-00001-of-00002.safetensors",
51
+ "abstractor.net.2.b2.conv1.bn.bias": "model-00001-of-00002.safetensors",
52
+ "abstractor.net.2.b2.conv1.bn.weight": "model-00001-of-00002.safetensors",
53
+ "abstractor.net.2.b2.conv1.conv.weight": "model-00001-of-00002.safetensors",
54
+ "abstractor.net.2.b2.conv2.bn.bias": "model-00001-of-00002.safetensors",
55
+ "abstractor.net.2.b2.conv2.bn.weight": "model-00001-of-00002.safetensors",
56
+ "abstractor.net.2.b2.conv2.conv.weight": "model-00001-of-00002.safetensors",
57
+ "abstractor.net.2.b2.conv3.bn.bias": "model-00001-of-00002.safetensors",
58
+ "abstractor.net.2.b2.conv3.bn.weight": "model-00001-of-00002.safetensors",
59
+ "abstractor.net.2.b2.conv3.conv.weight": "model-00001-of-00002.safetensors",
60
+ "abstractor.net.2.b2.se.fc1.bias": "model-00001-of-00002.safetensors",
61
+ "abstractor.net.2.b2.se.fc1.weight": "model-00001-of-00002.safetensors",
62
+ "abstractor.net.2.b2.se.fc2.bias": "model-00001-of-00002.safetensors",
63
+ "abstractor.net.2.b2.se.fc2.weight": "model-00001-of-00002.safetensors",
64
+ "abstractor.pos_emb": "model-00001-of-00002.safetensors",
65
+ "abstractor.readout.0.bias": "model-00001-of-00002.safetensors",
66
+ "abstractor.readout.0.weight": "model-00001-of-00002.safetensors",
67
+ "abstractor.readout.2.bias": "model-00001-of-00002.safetensors",
68
+ "abstractor.readout.2.weight": "model-00001-of-00002.safetensors",
69
+ "language_model.lm_head.weight": "model-00002-of-00002.safetensors",
70
+ "language_model.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
71
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
72
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
73
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
74
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
75
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
76
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
77
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
78
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
79
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
80
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
81
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
82
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
83
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
84
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
85
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
86
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
87
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
88
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
89
+ "language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
90
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
91
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
92
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
93
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
94
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
95
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
96
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
97
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
98
+ "language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
99
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
100
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
101
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
102
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
103
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
104
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
105
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
106
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
107
+ "language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
108
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
109
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
110
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
111
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
113
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
114
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
115
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
116
+ "language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
117
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
118
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
119
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
120
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
122
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
123
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
124
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
125
+ "language_model.model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
126
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
127
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
128
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
129
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
130
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
131
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
132
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
133
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
134
+ "language_model.model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
135
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
136
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
137
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
138
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
139
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
140
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
141
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
142
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
143
+ "language_model.model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
144
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
145
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
146
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
147
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
148
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
149
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
150
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
151
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
152
+ "language_model.model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
154
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
155
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
156
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
157
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
158
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
159
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
160
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
161
+ "language_model.model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
162
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
163
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
164
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
165
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
166
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
167
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
168
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
169
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
170
+ "language_model.model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
171
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
172
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
173
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
174
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
175
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
176
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
177
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
178
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
179
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
180
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
181
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
182
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
183
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
184
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
185
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
186
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
187
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
188
+ "language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
189
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
190
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
191
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
192
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
193
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
195
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
196
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
197
+ "language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
198
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
199
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
200
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
201
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
202
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
203
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
204
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
205
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
206
+ "language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
207
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
208
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
209
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
210
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
211
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
212
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
213
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
214
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
215
+ "language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
216
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
217
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
218
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
219
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
220
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
221
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
222
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
223
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
224
+ "language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
225
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
226
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
227
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
228
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
229
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
230
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
231
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
232
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
233
+ "language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
234
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
235
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
236
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
237
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
238
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
239
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
240
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
241
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
242
+ "language_model.model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
243
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
244
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
245
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
246
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
247
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
248
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
249
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
250
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
251
+ "language_model.model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
252
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
253
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
254
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
255
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
256
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
257
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
258
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
259
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
260
+ "language_model.model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
261
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
262
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
263
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
264
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
265
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
266
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
267
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
268
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
269
+ "language_model.model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
270
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
271
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
272
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
273
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
274
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
275
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
276
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
277
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
278
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
279
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
280
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
281
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
282
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
283
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
284
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
285
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
286
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
287
+ "language_model.model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
288
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
289
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
290
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
291
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
292
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
293
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
294
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
295
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
296
+ "language_model.model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
297
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
298
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
299
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
300
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
301
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
302
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
303
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
304
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
305
+ "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
306
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
307
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
308
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
309
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
310
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
311
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
312
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
313
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
314
+ "language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
315
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
316
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
317
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
318
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
319
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
320
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
321
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
322
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
323
+ "language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
324
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
325
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
326
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
327
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
328
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
329
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
330
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
331
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
332
+ "language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
333
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
334
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
335
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
336
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
337
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
338
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
339
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
340
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
341
+ "language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
342
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
343
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
344
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
345
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
346
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
347
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
348
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
349
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
350
+ "language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
351
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
352
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
353
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
354
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
355
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
356
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
357
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
358
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
359
+ "language_model.model.norm.weight": "model-00002-of-00002.safetensors",
360
+ "vision_model.blocks.0.attn.proj.bias": "model-00001-of-00002.safetensors",
361
+ "vision_model.blocks.0.attn.proj.weight": "model-00001-of-00002.safetensors",
362
+ "vision_model.blocks.0.attn.qkv.bias": "model-00001-of-00002.safetensors",
363
+ "vision_model.blocks.0.attn.qkv.weight": "model-00001-of-00002.safetensors",
364
+ "vision_model.blocks.0.mlp.fc1.bias": "model-00001-of-00002.safetensors",
365
+ "vision_model.blocks.0.mlp.fc1.weight": "model-00001-of-00002.safetensors",
366
+ "vision_model.blocks.0.mlp.fc2.bias": "model-00001-of-00002.safetensors",
367
+ "vision_model.blocks.0.mlp.fc2.weight": "model-00001-of-00002.safetensors",
368
+ "vision_model.blocks.0.norm1.bias": "model-00001-of-00002.safetensors",
369
+ "vision_model.blocks.0.norm1.weight": "model-00001-of-00002.safetensors",
370
+ "vision_model.blocks.0.norm2.bias": "model-00001-of-00002.safetensors",
371
+ "vision_model.blocks.0.norm2.weight": "model-00001-of-00002.safetensors",
372
+ "vision_model.blocks.1.attn.proj.bias": "model-00001-of-00002.safetensors",
373
+ "vision_model.blocks.1.attn.proj.weight": "model-00001-of-00002.safetensors",
374
+ "vision_model.blocks.1.attn.qkv.bias": "model-00001-of-00002.safetensors",
375
+ "vision_model.blocks.1.attn.qkv.weight": "model-00001-of-00002.safetensors",
376
+ "vision_model.blocks.1.mlp.fc1.bias": "model-00001-of-00002.safetensors",
377
+ "vision_model.blocks.1.mlp.fc1.weight": "model-00001-of-00002.safetensors",
378
+ "vision_model.blocks.1.mlp.fc2.bias": "model-00001-of-00002.safetensors",
379
+ "vision_model.blocks.1.mlp.fc2.weight": "model-00001-of-00002.safetensors",
380
+ "vision_model.blocks.1.norm1.bias": "model-00001-of-00002.safetensors",
381
+ "vision_model.blocks.1.norm1.weight": "model-00001-of-00002.safetensors",
382
+ "vision_model.blocks.1.norm2.bias": "model-00001-of-00002.safetensors",
383
+ "vision_model.blocks.1.norm2.weight": "model-00001-of-00002.safetensors",
384
+ "vision_model.blocks.10.attn.proj.bias": "model-00001-of-00002.safetensors",
385
+ "vision_model.blocks.10.attn.proj.weight": "model-00001-of-00002.safetensors",
386
+ "vision_model.blocks.10.attn.qkv.bias": "model-00001-of-00002.safetensors",
387
+ "vision_model.blocks.10.attn.qkv.weight": "model-00001-of-00002.safetensors",
388
+ "vision_model.blocks.10.mlp.fc1.bias": "model-00001-of-00002.safetensors",
389
+ "vision_model.blocks.10.mlp.fc1.weight": "model-00001-of-00002.safetensors",
390
+ "vision_model.blocks.10.mlp.fc2.bias": "model-00001-of-00002.safetensors",
391
+ "vision_model.blocks.10.mlp.fc2.weight": "model-00001-of-00002.safetensors",
392
+ "vision_model.blocks.10.norm1.bias": "model-00001-of-00002.safetensors",
393
+ "vision_model.blocks.10.norm1.weight": "model-00001-of-00002.safetensors",
394
+ "vision_model.blocks.10.norm2.bias": "model-00001-of-00002.safetensors",
395
+ "vision_model.blocks.10.norm2.weight": "model-00001-of-00002.safetensors",
396
+ "vision_model.blocks.11.attn.proj.bias": "model-00001-of-00002.safetensors",
397
+ "vision_model.blocks.11.attn.proj.weight": "model-00001-of-00002.safetensors",
398
+ "vision_model.blocks.11.attn.qkv.bias": "model-00001-of-00002.safetensors",
399
+ "vision_model.blocks.11.attn.qkv.weight": "model-00001-of-00002.safetensors",
400
+ "vision_model.blocks.11.mlp.fc1.bias": "model-00001-of-00002.safetensors",
401
+ "vision_model.blocks.11.mlp.fc1.weight": "model-00001-of-00002.safetensors",
402
+ "vision_model.blocks.11.mlp.fc2.bias": "model-00001-of-00002.safetensors",
403
+ "vision_model.blocks.11.mlp.fc2.weight": "model-00001-of-00002.safetensors",
404
+ "vision_model.blocks.11.norm1.bias": "model-00001-of-00002.safetensors",
405
+ "vision_model.blocks.11.norm1.weight": "model-00001-of-00002.safetensors",
406
+ "vision_model.blocks.11.norm2.bias": "model-00001-of-00002.safetensors",
407
+ "vision_model.blocks.11.norm2.weight": "model-00001-of-00002.safetensors",
408
+ "vision_model.blocks.12.attn.proj.bias": "model-00001-of-00002.safetensors",
409
+ "vision_model.blocks.12.attn.proj.weight": "model-00001-of-00002.safetensors",
410
+ "vision_model.blocks.12.attn.qkv.bias": "model-00001-of-00002.safetensors",
411
+ "vision_model.blocks.12.attn.qkv.weight": "model-00001-of-00002.safetensors",
412
+ "vision_model.blocks.12.mlp.fc1.bias": "model-00001-of-00002.safetensors",
413
+ "vision_model.blocks.12.mlp.fc1.weight": "model-00001-of-00002.safetensors",
414
+ "vision_model.blocks.12.mlp.fc2.bias": "model-00001-of-00002.safetensors",
415
+ "vision_model.blocks.12.mlp.fc2.weight": "model-00001-of-00002.safetensors",
416
+ "vision_model.blocks.12.norm1.bias": "model-00001-of-00002.safetensors",
417
+ "vision_model.blocks.12.norm1.weight": "model-00001-of-00002.safetensors",
418
+ "vision_model.blocks.12.norm2.bias": "model-00001-of-00002.safetensors",
419
+ "vision_model.blocks.12.norm2.weight": "model-00001-of-00002.safetensors",
420
+ "vision_model.blocks.13.attn.proj.bias": "model-00001-of-00002.safetensors",
421
+ "vision_model.blocks.13.attn.proj.weight": "model-00001-of-00002.safetensors",
422
+ "vision_model.blocks.13.attn.qkv.bias": "model-00001-of-00002.safetensors",
423
+ "vision_model.blocks.13.attn.qkv.weight": "model-00001-of-00002.safetensors",
424
+ "vision_model.blocks.13.mlp.fc1.bias": "model-00001-of-00002.safetensors",
425
+ "vision_model.blocks.13.mlp.fc1.weight": "model-00001-of-00002.safetensors",
426
+ "vision_model.blocks.13.mlp.fc2.bias": "model-00001-of-00002.safetensors",
427
+ "vision_model.blocks.13.mlp.fc2.weight": "model-00001-of-00002.safetensors",
428
+ "vision_model.blocks.13.norm1.bias": "model-00001-of-00002.safetensors",
429
+ "vision_model.blocks.13.norm1.weight": "model-00001-of-00002.safetensors",
430
+ "vision_model.blocks.13.norm2.bias": "model-00001-of-00002.safetensors",
431
+ "vision_model.blocks.13.norm2.weight": "model-00001-of-00002.safetensors",
432
+ "vision_model.blocks.14.attn.proj.bias": "model-00001-of-00002.safetensors",
433
+ "vision_model.blocks.14.attn.proj.weight": "model-00001-of-00002.safetensors",
434
+ "vision_model.blocks.14.attn.qkv.bias": "model-00001-of-00002.safetensors",
435
+ "vision_model.blocks.14.attn.qkv.weight": "model-00001-of-00002.safetensors",
436
+ "vision_model.blocks.14.mlp.fc1.bias": "model-00001-of-00002.safetensors",
437
+ "vision_model.blocks.14.mlp.fc1.weight": "model-00001-of-00002.safetensors",
438
+ "vision_model.blocks.14.mlp.fc2.bias": "model-00001-of-00002.safetensors",
439
+ "vision_model.blocks.14.mlp.fc2.weight": "model-00001-of-00002.safetensors",
440
+ "vision_model.blocks.14.norm1.bias": "model-00001-of-00002.safetensors",
441
+ "vision_model.blocks.14.norm1.weight": "model-00001-of-00002.safetensors",
442
+ "vision_model.blocks.14.norm2.bias": "model-00001-of-00002.safetensors",
443
+ "vision_model.blocks.14.norm2.weight": "model-00001-of-00002.safetensors",
444
+ "vision_model.blocks.15.attn.proj.bias": "model-00001-of-00002.safetensors",
445
+ "vision_model.blocks.15.attn.proj.weight": "model-00001-of-00002.safetensors",
446
+ "vision_model.blocks.15.attn.qkv.bias": "model-00001-of-00002.safetensors",
447
+ "vision_model.blocks.15.attn.qkv.weight": "model-00001-of-00002.safetensors",
448
+ "vision_model.blocks.15.mlp.fc1.bias": "model-00001-of-00002.safetensors",
449
+ "vision_model.blocks.15.mlp.fc1.weight": "model-00001-of-00002.safetensors",
450
+ "vision_model.blocks.15.mlp.fc2.bias": "model-00001-of-00002.safetensors",
451
+ "vision_model.blocks.15.mlp.fc2.weight": "model-00001-of-00002.safetensors",
452
+ "vision_model.blocks.15.norm1.bias": "model-00001-of-00002.safetensors",
453
+ "vision_model.blocks.15.norm1.weight": "model-00001-of-00002.safetensors",
454
+ "vision_model.blocks.15.norm2.bias": "model-00001-of-00002.safetensors",
455
+ "vision_model.blocks.15.norm2.weight": "model-00001-of-00002.safetensors",
456
+ "vision_model.blocks.16.attn.proj.bias": "model-00001-of-00002.safetensors",
457
+ "vision_model.blocks.16.attn.proj.weight": "model-00001-of-00002.safetensors",
458
+ "vision_model.blocks.16.attn.qkv.bias": "model-00001-of-00002.safetensors",
459
+ "vision_model.blocks.16.attn.qkv.weight": "model-00001-of-00002.safetensors",
460
+ "vision_model.blocks.16.mlp.fc1.bias": "model-00001-of-00002.safetensors",
461
+ "vision_model.blocks.16.mlp.fc1.weight": "model-00001-of-00002.safetensors",
462
+ "vision_model.blocks.16.mlp.fc2.bias": "model-00001-of-00002.safetensors",
463
+ "vision_model.blocks.16.mlp.fc2.weight": "model-00001-of-00002.safetensors",
464
+ "vision_model.blocks.16.norm1.bias": "model-00001-of-00002.safetensors",
465
+ "vision_model.blocks.16.norm1.weight": "model-00001-of-00002.safetensors",
466
+ "vision_model.blocks.16.norm2.bias": "model-00001-of-00002.safetensors",
467
+ "vision_model.blocks.16.norm2.weight": "model-00001-of-00002.safetensors",
468
+ "vision_model.blocks.17.attn.proj.bias": "model-00001-of-00002.safetensors",
469
+ "vision_model.blocks.17.attn.proj.weight": "model-00001-of-00002.safetensors",
470
+ "vision_model.blocks.17.attn.qkv.bias": "model-00001-of-00002.safetensors",
471
+ "vision_model.blocks.17.attn.qkv.weight": "model-00001-of-00002.safetensors",
472
+ "vision_model.blocks.17.mlp.fc1.bias": "model-00001-of-00002.safetensors",
473
+ "vision_model.blocks.17.mlp.fc1.weight": "model-00001-of-00002.safetensors",
474
+ "vision_model.blocks.17.mlp.fc2.bias": "model-00001-of-00002.safetensors",
475
+ "vision_model.blocks.17.mlp.fc2.weight": "model-00001-of-00002.safetensors",
476
+ "vision_model.blocks.17.norm1.bias": "model-00001-of-00002.safetensors",
477
+ "vision_model.blocks.17.norm1.weight": "model-00001-of-00002.safetensors",
478
+ "vision_model.blocks.17.norm2.bias": "model-00001-of-00002.safetensors",
479
+ "vision_model.blocks.17.norm2.weight": "model-00001-of-00002.safetensors",
480
+ "vision_model.blocks.18.attn.proj.bias": "model-00001-of-00002.safetensors",
481
+ "vision_model.blocks.18.attn.proj.weight": "model-00001-of-00002.safetensors",
482
+ "vision_model.blocks.18.attn.qkv.bias": "model-00001-of-00002.safetensors",
483
+ "vision_model.blocks.18.attn.qkv.weight": "model-00001-of-00002.safetensors",
484
+ "vision_model.blocks.18.mlp.fc1.bias": "model-00001-of-00002.safetensors",
485
+ "vision_model.blocks.18.mlp.fc1.weight": "model-00001-of-00002.safetensors",
486
+ "vision_model.blocks.18.mlp.fc2.bias": "model-00001-of-00002.safetensors",
487
+ "vision_model.blocks.18.mlp.fc2.weight": "model-00001-of-00002.safetensors",
488
+ "vision_model.blocks.18.norm1.bias": "model-00001-of-00002.safetensors",
489
+ "vision_model.blocks.18.norm1.weight": "model-00001-of-00002.safetensors",
490
+ "vision_model.blocks.18.norm2.bias": "model-00001-of-00002.safetensors",
491
+ "vision_model.blocks.18.norm2.weight": "model-00001-of-00002.safetensors",
492
+ "vision_model.blocks.19.attn.proj.bias": "model-00001-of-00002.safetensors",
493
+ "vision_model.blocks.19.attn.proj.weight": "model-00001-of-00002.safetensors",
494
+ "vision_model.blocks.19.attn.qkv.bias": "model-00001-of-00002.safetensors",
495
+ "vision_model.blocks.19.attn.qkv.weight": "model-00001-of-00002.safetensors",
496
+ "vision_model.blocks.19.mlp.fc1.bias": "model-00001-of-00002.safetensors",
497
+ "vision_model.blocks.19.mlp.fc1.weight": "model-00001-of-00002.safetensors",
498
+ "vision_model.blocks.19.mlp.fc2.bias": "model-00001-of-00002.safetensors",
499
+ "vision_model.blocks.19.mlp.fc2.weight": "model-00001-of-00002.safetensors",
500
+ "vision_model.blocks.19.norm1.bias": "model-00001-of-00002.safetensors",
501
+ "vision_model.blocks.19.norm1.weight": "model-00001-of-00002.safetensors",
502
+ "vision_model.blocks.19.norm2.bias": "model-00001-of-00002.safetensors",
503
+ "vision_model.blocks.19.norm2.weight": "model-00001-of-00002.safetensors",
504
+ "vision_model.blocks.2.attn.proj.bias": "model-00001-of-00002.safetensors",
505
+ "vision_model.blocks.2.attn.proj.weight": "model-00001-of-00002.safetensors",
506
+ "vision_model.blocks.2.attn.qkv.bias": "model-00001-of-00002.safetensors",
507
+ "vision_model.blocks.2.attn.qkv.weight": "model-00001-of-00002.safetensors",
508
+ "vision_model.blocks.2.mlp.fc1.bias": "model-00001-of-00002.safetensors",
509
+ "vision_model.blocks.2.mlp.fc1.weight": "model-00001-of-00002.safetensors",
510
+ "vision_model.blocks.2.mlp.fc2.bias": "model-00001-of-00002.safetensors",
511
+ "vision_model.blocks.2.mlp.fc2.weight": "model-00001-of-00002.safetensors",
512
+ "vision_model.blocks.2.norm1.bias": "model-00001-of-00002.safetensors",
513
+ "vision_model.blocks.2.norm1.weight": "model-00001-of-00002.safetensors",
514
+ "vision_model.blocks.2.norm2.bias": "model-00001-of-00002.safetensors",
515
+ "vision_model.blocks.2.norm2.weight": "model-00001-of-00002.safetensors",
516
+ "vision_model.blocks.20.attn.proj.bias": "model-00001-of-00002.safetensors",
517
+ "vision_model.blocks.20.attn.proj.weight": "model-00001-of-00002.safetensors",
518
+ "vision_model.blocks.20.attn.qkv.bias": "model-00001-of-00002.safetensors",
519
+ "vision_model.blocks.20.attn.qkv.weight": "model-00001-of-00002.safetensors",
520
+ "vision_model.blocks.20.mlp.fc1.bias": "model-00001-of-00002.safetensors",
521
+ "vision_model.blocks.20.mlp.fc1.weight": "model-00001-of-00002.safetensors",
522
+ "vision_model.blocks.20.mlp.fc2.bias": "model-00001-of-00002.safetensors",
523
+ "vision_model.blocks.20.mlp.fc2.weight": "model-00001-of-00002.safetensors",
524
+ "vision_model.blocks.20.norm1.bias": "model-00001-of-00002.safetensors",
525
+ "vision_model.blocks.20.norm1.weight": "model-00001-of-00002.safetensors",
526
+ "vision_model.blocks.20.norm2.bias": "model-00001-of-00002.safetensors",
527
+ "vision_model.blocks.20.norm2.weight": "model-00001-of-00002.safetensors",
528
+ "vision_model.blocks.21.attn.proj.bias": "model-00001-of-00002.safetensors",
529
+ "vision_model.blocks.21.attn.proj.weight": "model-00001-of-00002.safetensors",
530
+ "vision_model.blocks.21.attn.qkv.bias": "model-00001-of-00002.safetensors",
531
+ "vision_model.blocks.21.attn.qkv.weight": "model-00001-of-00002.safetensors",
532
+ "vision_model.blocks.21.mlp.fc1.bias": "model-00001-of-00002.safetensors",
533
+ "vision_model.blocks.21.mlp.fc1.weight": "model-00001-of-00002.safetensors",
534
+ "vision_model.blocks.21.mlp.fc2.bias": "model-00001-of-00002.safetensors",
535
+ "vision_model.blocks.21.mlp.fc2.weight": "model-00001-of-00002.safetensors",
536
+ "vision_model.blocks.21.norm1.bias": "model-00001-of-00002.safetensors",
537
+ "vision_model.blocks.21.norm1.weight": "model-00001-of-00002.safetensors",
538
+ "vision_model.blocks.21.norm2.bias": "model-00001-of-00002.safetensors",
539
+ "vision_model.blocks.21.norm2.weight": "model-00001-of-00002.safetensors",
540
+ "vision_model.blocks.22.attn.proj.bias": "model-00001-of-00002.safetensors",
541
+ "vision_model.blocks.22.attn.proj.weight": "model-00001-of-00002.safetensors",
542
+ "vision_model.blocks.22.attn.qkv.bias": "model-00001-of-00002.safetensors",
543
+ "vision_model.blocks.22.attn.qkv.weight": "model-00001-of-00002.safetensors",
544
+ "vision_model.blocks.22.mlp.fc1.bias": "model-00001-of-00002.safetensors",
545
+ "vision_model.blocks.22.mlp.fc1.weight": "model-00001-of-00002.safetensors",
546
+ "vision_model.blocks.22.mlp.fc2.bias": "model-00001-of-00002.safetensors",
547
+ "vision_model.blocks.22.mlp.fc2.weight": "model-00001-of-00002.safetensors",
548
+ "vision_model.blocks.22.norm1.bias": "model-00001-of-00002.safetensors",
549
+ "vision_model.blocks.22.norm1.weight": "model-00001-of-00002.safetensors",
550
+ "vision_model.blocks.22.norm2.bias": "model-00001-of-00002.safetensors",
551
+ "vision_model.blocks.22.norm2.weight": "model-00001-of-00002.safetensors",
552
+ "vision_model.blocks.23.attn.proj.bias": "model-00001-of-00002.safetensors",
553
+ "vision_model.blocks.23.attn.proj.weight": "model-00001-of-00002.safetensors",
554
+ "vision_model.blocks.23.attn.qkv.bias": "model-00001-of-00002.safetensors",
555
+ "vision_model.blocks.23.attn.qkv.weight": "model-00001-of-00002.safetensors",
556
+ "vision_model.blocks.23.mlp.fc1.bias": "model-00001-of-00002.safetensors",
557
+ "vision_model.blocks.23.mlp.fc1.weight": "model-00001-of-00002.safetensors",
558
+ "vision_model.blocks.23.mlp.fc2.bias": "model-00001-of-00002.safetensors",
559
+ "vision_model.blocks.23.mlp.fc2.weight": "model-00001-of-00002.safetensors",
560
+ "vision_model.blocks.23.norm1.bias": "model-00001-of-00002.safetensors",
561
+ "vision_model.blocks.23.norm1.weight": "model-00001-of-00002.safetensors",
562
+ "vision_model.blocks.23.norm2.bias": "model-00001-of-00002.safetensors",
563
+ "vision_model.blocks.23.norm2.weight": "model-00001-of-00002.safetensors",
564
+ "vision_model.blocks.24.attn.proj.bias": "model-00001-of-00002.safetensors",
565
+ "vision_model.blocks.24.attn.proj.weight": "model-00001-of-00002.safetensors",
566
+ "vision_model.blocks.24.attn.qkv.bias": "model-00001-of-00002.safetensors",
567
+ "vision_model.blocks.24.attn.qkv.weight": "model-00001-of-00002.safetensors",
568
+ "vision_model.blocks.24.mlp.fc1.bias": "model-00001-of-00002.safetensors",
569
+ "vision_model.blocks.24.mlp.fc1.weight": "model-00001-of-00002.safetensors",
570
+ "vision_model.blocks.24.mlp.fc2.bias": "model-00001-of-00002.safetensors",
571
+ "vision_model.blocks.24.mlp.fc2.weight": "model-00001-of-00002.safetensors",
572
+ "vision_model.blocks.24.norm1.bias": "model-00001-of-00002.safetensors",
573
+ "vision_model.blocks.24.norm1.weight": "model-00001-of-00002.safetensors",
574
+ "vision_model.blocks.24.norm2.bias": "model-00001-of-00002.safetensors",
575
+ "vision_model.blocks.24.norm2.weight": "model-00001-of-00002.safetensors",
576
+ "vision_model.blocks.25.attn.proj.bias": "model-00001-of-00002.safetensors",
577
+ "vision_model.blocks.25.attn.proj.weight": "model-00001-of-00002.safetensors",
578
+ "vision_model.blocks.25.attn.qkv.bias": "model-00001-of-00002.safetensors",
579
+ "vision_model.blocks.25.attn.qkv.weight": "model-00001-of-00002.safetensors",
580
+ "vision_model.blocks.25.mlp.fc1.bias": "model-00001-of-00002.safetensors",
581
+ "vision_model.blocks.25.mlp.fc1.weight": "model-00001-of-00002.safetensors",
582
+ "vision_model.blocks.25.mlp.fc2.bias": "model-00001-of-00002.safetensors",
583
+ "vision_model.blocks.25.mlp.fc2.weight": "model-00001-of-00002.safetensors",
584
+ "vision_model.blocks.25.norm1.bias": "model-00001-of-00002.safetensors",
585
+ "vision_model.blocks.25.norm1.weight": "model-00001-of-00002.safetensors",
586
+ "vision_model.blocks.25.norm2.bias": "model-00001-of-00002.safetensors",
587
+ "vision_model.blocks.25.norm2.weight": "model-00001-of-00002.safetensors",
588
+ "vision_model.blocks.26.attn.proj.bias": "model-00001-of-00002.safetensors",
589
+ "vision_model.blocks.26.attn.proj.weight": "model-00001-of-00002.safetensors",
590
+ "vision_model.blocks.26.attn.qkv.bias": "model-00001-of-00002.safetensors",
591
+ "vision_model.blocks.26.attn.qkv.weight": "model-00001-of-00002.safetensors",
592
+ "vision_model.blocks.26.mlp.fc1.bias": "model-00001-of-00002.safetensors",
593
+ "vision_model.blocks.26.mlp.fc1.weight": "model-00001-of-00002.safetensors",
594
+ "vision_model.blocks.26.mlp.fc2.bias": "model-00001-of-00002.safetensors",
595
+ "vision_model.blocks.26.mlp.fc2.weight": "model-00001-of-00002.safetensors",
596
+ "vision_model.blocks.26.norm1.bias": "model-00001-of-00002.safetensors",
597
+ "vision_model.blocks.26.norm1.weight": "model-00001-of-00002.safetensors",
598
+ "vision_model.blocks.26.norm2.bias": "model-00001-of-00002.safetensors",
599
+ "vision_model.blocks.26.norm2.weight": "model-00001-of-00002.safetensors",
600
+ "vision_model.blocks.27.attn.proj.bias": "model-00001-of-00002.safetensors",
601
+ "vision_model.blocks.27.attn.proj.weight": "model-00001-of-00002.safetensors",
602
+ "vision_model.blocks.27.attn.qkv.bias": "model-00001-of-00002.safetensors",
603
+ "vision_model.blocks.27.attn.qkv.weight": "model-00001-of-00002.safetensors",
604
+ "vision_model.blocks.27.mlp.fc1.bias": "model-00001-of-00002.safetensors",
605
+ "vision_model.blocks.27.mlp.fc1.weight": "model-00001-of-00002.safetensors",
606
+ "vision_model.blocks.27.mlp.fc2.bias": "model-00001-of-00002.safetensors",
607
+ "vision_model.blocks.27.mlp.fc2.weight": "model-00001-of-00002.safetensors",
608
+ "vision_model.blocks.27.norm1.bias": "model-00001-of-00002.safetensors",
609
+ "vision_model.blocks.27.norm1.weight": "model-00001-of-00002.safetensors",
610
+ "vision_model.blocks.27.norm2.bias": "model-00001-of-00002.safetensors",
611
+ "vision_model.blocks.27.norm2.weight": "model-00001-of-00002.safetensors",
612
+ "vision_model.blocks.28.attn.proj.bias": "model-00001-of-00002.safetensors",
613
+ "vision_model.blocks.28.attn.proj.weight": "model-00001-of-00002.safetensors",
614
+ "vision_model.blocks.28.attn.qkv.bias": "model-00001-of-00002.safetensors",
615
+ "vision_model.blocks.28.attn.qkv.weight": "model-00001-of-00002.safetensors",
616
+ "vision_model.blocks.28.mlp.fc1.bias": "model-00001-of-00002.safetensors",
617
+ "vision_model.blocks.28.mlp.fc1.weight": "model-00001-of-00002.safetensors",
618
+ "vision_model.blocks.28.mlp.fc2.bias": "model-00001-of-00002.safetensors",
619
+ "vision_model.blocks.28.mlp.fc2.weight": "model-00001-of-00002.safetensors",
620
+ "vision_model.blocks.28.norm1.bias": "model-00001-of-00002.safetensors",
621
+ "vision_model.blocks.28.norm1.weight": "model-00001-of-00002.safetensors",
622
+ "vision_model.blocks.28.norm2.bias": "model-00001-of-00002.safetensors",
623
+ "vision_model.blocks.28.norm2.weight": "model-00001-of-00002.safetensors",
624
+ "vision_model.blocks.29.attn.proj.bias": "model-00001-of-00002.safetensors",
625
+ "vision_model.blocks.29.attn.proj.weight": "model-00001-of-00002.safetensors",
626
+ "vision_model.blocks.29.attn.qkv.bias": "model-00001-of-00002.safetensors",
627
+ "vision_model.blocks.29.attn.qkv.weight": "model-00001-of-00002.safetensors",
628
+ "vision_model.blocks.29.mlp.fc1.bias": "model-00001-of-00002.safetensors",
629
+ "vision_model.blocks.29.mlp.fc1.weight": "model-00001-of-00002.safetensors",
630
+ "vision_model.blocks.29.mlp.fc2.bias": "model-00001-of-00002.safetensors",
631
+ "vision_model.blocks.29.mlp.fc2.weight": "model-00001-of-00002.safetensors",
632
+ "vision_model.blocks.29.norm1.bias": "model-00001-of-00002.safetensors",
633
+ "vision_model.blocks.29.norm1.weight": "model-00001-of-00002.safetensors",
634
+ "vision_model.blocks.29.norm2.bias": "model-00001-of-00002.safetensors",
635
+ "vision_model.blocks.29.norm2.weight": "model-00001-of-00002.safetensors",
636
+ "vision_model.blocks.3.attn.proj.bias": "model-00001-of-00002.safetensors",
637
+ "vision_model.blocks.3.attn.proj.weight": "model-00001-of-00002.safetensors",
638
+ "vision_model.blocks.3.attn.qkv.bias": "model-00001-of-00002.safetensors",
639
+ "vision_model.blocks.3.attn.qkv.weight": "model-00001-of-00002.safetensors",
640
+ "vision_model.blocks.3.mlp.fc1.bias": "model-00001-of-00002.safetensors",
641
+ "vision_model.blocks.3.mlp.fc1.weight": "model-00001-of-00002.safetensors",
642
+ "vision_model.blocks.3.mlp.fc2.bias": "model-00001-of-00002.safetensors",
643
+ "vision_model.blocks.3.mlp.fc2.weight": "model-00001-of-00002.safetensors",
644
+ "vision_model.blocks.3.norm1.bias": "model-00001-of-00002.safetensors",
645
+ "vision_model.blocks.3.norm1.weight": "model-00001-of-00002.safetensors",
646
+ "vision_model.blocks.3.norm2.bias": "model-00001-of-00002.safetensors",
647
+ "vision_model.blocks.3.norm2.weight": "model-00001-of-00002.safetensors",
648
+ "vision_model.blocks.30.attn.proj.bias": "model-00001-of-00002.safetensors",
649
+ "vision_model.blocks.30.attn.proj.weight": "model-00001-of-00002.safetensors",
650
+ "vision_model.blocks.30.attn.qkv.bias": "model-00001-of-00002.safetensors",
651
+ "vision_model.blocks.30.attn.qkv.weight": "model-00001-of-00002.safetensors",
652
+ "vision_model.blocks.30.mlp.fc1.bias": "model-00001-of-00002.safetensors",
653
+ "vision_model.blocks.30.mlp.fc1.weight": "model-00001-of-00002.safetensors",
654
+ "vision_model.blocks.30.mlp.fc2.bias": "model-00001-of-00002.safetensors",
655
+ "vision_model.blocks.30.mlp.fc2.weight": "model-00001-of-00002.safetensors",
656
+ "vision_model.blocks.30.norm1.bias": "model-00001-of-00002.safetensors",
657
+ "vision_model.blocks.30.norm1.weight": "model-00001-of-00002.safetensors",
658
+ "vision_model.blocks.30.norm2.bias": "model-00001-of-00002.safetensors",
659
+ "vision_model.blocks.30.norm2.weight": "model-00001-of-00002.safetensors",
660
+ "vision_model.blocks.31.attn.proj.bias": "model-00001-of-00002.safetensors",
661
+ "vision_model.blocks.31.attn.proj.weight": "model-00001-of-00002.safetensors",
662
+ "vision_model.blocks.31.attn.qkv.bias": "model-00001-of-00002.safetensors",
663
+ "vision_model.blocks.31.attn.qkv.weight": "model-00001-of-00002.safetensors",
664
+ "vision_model.blocks.31.mlp.fc1.bias": "model-00001-of-00002.safetensors",
665
+ "vision_model.blocks.31.mlp.fc1.weight": "model-00001-of-00002.safetensors",
666
+ "vision_model.blocks.31.mlp.fc2.bias": "model-00001-of-00002.safetensors",
667
+ "vision_model.blocks.31.mlp.fc2.weight": "model-00001-of-00002.safetensors",
668
+ "vision_model.blocks.31.norm1.bias": "model-00001-of-00002.safetensors",
669
+ "vision_model.blocks.31.norm1.weight": "model-00001-of-00002.safetensors",
670
+ "vision_model.blocks.31.norm2.bias": "model-00001-of-00002.safetensors",
671
+ "vision_model.blocks.31.norm2.weight": "model-00001-of-00002.safetensors",
672
+ "vision_model.blocks.4.attn.proj.bias": "model-00001-of-00002.safetensors",
673
+ "vision_model.blocks.4.attn.proj.weight": "model-00001-of-00002.safetensors",
674
+ "vision_model.blocks.4.attn.qkv.bias": "model-00001-of-00002.safetensors",
675
+ "vision_model.blocks.4.attn.qkv.weight": "model-00001-of-00002.safetensors",
676
+ "vision_model.blocks.4.mlp.fc1.bias": "model-00001-of-00002.safetensors",
677
+ "vision_model.blocks.4.mlp.fc1.weight": "model-00001-of-00002.safetensors",
678
+ "vision_model.blocks.4.mlp.fc2.bias": "model-00001-of-00002.safetensors",
679
+ "vision_model.blocks.4.mlp.fc2.weight": "model-00001-of-00002.safetensors",
680
+ "vision_model.blocks.4.norm1.bias": "model-00001-of-00002.safetensors",
681
+ "vision_model.blocks.4.norm1.weight": "model-00001-of-00002.safetensors",
682
+ "vision_model.blocks.4.norm2.bias": "model-00001-of-00002.safetensors",
683
+ "vision_model.blocks.4.norm2.weight": "model-00001-of-00002.safetensors",
684
+ "vision_model.blocks.5.attn.proj.bias": "model-00001-of-00002.safetensors",
685
+ "vision_model.blocks.5.attn.proj.weight": "model-00001-of-00002.safetensors",
686
+ "vision_model.blocks.5.attn.qkv.bias": "model-00001-of-00002.safetensors",
687
+ "vision_model.blocks.5.attn.qkv.weight": "model-00001-of-00002.safetensors",
688
+ "vision_model.blocks.5.mlp.fc1.bias": "model-00001-of-00002.safetensors",
689
+ "vision_model.blocks.5.mlp.fc1.weight": "model-00001-of-00002.safetensors",
690
+ "vision_model.blocks.5.mlp.fc2.bias": "model-00001-of-00002.safetensors",
691
+ "vision_model.blocks.5.mlp.fc2.weight": "model-00001-of-00002.safetensors",
692
+ "vision_model.blocks.5.norm1.bias": "model-00001-of-00002.safetensors",
693
+ "vision_model.blocks.5.norm1.weight": "model-00001-of-00002.safetensors",
694
+ "vision_model.blocks.5.norm2.bias": "model-00001-of-00002.safetensors",
695
+ "vision_model.blocks.5.norm2.weight": "model-00001-of-00002.safetensors",
696
+ "vision_model.blocks.6.attn.proj.bias": "model-00001-of-00002.safetensors",
697
+ "vision_model.blocks.6.attn.proj.weight": "model-00001-of-00002.safetensors",
698
+ "vision_model.blocks.6.attn.qkv.bias": "model-00001-of-00002.safetensors",
699
+ "vision_model.blocks.6.attn.qkv.weight": "model-00001-of-00002.safetensors",
700
+ "vision_model.blocks.6.mlp.fc1.bias": "model-00001-of-00002.safetensors",
701
+ "vision_model.blocks.6.mlp.fc1.weight": "model-00001-of-00002.safetensors",
702
+ "vision_model.blocks.6.mlp.fc2.bias": "model-00001-of-00002.safetensors",
703
+ "vision_model.blocks.6.mlp.fc2.weight": "model-00001-of-00002.safetensors",
704
+ "vision_model.blocks.6.norm1.bias": "model-00001-of-00002.safetensors",
705
+ "vision_model.blocks.6.norm1.weight": "model-00001-of-00002.safetensors",
706
+ "vision_model.blocks.6.norm2.bias": "model-00001-of-00002.safetensors",
707
+ "vision_model.blocks.6.norm2.weight": "model-00001-of-00002.safetensors",
708
+ "vision_model.blocks.7.attn.proj.bias": "model-00001-of-00002.safetensors",
709
+ "vision_model.blocks.7.attn.proj.weight": "model-00001-of-00002.safetensors",
710
+ "vision_model.blocks.7.attn.qkv.bias": "model-00001-of-00002.safetensors",
711
+ "vision_model.blocks.7.attn.qkv.weight": "model-00001-of-00002.safetensors",
712
+ "vision_model.blocks.7.mlp.fc1.bias": "model-00001-of-00002.safetensors",
713
+ "vision_model.blocks.7.mlp.fc1.weight": "model-00001-of-00002.safetensors",
714
+ "vision_model.blocks.7.mlp.fc2.bias": "model-00001-of-00002.safetensors",
715
+ "vision_model.blocks.7.mlp.fc2.weight": "model-00001-of-00002.safetensors",
716
+ "vision_model.blocks.7.norm1.bias": "model-00001-of-00002.safetensors",
717
+ "vision_model.blocks.7.norm1.weight": "model-00001-of-00002.safetensors",
718
+ "vision_model.blocks.7.norm2.bias": "model-00001-of-00002.safetensors",
719
+ "vision_model.blocks.7.norm2.weight": "model-00001-of-00002.safetensors",
720
+ "vision_model.blocks.8.attn.proj.bias": "model-00001-of-00002.safetensors",
721
+ "vision_model.blocks.8.attn.proj.weight": "model-00001-of-00002.safetensors",
722
+ "vision_model.blocks.8.attn.qkv.bias": "model-00001-of-00002.safetensors",
723
+ "vision_model.blocks.8.attn.qkv.weight": "model-00001-of-00002.safetensors",
724
+ "vision_model.blocks.8.mlp.fc1.bias": "model-00001-of-00002.safetensors",
725
+ "vision_model.blocks.8.mlp.fc1.weight": "model-00001-of-00002.safetensors",
726
+ "vision_model.blocks.8.mlp.fc2.bias": "model-00001-of-00002.safetensors",
727
+ "vision_model.blocks.8.mlp.fc2.weight": "model-00001-of-00002.safetensors",
728
+ "vision_model.blocks.8.norm1.bias": "model-00001-of-00002.safetensors",
729
+ "vision_model.blocks.8.norm1.weight": "model-00001-of-00002.safetensors",
730
+ "vision_model.blocks.8.norm2.bias": "model-00001-of-00002.safetensors",
731
+ "vision_model.blocks.8.norm2.weight": "model-00001-of-00002.safetensors",
732
+ "vision_model.blocks.9.attn.proj.bias": "model-00001-of-00002.safetensors",
733
+ "vision_model.blocks.9.attn.proj.weight": "model-00001-of-00002.safetensors",
734
+ "vision_model.blocks.9.attn.qkv.bias": "model-00001-of-00002.safetensors",
735
+ "vision_model.blocks.9.attn.qkv.weight": "model-00001-of-00002.safetensors",
736
+ "vision_model.blocks.9.mlp.fc1.bias": "model-00001-of-00002.safetensors",
737
+ "vision_model.blocks.9.mlp.fc1.weight": "model-00001-of-00002.safetensors",
738
+ "vision_model.blocks.9.mlp.fc2.bias": "model-00001-of-00002.safetensors",
739
+ "vision_model.blocks.9.mlp.fc2.weight": "model-00001-of-00002.safetensors",
740
+ "vision_model.blocks.9.norm1.bias": "model-00001-of-00002.safetensors",
741
+ "vision_model.blocks.9.norm1.weight": "model-00001-of-00002.safetensors",
742
+ "vision_model.blocks.9.norm2.bias": "model-00001-of-00002.safetensors",
743
+ "vision_model.blocks.9.norm2.weight": "model-00001-of-00002.safetensors",
744
+ "vision_model.patch_embed.proj.weight": "model-00001-of-00002.safetensors"
745
+ }
746
+ }
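The block above closes the "weight_map" of the sharded checkpoint index: every parameter name points at the shard file that stores it (the language-model layers are split across the two shards, while the vision-tower entries shown here all live in the first shard). A minimal sketch for inspecting that mapping directly, assuming the file keeps the standard model.safetensors.index.json name; transformers resolves the same index during from_pretrained, so none of this is required for normal loading:

import json
from collections import Counter

# Load the sharding index added in the diff above.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]

# How many parameters each shard holds, and an example lookup.
print(Counter(weight_map.values()))
print(weight_map["language_model.model.norm.weight"])   # "model-00002-of-00002.safetensors"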
modeling.py ADDED
@@ -0,0 +1,493 @@
1
+ from functools import partial
2
+ import logging
3
+ import re
4
+ from typing import Optional, Tuple, Union
5
+
6
+ from einops import rearrange
7
+ from timm.layers import LayerNorm, LayerNorm2d
8
+ from timm.layers.pos_embed import resample_abs_pos_embed
9
+ from timm.models.regnet import RegStage
10
+ import torch
11
+ from torch import nn
12
+ import torch.nn.functional as F
13
+ import torch.utils.checkpoint
14
+ from transformers import LlamaForCausalLM
15
+ from transformers.modeling_outputs import BaseModelOutput
16
+ from transformers.modeling_utils import PreTrainedModel
17
+ from transformers.models.auto import AutoModelForCausalLM
18
+ from transformers.models.qwen2_vl.configuration_qwen2_vl import (
19
+ Qwen2VLVisionConfig,
20
+ )
21
+ from transformers.models.qwen2_vl.modeling_qwen2_vl import (
22
+ PatchEmbed,
23
+ Qwen2VLPreTrainedModel,
24
+ Qwen2VisionTransformerPretrainedModel,
25
+ Qwen2VLVisionBlock,
26
+ VisionRotaryEmbedding
27
+ )
28
+
29
+ from .configuration import KananaVVisualProjectorConfig, KananaVConfig
30
+
31
+ logger = logging.getLogger("kanana-1.5-v")
32
+
33
+
34
+ def build_pos_embeds(
35
+ config: KananaVVisualProjectorConfig, num_input_tokens: int, vision_hidden_size: int
36
+ ):
37
+ # pos emb
38
+ if config.pos_emb:
39
+ pos_emb = torch.nn.Parameter(torch.zeros(1, num_input_tokens, vision_hidden_size))
40
+ nn.init.trunc_normal_(pos_emb, mean=0.0, std=0.02)
41
+ else:
42
+ pos_emb = None
43
+
44
+ return pos_emb
45
+
46
+
47
+ def build_eos_tokens(config: KananaVVisualProjectorConfig, output_hidden_size: int):
48
+ # think tokens
49
+ num_eos_tokens = config.num_eos_tokens
50
+ if num_eos_tokens:
51
+ eos_tokens = torch.nn.Parameter(torch.randn(1, num_eos_tokens, output_hidden_size))
52
+ nn.init.trunc_normal_(eos_tokens, mean=0.0, std=config.initializer_range)
53
+ else:
54
+ eos_tokens = None
55
+
56
+ return eos_tokens
57
+
58
+
59
+ def build_prenorm(config: KananaVVisualProjectorConfig):
60
+ if getattr(config, "prenorm", False):
61
+ prenorm = LayerNorm(config.encoder_hidden_size)
62
+ else:
63
+ prenorm = None
64
+ return prenorm
65
+
66
+
67
+ def build_mlp(depth: int, hidden_size: int, output_hidden_size: int):
68
+ layers = [nn.Linear(hidden_size, output_hidden_size)]
69
+ for _ in range(1, depth):
70
+ layers.append(nn.SiLU())
71
+ layers.append(nn.Linear(output_hidden_size, output_hidden_size))
72
+ return nn.Sequential(*layers)
73
+
74
+
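# Note on build_mlp: depth counts Linear layers. The first Linear maps
# hidden_size -> output_hidden_size and each further step appends SiLU + Linear
# (output_hidden_size -> output_hidden_size). For example (toy sizes, not the
# model's real dimensions):
#   build_mlp(2, 1024, 4096)
#   -> Sequential(Linear(1024, 4096), SiLU(), Linear(4096, 4096))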
75
+ class PatchMerge(nn.Module):
76
+ def __init__(self, merge_size):
77
+ super().__init__()
78
+ self.merge_size = merge_size
79
+
80
+ def forward(self, x, channel_last=False):
81
+ if channel_last:
82
+ x = rearrange(x, "B H W D -> B D H W")
83
+ _, D, H, W = x.shape
84
+ merged_x = rearrange(
85
+ x, "B D (H h2) (W w2) -> B (D h2 w2) H W", h2=self.merge_size, w2=self.merge_size
86
+ )
87
+ return merged_x
88
+
89
+
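# Shape sketch for PatchMerge: with merge_size m, a (B, D, H, W) feature map is
# regrouped into (B, D * m * m, H // m, W // m): every m x m spatial window is
# folded into the channel dimension, so the number of spatial positions shrinks
# by a factor of m**2. Toy example (illustrative sizes only):
#   >>> PatchMerge(merge_size=2)(torch.randn(1, 8, 4, 4)).shape
#   torch.Size([1, 32, 2, 2])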
90
+ class DynamicCAbstractor(nn.Module):
91
+ """Dynamic C-Abstractor based on RegBlock"""
92
+
93
+ def __init__(self, config: KananaVVisualProjectorConfig, num_input_tokens: int):
94
+ super().__init__()
95
+ self.config = config
96
+ if num_input_tokens == -1:
97
+ num_input_tokens = config.pos_emb_size
98
+ self.num_input_tokens = num_input_tokens
99
+
100
+ self.merge_size = config.merge_size
101
+ self.pos_emb_size = config.pos_emb_size
102
+
103
+ self.eos_tokens = build_eos_tokens(config, config.output_hidden_size)
104
+ self.pos_emb = build_pos_embeds(config, num_input_tokens, config.encoder_hidden_size)
105
+ self.prenorm = build_prenorm(config)
106
+
107
+ self.build_net()
108
+
109
+ def build_net(self):
110
+ encoder_hidden_size = self.config.encoder_hidden_size
111
+ hidden_size = self.config.hidden_size
112
+ output_hidden_size = self.config.output_hidden_size
113
+ depth = self.config.depth
114
+ mlp_depth = self.config.mlp_depth
115
+
116
+ RegBlock = partial(
117
+ RegStage,
118
+ stride=1,
119
+ dilation=1,
120
+ act_layer=nn.SiLU,
121
+ norm_layer=LayerNorm2d,
122
+ )
123
+
124
+ s1 = RegBlock(
125
+ depth,
126
+ encoder_hidden_size,
127
+ hidden_size,
128
+ )
129
+ sampler = PatchMerge(merge_size=self.merge_size)
130
+ s2 = RegBlock(
131
+ depth,
132
+ self.merge_size**2 * hidden_size,
133
+ hidden_size,
134
+ )
135
+
136
+ if depth:
137
+ self.net = nn.ModuleList([s1, sampler, s2])
138
+ self.readout = build_mlp(mlp_depth, hidden_size, output_hidden_size)
139
+ else:
140
+ self.net = sampler
141
+ self.readout = build_mlp(mlp_depth, encoder_hidden_size, output_hidden_size)
142
+
143
+ def forward(self, flattened_visual_embeds, grid_thw, **unused_kwargs):
144
+ n_token_loc = torch.prod(grid_thw, dim=1)
145
+ split_visual_embeds = torch.split(flattened_visual_embeds, n_token_loc.tolist())
146
+
147
+ flattened_visual_embeds = []
148
+ for _visual_embeds, _grid_thw in zip(split_visual_embeds, grid_thw):
149
+ T, H, W = _grid_thw
150
+ assert T == 1, "T must be 1. Video is not supported yet."
151
+ reshaped_visual_embeds = rearrange(
152
+ _visual_embeds, "(t h w) d -> 1 t h w d", t=T, h=H, w=W
153
+ )
154
+ # remove temporal dim
155
+ reshaped_visual_embeds = reshaped_visual_embeds[:, 0]
156
+
157
+ if self.prenorm is not None:
158
+ reshaped_visual_embeds = self.prenorm(reshaped_visual_embeds)
159
+
160
+ if self.pos_emb is not None:
161
+ # interpolate pos emb and add to visual embeds
162
+ _local_pos_emb = resample_abs_pos_embed(
163
+ posemb=self.pos_emb,
164
+ old_size=tuple([int(self.pos_emb_size**0.5)] * 2),
165
+ new_size=(H, W),
166
+ num_prefix_tokens=0,
167
+ )
168
+ _local_pos_emb = rearrange(
169
+ _local_pos_emb,
170
+ "1 (h w) d -> 1 h w d",
171
+ h=H,
172
+ w=W,
173
+ )
174
+ reshaped_visual_embeds = reshaped_visual_embeds + _local_pos_emb
175
+
176
+ reshaped_visual_embeds = self._forward(
177
+ reshaped_visual_embeds,
178
+ input_size=(H, W),
179
+ )
180
+ flattened_visual_embeds.append(reshaped_visual_embeds)
181
+ reshaped_visual_embeds = torch.cat(flattened_visual_embeds, dim=0)
182
+ output = BaseModelOutput(last_hidden_state=reshaped_visual_embeds)
183
+ return output
184
+
185
+ def _forward(self, x, input_size):
186
+ h, w = input_size
187
+ x = rearrange(x, "1 h w d -> 1 d h w", h=h, w=w)
188
+ x = self.net[0](x)
189
+ x = self.net[1](x)
190
+ x = self.net[2](x)
191
+ x = rearrange(x, "1 d h w -> (h w) d")
192
+ x = self.readout(x)
193
+ return x
194
+
195
+
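# Token-count sketch for DynamicCAbstractor: per image, the flattened encoder features
# are reshaped back onto their (H, W) grid (from grid_thw, with T asserted to be 1),
# optionally summed with a resampled positional embedding, passed through
# RegStage -> PatchMerge -> RegStage, and projected by the readout MLP. The PatchMerge
# step is what compresses the sequence handed to the language model. Toy numbers
# (not a real image size):
#   >>> grid_thw = torch.tensor([[1, 32, 24]])
#   >>> tokens_in = int(torch.prod(grid_thw, dim=1))     # 768 encoder tokens
#   >>> tokens_out = tokens_in // config.merge_size**2   # 192 projected tokens for merge_size=2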
196
+ class CustomQwen2VLVE(Qwen2VisionTransformerPretrainedModel):
197
+ config_class = Qwen2VLVisionConfig
198
+ _no_split_modules = ["Qwen2VLVisionBlock"]
199
+
200
+ def __init__(self, config) -> None:
201
+ Qwen2VLPreTrainedModel.__init__(self, config)
202
+ self.spatial_merge_size = config.spatial_merge_size
203
+ self.gradient_checkpointing = False
204
+
205
+ self.patch_embed = PatchEmbed(
206
+ patch_size=config.patch_size,
207
+ temporal_patch_size=config.temporal_patch_size,
208
+ in_channels=config.in_channels,
209
+ embed_dim=config.embed_dim,
210
+ )
211
+
212
+ head_dim = config.embed_dim // config.num_heads
213
+ self.rotary_pos_emb = VisionRotaryEmbedding(head_dim // 2)
214
+
215
+ self.blocks = nn.ModuleList(
216
+ [Qwen2VLVisionBlock(config, config._attn_implementation) for _ in range(config.depth)]
217
+ )
218
+
219
+ def forward(
220
+ self,
221
+ pixel_values: torch.Tensor,
222
+ grid_thw: torch.Tensor,
223
+ output_hidden_states: Optional[bool] = None,
224
+ return_dict: Optional[bool] = None,
225
+ ) -> Union[Tuple, BaseModelOutput]:
226
+ assert return_dict, "Only return_dict=True is supported."
227
+
228
+ encoder_states = () if output_hidden_states else None
229
+
230
+ hidden_states = self.patch_embed(pixel_values)
231
+ rotary_pos_emb = self.rot_pos_emb(grid_thw)
232
+ emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
233
+ position_embeddings = emb.cos(), emb.sin()
234
+
235
+ cu_seqlens = torch.repeat_interleave(
236
+ grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]
237
+ ).cumsum(dim=0, dtype=torch.int32)
238
+ cu_seqlens = F.pad(cu_seqlens, (1, 0), value=0)
239
+
240
+ for blk in self.blocks:
241
+ if output_hidden_states:
242
+ encoder_states = encoder_states + (hidden_states,)
243
+ if self.gradient_checkpointing and self.training:
244
+ layer_outputs = torch.utils.checkpoint.checkpoint(
245
+ blk.__call__,
246
+ hidden_states=hidden_states,
247
+ cu_seqlens=cu_seqlens,
248
+ position_embeddings=position_embeddings,
249
+ use_reentrant=False,
250
+ )
251
+ else:
252
+ layer_outputs = blk(
253
+ hidden_states=hidden_states,
254
+ cu_seqlens=cu_seqlens,
255
+ position_embeddings=position_embeddings,
256
+ )
257
+ hidden_states = layer_outputs
258
+ if output_hidden_states:
259
+ encoder_states = encoder_states + (hidden_states,)
260
+
261
+ if not return_dict:
262
+ return tuple(v for v in [hidden_states, encoder_states] if v is not None)
263
+ return BaseModelOutput(last_hidden_state=hidden_states, hidden_states=encoder_states)
264
+
265
+ def get_num_tokens(self):
266
+ return -1
267
+
268
+
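# cu_seqlens sketch for CustomQwen2VLVE.forward: each image contributes h * w patch
# tokens per temporal frame, and the per-frame lengths are accumulated into offsets so
# that all images in a batch can be packed into a single attention sequence. For example:
#   >>> grid_thw = torch.tensor([[1, 4, 6], [1, 8, 8]])   # two toy images
#   >>> lens = torch.repeat_interleave(grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0])
#   >>> F.pad(lens.cumsum(0, dtype=torch.int32), (1, 0), value=0)
#   tensor([ 0, 24, 88], dtype=torch.int32)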
269
+ class KananaVPreTrainedModel(PreTrainedModel):
270
+ """
271
+ An abstract class to handle weights initialization and
272
+ a simple interface for downloading and loading pretrained models.
273
+ """
274
+
275
+ config_class = KananaVConfig
276
+ base_model_prefix = "kanana-1.5-v"
277
+ supports_gradient_checkpointing = True
278
+ _skip_keys_device_placement = "past_key_values"
279
+ _supports_flash_attn_2 = True
280
+ _supports_sdpa = True
281
+ _supports_cache_class = True
282
+ _supports_static_cache = False
283
+
284
+ _keys_to_ignore_on_load_missing = [
285
+ r"position_ids",
286
+ r"language_model.encoder.embed_tokens.weight",
287
+ r"language_model.decoder.embed_tokens.weight",
288
+ r"language_model.lm_head.weight",
289
+ ]
290
+ _no_split_modules = [
291
+ "CustomQwen2VLVE",
292
+ "DynamicCAbstractor",
293
+ "LlamaForCausalLM",
294
+ "Parameter",
295
+ ]
296
+
297
+ def _init_weights(self, module):
298
+ """Initialize the weights"""
299
+ if (
300
+ isinstance(module, nn.Conv2d)
301
+ or isinstance(module, nn.Embedding)
302
+ or isinstance(module, nn.Linear)
303
+ ):
304
+ module.weight.data.normal_(mean=0.0, std=0.02)
305
+ if hasattr(module, "bias") and module.bias is not None:
306
+ module.bias.data.zero_()
307
+ elif isinstance(module, nn.LayerNorm):
308
+ module.bias.data.zero_()
309
+ module.weight.data.fill_(1.0)
310
+ elif isinstance(module, nn.Parameter):
311
+ raise ValueError("Unexpected nn.Parameter passed to _init_weights; parameters should be initialized by their parent module.")
312
+
313
+
314
+ class KananaVForConditionalGeneration(KananaVPreTrainedModel):
315
+ config_class = KananaVConfig
316
+
317
+ def __init__(self, config: KananaVConfig):
318
+ super().__init__(config)
319
+
320
+ logger.info("Build vision model ...")
321
+ self.vision_model = CustomQwen2VLVE._from_config(config.vision_config)
322
+
323
+ logger.info("Build projector ...")
324
+ self.abstractor = DynamicCAbstractor(config.projector_config,
325
+ num_input_tokens=self.vision_model.get_num_tokens())
326
+
327
+ logger.info("Build language model ...")
328
+ self.language_model = LlamaForCausalLM._from_config(config=config.text_config)
329
+
330
+ self.post_init()
331
+
332
+ def forward_vision(self, pixel_values, image_metas: Optional[dict] = None):
333
+ vision_model_args = {
334
+ "pixel_values": pixel_values,
335
+ "return_dict": True,
336
+ "output_hidden_states": True,
337
+ "grid_thw": image_metas["vision_grid_thw"],
338
+ }
339
+ v_outputs = self.vision_model(**vision_model_args)
340
+ layer_index = self.config.projector_config.feature_layer_index
341
+ visual_features = self._get_visual_feature_at(v_outputs.hidden_states, layer_index)
342
+ return visual_features
343
+
344
+ def forward_projector(self, visual_features, image_metas: Optional[dict] = None):
345
+ assert image_metas is not None
346
+ visual_embeds = self.abstractor(
347
+ visual_features,
348
+ grid_thw=image_metas["vision_grid_thw"],
349
+ )["last_hidden_state"]
350
+ return visual_embeds
351
+
352
+ def forward_and_project_vision(self, pixel_values, image_metas: Optional[dict] = None):
353
+ assert pixel_values is not None
354
+ visual_features = self.forward_vision(pixel_values, image_metas=image_metas)
355
+ visual_embeds = self.forward_projector(visual_features, image_metas=image_metas)
356
+ return visual_embeds
357
+
358
+ def _get_visual_feature_at(self, v_output, layer_index):
359
+ if isinstance(layer_index, list):
360
+ visual_features = torch.stack(v_output, dim=1)[:, layer_index] # [B, n_scales, L, dim]
361
+ else:
362
+ visual_features = v_output[layer_index] # [B, L, dim]
363
+ return visual_features
364
+
365
+ def embed_text_tokens(self, input_ids):
366
+ """Embed input_ids into text_embeds, ignoring media tokens (negative values)."""
367
+ input_ids = input_ids.clone()
368
+ input_ids[input_ids < 0] = 0
369
+
370
+ text_embeds = self.language_model.get_input_embeddings()(input_ids)
371
+ if hasattr(self.language_model, "transformer") and hasattr(
372
+ self.language_model.transformer, "word_embeddings_layernorm"
373
+ ):
374
+ text_embeds = self.language_model.transformer.word_embeddings_layernorm(text_embeds)
375
+
376
+ return text_embeds
377
+
378
+ def prepare_mm_inputs(
379
+ self,
380
+ input_ids: torch.FloatTensor,
381
+ pixel_values: Optional[list[torch.FloatTensor]] = None,
382
+ image_metas: Optional[dict] = None,
383
+ attention_mask: Optional[torch.LongTensor] = None,
384
+ ):
385
+ """Prepare multimodal inputs from input_ids and pixel_values."""
386
+ if pixel_values is not None:
387
+ pixel_values = pixel_values.to(self._get_input_dtype())
388
+
389
+ if attention_mask is None:
390
+ attention_mask = input_ids.new_ones(*input_ids.shape)
391
+
392
+ # Get Text Embeddings
393
+ text_embeds = self.embed_text_tokens(input_ids)
394
+ flattened_text_embeds = rearrange(text_embeds, "b l d -> (b l) d")
395
+ flattened_input_ids = rearrange(input_ids, "b l -> (b l)")
396
+
397
+ # Get Visual Embeddings
398
+ if pixel_values is not None:
399
+ flattened_visual_embeds = self.forward_and_project_vision(
400
+ pixel_values, image_metas
401
+ )
402
+ flattened_text_embeds[flattened_input_ids == -1] = flattened_visual_embeds
403
+
404
+ input_embeds = rearrange(
405
+ flattened_text_embeds, "(b l) d -> b l d", b=input_ids.shape[0]
406
+ )
407
+ return_inputs = {
408
+ "inputs_embeds": input_embeds,
409
+ "attention_mask": attention_mask,
410
+ }
411
+ return return_inputs
412
+
413
+ def forward(
414
+ self,
415
+ pixel_values: list[torch.FloatTensor],
416
+ image_metas: dict[list],
417
+ input_ids: torch.FloatTensor,
418
+ seq_length: Optional[torch.LongTensor] = None,
419
+ attention_mask: Optional[torch.LongTensor] = None,
420
+ labels: Optional[torch.LongTensor] = None,
421
+ return_dict: Optional[bool] = None,
422
+ ):
423
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
424
+ inputs = self.prepare_mm_inputs(
425
+ input_ids=input_ids,
426
+ pixel_values=pixel_values,
427
+ image_metas=image_metas,
428
+ attention_mask=attention_mask,
429
+ )
430
+
431
+ outputs = self.language_model(
432
+ **inputs,
433
+ labels=labels,
434
+ position_ids=None,
435
+ return_dict=return_dict,
436
+ output_attentions=self.config.output_attentions,
437
+ )
438
+
439
+ return outputs
440
+
441
+ @torch.no_grad()
442
+ def generate(
443
+ self,
444
+ pixel_values: torch.FloatTensor = None,
445
+ image_metas: dict[list] = None,
446
+ input_ids: Optional[torch.LongTensor] = None,
447
+ attention_mask: Optional[torch.LongTensor] = None,
448
+ seq_length: Optional[torch.LongTensor] = None,
449
+ **generate_kwargs,
450
+ ) -> torch.LongTensor:
451
+ """
452
+ Overrides `generate` function to be able to use the model as a conditional generator.
453
+
454
+ Args:
455
+ pixel_values (`torch.FloatTensor` of shape (batch_size, num_channels, height, width)):
456
+ Input images to be processed.
457
+ input_ids (`torch.LongTensor` of shape (batch_size, sequence_length), *optional*):
458
+ The sequence used as a prompt for the generation.
459
+ attention_mask (`torch.LongTensor` of shape (batch_size, sequence_length), *optional*):
460
+ Mask to avoid performing attention on padding token indices
461
+
462
+ Returns:
463
+ outputs (torch.LongTensor): the generated token ids, one sequence per batch item.
464
+ """
465
+ if input_ids is None:
466
+ return self.language_model.generate(attention_mask=attention_mask, **generate_kwargs)
467
+ if pixel_values is None:
468
+ return self.language_model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
469
+
470
+ if (
471
+ image_metas is not None
472
+ and image_metas.get("vision_grid_thw") is not None
473
+ and isinstance(image_metas.get("vision_grid_thw"), torch.Tensor)
474
+ ):
475
+ image_metas["vision_grid_thw"] = image_metas["vision_grid_thw"].to(input_ids.device)
476
+
477
+ inputs = self.prepare_mm_inputs(
478
+ input_ids=input_ids,
479
+ pixel_values=pixel_values,
480
+ image_metas=image_metas,
481
+ attention_mask=attention_mask,
482
+ )
483
+
484
+ outputs = self.language_model.generate(
485
+ **inputs,
486
+ **generate_kwargs,
487
+ )
488
+
489
+ return outputs
490
+
491
+ def _get_input_dtype(self):
492
+ dtype = next(self.vision_model.parameters()).dtype
493
+ return dtype
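For orientation, here is a minimal inference sketch of how the pieces above are intended to be wired together, assuming the repository registers these classes through `auto_map` (the config files are not shown in this diff) and that the processor is the `KananaVProcessor` defined later in this commit. The repo id, image path, and generation settings below are illustrative placeholders, not values taken from this upload.

    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    repo = "kakaocorp/kanana-1.5-v"  # hypothetical repo id
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval()
    processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)

    sample = {
        "conv": [
            {"role": "user", "content": "<image>"},
            {"role": "user", "content": "Describe this image."},
        ],
        "image": [Image.open("examples/example1.png")],  # illustrative path
    }
    # batch_encode_collate builds input_ids (with negative image placeholders),
    # pixel_values, and image_metas in the layout that forward()/generate() expect.
    inputs = processor.batch_encode_collate(
        [sample], max_length=4096, add_generation_prompt=True
    )
    inputs = {k: v.to(model.device) if torch.is_tensor(v) else v for k, v in inputs.items()}

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])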
preprocessor_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "max_pixels": 1254400,
3
+ "merge_size": 2,
4
+ "min_pixels": 78400,
5
+ "patch_size": 14
6
+ }
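These four values bound the visual sequence length per image: with `patch_size` 14 and `merge_size` 2, every visual token fed to the language model covers a 28x28 pixel area, so `min_pixels`/`max_pixels` correspond to roughly 100 to 1600 tokens per image. A quick sanity-check sketch (my arithmetic, not shipped code):

    # One merged visual token covers (patch_size * merge_size)^2 = 28 * 28 = 784 pixels.
    config = {"max_pixels": 1254400, "merge_size": 2, "min_pixels": 78400, "patch_size": 14}

    token_area = (config["patch_size"] * config["merge_size"]) ** 2  # 784
    min_tokens = config["min_pixels"] // token_area                  # 100
    max_tokens = config["max_pixels"] // token_area                  # 1600
    assert (min_tokens, max_tokens) == (100, 1600)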
processing.py ADDED
@@ -0,0 +1,208 @@
1
+ import logging
2
+
3
+ import torch
4
+ from PIL.Image import Image
5
+ from transformers.processing_utils import ProcessorMixin
6
+
7
+
8
+ logger = logging.getLogger("kanana-1.5-v")
9
+
10
+
11
+ HUMAN = "Human: "
12
+ AI = "AI: "
13
+ CHAT_TEMPLATE = (
14
+ """
15
+ {%- if bos_token is defined and bos_token %}
16
+ {{- bos_token }}
17
+ {%- endif %}
18
+ {%- set intro %}
19
+ The following is a conversation between a curious human and AI assistant. 당신은 Kakao에서 개발된 인공지능 언어모델이고 이름은 kanana입니다.
20
+ Knowledge Cutoff Date: June 30, 2024.
21
+ Capabilities and Limitations:
22
+ - I cannot search for external content such as weather, news, or the current date and time.
23
+ - If a URL is provided, I cannot access it directly. Instead, please copy and provide the relevant content for me to process.
24
+ {%- endset %}
25
+ {{ intro }}
26
+ {{- '\n' }}
27
+ {%- for message in messages %}
28
+ {%- if message['role'] == 'system' %}
29
+ {{- message['content'] }}
30
+ {%- elif message['role'] == 'user' %}
31
+ {{- '<|USER|>' + message['content'] }}
32
+ {%- elif message['role'] == 'assistant' %}
33
+ {{- '<|ASSISTANT|>' + message['content'] + eos_token }}
34
+ {%- endif %}
35
+ {%- if not loop.last %}
36
+ {{- '\n' }}
37
+ {%- endif %}
38
+ {%- endfor %}
39
+ {%- if add_generation_prompt %}
40
+ {{- '\n<|ASSISTANT|>' }}
41
+ {%- endif %}
42
+ """.strip()
43
+ .replace("<|USER|>", HUMAN)
44
+ .replace("<|ASSISTANT|>", AI)
45
+ )
46
+
47
+
48
+ class KananaVProcessor(ProcessorMixin):
49
+ attributes = ["image_processor", "tokenizer"]
50
+ valid_kwargs = []
51
+ image_processor_class = "AutoImageProcessor"
52
+ tokenizer_class = "AutoTokenizer"
53
+
54
+ def __init__(self, image_processor, tokenizer):
55
+ super().__init__(image_processor, tokenizer)
56
+ self.image_processor = image_processor
57
+ self.tokenizer = tokenizer
58
+ self.tokenizer.mllm_setup("dynamic")
59
+
60
+ def conv2prompt(
61
+ self,
62
+ conv: list[dict] | str,
63
+ chat_template=CHAT_TEMPLATE,
64
+ add_generation_prompt=False,
65
+ ) -> str:
66
+ """Convert conversation to prompt"""
67
+ if isinstance(conv, list):
68
+ prompt = self.tokenizer.apply_chat_template(
69
+ conversation=conv,
70
+ tokenize=False,
71
+ chat_template=chat_template,
72
+ add_generation_prompt=add_generation_prompt,
73
+ )
74
+ elif isinstance(conv, str):
75
+ prompt = conv
76
+ else:
77
+ raise TypeError(f"conv must be list or str, but got {type(conv)}")
78
+
79
+ return prompt
80
+
81
+ def __call__(self, data: dict, max_length, add_generation_prompt=False):
82
+ return self.encode(data, max_length, add_generation_prompt=add_generation_prompt)
83
+
84
+ def encode(self, data: dict, max_length, add_generation_prompt=False) -> dict:
85
+ """
86
+ Args:
87
+ data (dict): {
88
+ "conv": [
89
+ {"role": "system", "content": "The following is a conversation between a curious human and AI assistant."},
90
+ {"role": "user", "content": IMAGE},
91
+ {"role": "user", "content": "Hello, how are you?"},
92
+ {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
93
+ ...
94
+ ],
95
+ "image": [
96
+ PIL.Image,
97
+ ...
98
+ ]
99
+ }
100
+
101
+ Return:
102
+ data (dict): {
103
+ "text": text_tokens_from_tokenizer,
104
+ "text_raw": prompt,
105
+ "image": pixel_values,
106
+ "image_meta": image_meta (dict of list) includes image resolution, etc.
107
+ }
108
+ """
109
+ assert "images" not in data
110
+
111
+ conv = data["conv"]
112
+ images: list[Image] = data.get("image") # PIL images
113
+
114
+ data = {
115
+ "text": None,
116
+ "text_raw": None,
117
+ "image": None,
118
+ "image_meta": None,
119
+ }
120
+
121
+ # image
122
+ if images:
123
+ processor_outputs = [
124
+ self.image_processor(image) for image in images if image
125
+ ]
126
+ pixel_values = [
127
+ processor_output["pixel_values"] for processor_output in processor_outputs
128
+ ]
129
+ image_meta = [processor_output["image_meta"] for processor_output in processor_outputs]
130
+ if pixel_values:
131
+ pixel_values = torch.concat(pixel_values, dim=0)
132
+ data["image"] = pixel_values
133
+ data["image_meta"] = {k: [d[k] for d in image_meta] for k in image_meta[0]}
134
+
135
+ # text
136
+ prompt = self.conv2prompt(conv, add_generation_prompt=add_generation_prompt)
137
+ text_tokens = self.tokenizer.encode_prompt(
138
+ prompt,
139
+ max_length,
140
+ image_meta=data["image_meta"],
141
+ )
142
+
143
+ data["text"] = text_tokens
144
+ data["text_raw"] = prompt
145
+
146
+ return data
147
+
148
+ def batch_encode_collate(
149
+ self,
150
+ data_list: list[dict],
151
+ padding: str = "longest",
152
+ padding_side: str = "right",
153
+ max_length: int | None = None,
154
+ add_generation_prompt=False,
155
+ ):
156
+ """Encode batch and collate them"""
157
+ batch = [
158
+ self.encode(data, max_length, add_generation_prompt=add_generation_prompt)
159
+ for data in data_list
160
+ ]
161
+ batch = self.collate(
162
+ batch,
163
+ padding=padding,
164
+ padding_side=padding_side,
165
+ max_length=max_length,
166
+ )
167
+
168
+ return batch
169
+
170
+ def collate(
171
+ self,
172
+ batch,
173
+ padding,
174
+ padding_side,
175
+ max_length,
176
+ ):
177
+ """Collate encoded results to model inputs"""
178
+ text_batch = [data["text"] for data in batch]
179
+
180
+ text_batch = self.tokenizer.batch_collate_pad(
181
+ text_batch,
182
+ padding=padding,
183
+ padding_side=padding_side,
184
+ max_length=max_length,
185
+ )
186
+
187
+ image_list = [data["image"] for data in batch if data["image"] is not None]
188
+ image_meta = [data["image_meta"] for data in batch if data["image_meta"] is not None]
189
+ if len(image_meta) > 0:
190
+ image_meta = {
191
+ k: sum([d[k] for d in image_meta], []) for k in image_meta[0]
192
+ }
193
+ if image_meta.get("vision_grid_thw"):
194
+ image_meta["vision_grid_thw"] = torch.tensor(image_meta["vision_grid_thw"])
195
+ else:
196
+ image_meta = None
197
+
198
+ output_batch = text_batch
199
+
200
+ output_batch["pixel_values"] = torch.cat(image_list, dim=0) if len(image_list) > 0 else None
201
+ output_batch["image_metas"] = image_meta
202
+ return output_batch
203
+
204
+ def decode(self, *args, **kwargs):
205
+ return self.tokenizer.decode(*args, **kwargs)
206
+
207
+ def batch_decode(self, *args, **kwargs):
208
+ return self.tokenizer.batch_decode(*args, **kwargs)
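A short sketch of the per-sample contract of `KananaVProcessor.encode`, reusing the `processor` from the inference sketch above; the image path is again an illustrative placeholder.

    from PIL import Image

    sample = {
        "conv": [
            {"role": "user", "content": "<image>"},
            {"role": "user", "content": "What is written on the waybill?"},
        ],
        "image": [Image.open("examples/waybill.png")],  # illustrative path
    }
    out = processor.encode(sample, max_length=4096, add_generation_prompt=True)

    # out["text"]["input_ids"]: 1-D LongTensor in which every image position holds the
    #   negative placeholder id (-1) that prepare_mm_inputs later swaps for visual embeddings.
    # out["image"]:             flattened patches of shape (num_patches, C * T * patch * patch)
    # out["image_meta"]:        dict of lists, e.g. vision_grid_thw, image_token_thw, resolutions
    num_image_placeholders = int((out["text"]["input_ids"] < 0).sum())
    print(num_image_placeholders, out["image"].shape, out["image_meta"]["image_token_thw"])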
processing_image.py ADDED
@@ -0,0 +1,289 @@
1
+ import logging
2
+ import math
3
+ from typing import Optional, Union
4
+
5
+ import numpy as np
6
+ import torch
7
+ from einops import rearrange
8
+ from PIL import Image
9
+ from transformers.image_processing_utils import BaseImageProcessor
10
+ from transformers.image_transforms import convert_to_rgb, resize
11
+ from transformers.image_utils import (
12
+ ChannelDimension,
13
+ ImageInput,
14
+ PILImageResampling,
15
+ get_image_size,
16
+ infer_channel_dimension_format,
17
+ is_scaled_image,
18
+ make_list_of_images,
19
+ to_numpy_array,
20
+ )
21
+ from transformers.utils.constants import OPENAI_CLIP_MEAN, OPENAI_CLIP_STD
22
+
23
+ logger = logging.getLogger("kanana-1.5-v")
24
+
25
+
26
+ def smart_resize(
27
+ height: int,
28
+ width: int,
29
+ factor: int = 28,
30
+ min_pixels: int = 56 * 56,
31
+ max_pixels: int = 14 * 14 * 4 * 1280,
32
+ ):
33
+ """Rescales the image so that the following conditions are met:
34
+
35
+ 1. Both dimensions (height and width) are divisible by 'factor'.
36
+
37
+ 2. The total number of pixels is within the range ['min_pixels', 'max_pixels'].
38
+
39
+ 3. The aspect ratio of the image is maintained as closely as possible.
40
+
41
+ """
42
+ if height < factor or width < factor:
43
+ raise ValueError(f"height:{height} or width:{width} must be larger than factor:{factor}")
44
+ elif max(height, width) / min(height, width) > 200:
45
+ raise ValueError(
46
+ f"absolute aspect ratio must be smaller than 200, got {max(height, width) / min(height, width)}"
47
+ )
48
+ h_bar = round(height / factor) * factor
49
+ w_bar = round(width / factor) * factor
50
+ if h_bar * w_bar > max_pixels:
51
+ beta = math.sqrt((height * width) / max_pixels)
52
+ h_bar = math.floor(height / beta / factor) * factor
53
+ w_bar = math.floor(width / beta / factor) * factor
54
+ elif h_bar * w_bar < min_pixels:
55
+ beta = math.sqrt(min_pixels / (height * width))
56
+ h_bar = math.ceil(height * beta / factor) * factor
57
+ w_bar = math.ceil(width * beta / factor) * factor
58
+ return h_bar, w_bar
59
+
60
+
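A worked example of `smart_resize` using the bounds from `preprocessor_config.json`; the input resolution is my own example, and the rest follows from the arithmetic above.

    # factor = patch_size * merge_size = 14 * 2 = 28
    h_bar, w_bar = smart_resize(1000, 800, factor=28, min_pixels=78400, max_pixels=1254400)
    assert (h_bar, w_bar) == (1008, 812)       # round(1000/28)*28, round(800/28)*28
    assert 78400 <= h_bar * w_bar <= 1254400   # 818,496 pixels, already inside the range

    grid_h, grid_w = h_bar // 14, w_bar // 14      # 72 x 58 patches of 14 x 14 pixels
    merged_tokens = (grid_h // 2) * (grid_w // 2)  # 2x2 merge -> 36 * 29
    assert merged_tokens == 1044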
61
+ class KananaVImageProcessor(BaseImageProcessor):
62
+ def __init__(
63
+ self,
64
+ do_resize: bool = True,
65
+ do_rescale: bool = True,
66
+ rescale_factor: Union[int, float] = 1 / 255,
67
+ do_normalize: bool = True,
68
+ image_mean: Optional[Union[float, list[float]]] = OPENAI_CLIP_MEAN,
69
+ image_std: Optional[Union[float, list[float]]] = OPENAI_CLIP_STD,
70
+ do_convert_rgb: bool = True,
71
+ min_pixels: int = 56 * 56,
72
+ max_pixels: int = 14 * 14 * 4 * 1280,
73
+ patch_size: int = 14,
74
+ temporal_patch_size: int = 2,
75
+ merge_size: int = 2,
76
+ **kwargs,
77
+ ) -> None:
78
+ super().__init__(**kwargs)
79
+ self.do_resize = do_resize
80
+ self.resample = Image.BICUBIC
81
+ self.do_rescale = do_rescale
82
+ self.rescale_factor = rescale_factor
83
+ self.do_normalize = do_normalize
84
+ self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
85
+ self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
86
+ self.min_pixels = min_pixels
87
+ self.max_pixels = max_pixels
88
+ self.patch_size = patch_size
89
+ self.temporal_patch_size = temporal_patch_size
90
+ self.merge_size = merge_size
91
+ self.size = {"min_pixels": min_pixels, "max_pixels": max_pixels}
92
+ self.do_convert_rgb = do_convert_rgb
93
+ self.input_data_format = ChannelDimension.LAST
94
+
95
+ def _preprocess(
96
+ self,
97
+ images: Union[ImageInput],
98
+ do_resize: bool = True,
99
+ resample: PILImageResampling = None,
100
+ do_rescale: bool = None,
101
+ rescale_factor: float = None,
102
+ do_normalize: bool = None,
103
+ image_mean: Optional[Union[float, list[float]]] = None,
104
+ image_std: Optional[Union[float, list[float]]] = None,
105
+ do_convert_rgb: bool = None,
106
+ data_format: Optional[ChannelDimension] = ChannelDimension.FIRST,
107
+ input_data_format: Optional[Union[str, ChannelDimension]] = None,
108
+ ):
109
+ """
110
+ Preprocess an image or batch of images. Copy of the `preprocess` method from `CLIPImageProcessor`.
111
+ Adapted from `image_processing_qwen2_vl.py`.
112
+
113
+ Args:
114
+ images (`ImageInput`):
115
+ Image or batch of images to preprocess. Expects pixel values ranging from 0 to 255. If pixel values range from 0 to 1, set `do_rescale=False`.
116
+ do_resize (`bool`, *optional*, defaults to `self.do_resize`):
117
+ Whether to resize the image.
118
+ resample (`PILImageResampling`, *optional*, defaults to `self.resample`):
119
+ Resampling filter to use if resizing the image. This can be one of the `PILImageResampling` enums.
120
+ do_rescale (`bool`, *optional*, defaults to `self.do_rescale`):
121
+ Whether to rescale the image.
122
+ rescale_factor (`float`, *optional*, defaults to `self.rescale_factor`):
123
+ Scale factor to use if rescaling the image.
124
+ do_normalize (`bool`, *optional*, defaults to `self.do_normalize`):
125
+ Whether to normalize the image.
126
+ image_mean (`float` or `List[float]`, *optional*, defaults to `self.image_mean`):
127
+ Mean to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
128
+ image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
129
+ Standard deviation to use if normalizing the image. Can be a float or a list of floats corresponding to the number of channels in the image.
130
+ do_convert_rgb (`bool`, *optional*, defaults to `self.do_convert_rgb`):
131
+ Whether to convert the image to RGB.
132
+ data_format (`ChannelDimension`, *optional*, defaults to `ChannelDimension.FIRST`):
133
+ The channel dimension format for the output image. Can be one of:
134
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
135
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
136
+ - Unset: Use the channel dimension format of the input image.
137
+ input_data_format (`ChannelDimension` or `str`, *optional*):
138
+ The channel dimension format for the input image. Can be one of:
139
+ - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format.
140
+ - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format.
141
+ - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.
142
+ """
143
+ images = make_list_of_images(images)
144
+
145
+ if do_convert_rgb:
146
+ images = [convert_to_rgb(image) for image in images]
147
+
148
+ # All transformations expect numpy arrays.
149
+ images = [to_numpy_array(image) for image in images]
150
+
151
+ if is_scaled_image(images[0]) and do_rescale:
152
+ logger.warning_once(
153
+ "It looks like you are trying to rescale already rescaled images. If the input"
154
+ " images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
155
+ )
156
+ if input_data_format is None:
157
+ # We assume that all images have the same channel dimension format.
158
+ input_data_format = infer_channel_dimension_format(images[0])
159
+
160
+ height, width = get_image_size(images[0], channel_dim=input_data_format)
161
+ resized_height, resized_width = height, width
162
+ processed_images = []
163
+ for image in images:
164
+ if do_resize:
165
+ resized_height, resized_width = smart_resize(
166
+ height,
167
+ width,
168
+ factor=self.patch_size * self.merge_size,
169
+ min_pixels=self.min_pixels,
170
+ max_pixels=self.max_pixels,
171
+ )
172
+ image = resize(
173
+ image,
174
+ size=(resized_height, resized_width),
175
+ resample=resample,
176
+ input_data_format=input_data_format,
177
+ )
178
+
179
+ if do_rescale:
180
+ image = self.rescale(
181
+ image, scale=rescale_factor, input_data_format=input_data_format
182
+ )
183
+
184
+ if do_normalize:
185
+ image = self.normalize(
186
+ image=image, mean=image_mean, std=image_std, input_data_format=input_data_format
187
+ )
188
+ processed_images.append(image)
189
+
190
+ patches = np.array(processed_images)
191
+ if data_format == ChannelDimension.LAST:
192
+ # Inputs are channels-last here; convert to (num_images, num_channels, height, width).
193
+ patches = rearrange(patches, "N H W C -> N C H W")
194
+ if patches.shape[0] == 1:
195
+ patches = np.tile(patches, (self.temporal_patch_size, 1, 1, 1))
196
+ grid_t = patches.shape[0] // self.temporal_patch_size
197
+ grid_h, grid_w = resized_height // self.patch_size, resized_width // self.patch_size
198
+ flatten_patches = rearrange(
199
+ patches,
200
+ "(nT T) C (nH sH H) (nW sW W) -> (nT nH nW sH sW) (C T H W)",
201
+ T=self.temporal_patch_size,
202
+ H=self.patch_size,
203
+ W=self.patch_size,
204
+ nH=grid_h // self.merge_size,
205
+ nW=grid_w // self.merge_size,
206
+ sH=self.merge_size,
207
+ sW=self.merge_size,
208
+ )
209
+ return (
210
+ flatten_patches,
211
+ (grid_t, grid_h, grid_w),
212
+ (resized_height, resized_width),
213
+ (height, width),
214
+ )
215
+
216
+ def resize_pil_image(self, image):
217
+ """Upscale the image so that its shortest side is at least
218
+ `patch_size * merge_size` (the smallest size the patchifier accepts),
219
+ preserving the aspect ratio. Images that are already large enough are
220
+ returned unchanged.
221
+ """
222
+ ori_width, ori_height = image.size
223
+ width, height = (ori_width, ori_height)
224
+ if min(width, height) < self.patch_size * self.merge_size:
225
+ scale = self.patch_size * self.merge_size / min(width, height)
226
+ width, height = (int(width * scale), int(height * scale))
227
+ if (width, height) != (ori_width, ori_height):
228
+ image = image.resize((width, height), resample=Image.BICUBIC)
229
+
230
+ return image
231
+
232
+ def __call__(self, image):
233
+ """
234
+ Args:
235
+ image (PIL.Image or None): input image.
236
+
237
+ Return:
238
+ dict with:
239
+ pixel_values (tensor or None): flattened patches of shape (num_patches, C * T * patch * patch)
240
+ image_meta (dict or None): vision_grid_thw, image_token_thw,
241
+ hw_best_resolution (resized height/width), hw_orig_resolution (original height/width)
242
+ """
243
+ do_resize = self.do_resize
244
+ resample = self.resample
245
+ do_rescale = self.do_rescale
246
+ rescale_factor = self.rescale_factor
247
+ do_normalize = self.do_normalize
248
+ image_mean = self.image_mean
249
+ image_std = self.image_std
250
+ do_convert_rgb = self.do_convert_rgb
251
+ input_data_format = self.input_data_format
252
+
253
+ if image is not None:
254
+ # resize image if the shortest side is smaller than patch_size * merge_size
255
+ image = self.resize_pil_image(image)
256
+
257
+ patches, image_grid_thw, resized_hw, original_hw = self._preprocess(
258
+ images=image,
259
+ do_resize=do_resize,
260
+ resample=resample,
261
+ do_rescale=do_rescale,
262
+ rescale_factor=rescale_factor,
263
+ do_normalize=do_normalize,
264
+ image_mean=image_mean,
265
+ image_std=image_std,
266
+ do_convert_rgb=do_convert_rgb,
267
+ input_data_format=input_data_format,
268
+ data_format=ChannelDimension.LAST,
269
+ )
270
+
271
+ pixel_values = torch.tensor(patches)
272
+ image_meta = {
273
+ "vision_grid_thw": image_grid_thw,
274
+ "hw_best_resolution": resized_hw,
275
+ "hw_orig_resolution": original_hw,
276
+ "image_token_thw": (
277
+ image_grid_thw[0],
278
+ image_grid_thw[1] // self.merge_size,
279
+ image_grid_thw[2] // self.merge_size,
280
+ ),
281
+ }
282
+ else:
283
+ pixel_values = None
284
+ image_meta = None
285
+
286
+ return {
287
+ "pixel_values": pixel_values,
288
+ "image_meta": image_meta,
289
+ }
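Continuing the 1000x800 example from the `smart_resize` sketch, the metadata returned by `KananaVImageProcessor.__call__` would look roughly as follows (field names match the code above; the numbers follow from that arithmetic).

    image_meta = {
        "vision_grid_thw": (1, 72, 58),     # temporal x patch-grid height x patch-grid width
        "hw_best_resolution": (1008, 812),  # resized resolution, divisible by factor 28
        "hw_orig_resolution": (1000, 800),  # original resolution
        "image_token_thw": (1, 36, 29),     # grid after the 2x2 merge
    }
    # 36 * 29 = 1044 text-side placeholders; pixel_values holds 72 * 58 = 4176 patches,
    # each flattened to 3 * 2 * 14 * 14 = 1176 values.
    assert image_meta["image_token_thw"][1] * image_meta["image_token_thw"][2] == 1044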
tokenization.py ADDED
@@ -0,0 +1,240 @@
1
+ import logging
2
+ import re
3
+ from typing import Optional
4
+
5
+ import torch
6
+ from transformers import PreTrainedTokenizer, PreTrainedTokenizerFast
7
+
8
+ # Role tokens
9
+ AI = "AI: "
10
+ HUMAN = "Human: "
11
+ _AI = "\n" + AI
12
+ _HUMAN = "\n" + HUMAN
13
+
14
+ # special media tokens
15
+ IMAGE = "<image>"
16
+ IMAGE_ROW_SEPARATOR = "\n"
17
+ IMAGE_GLOBAL_LOCAL_SEPARATOR = "\n"
18
+ MEDIA_TOKENS = {
19
+ "image": [IMAGE],
20
+ }
21
+
22
+ _INFINITE = int(1e12) # infinite token length for no-truncation
23
+
24
+ logger = logging.getLogger("kanana-1.5-v")
25
+
26
+
27
+ def _pad_trunc(
28
+ x: list[list[int]],
29
+ padding: str,
30
+ padding_side: str,
31
+ pad_value: int,
32
+ max_length: int,
33
+ ) -> torch.LongTensor:
34
+ """Pad and truncate sequences to the same length
35
+
36
+ Args:
37
+ x (list[list[int]])
38
+ padding ("longest" or "max_length")
39
+ padding_side ("left" or "right")
40
+ pad_value (int)
41
+ max_length (int or None): if padding == "max_length", max_length should be given.
42
+ """
43
+ assert padding in ["longest", "max_length"]
44
+ assert padding_side in ["left", "right"]
45
+
46
+ lengths = [len(sample) for sample in x]
47
+ if padding == "longest":
48
+ max_length = max(lengths)
49
+
50
+ new_x = []
51
+ for sample, length in zip(x, lengths):
52
+ if torch.is_tensor(sample):
53
+ sample = sample.tolist()
54
+
55
+ if length >= max_length:
56
+ new_x.append(sample[:max_length])
57
+ continue
58
+
59
+ padding_size = max_length - length
60
+ pads = [pad_value] * padding_size
61
+ if padding_side == "right":
62
+ new_x.append(sample + pads)
63
+ else:
64
+ new_x.append(pads + sample)
65
+
66
+ return torch.as_tensor(new_x, dtype=torch.long)
67
+
68
+
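A tiny example of the `_pad_trunc` contract; the pad value and lengths are arbitrary illustrative choices.

    batch = [[5, 6], [7, 8, 9, 10, 11]]

    out = _pad_trunc(batch, padding="max_length", padding_side="right", pad_value=0, max_length=4)
    assert out.tolist() == [[5, 6, 0, 0], [7, 8, 9, 10]]  # short rows padded, long rows truncated

    out = _pad_trunc(batch, padding="longest", padding_side="left", pad_value=0, max_length=None)
    assert out.tolist() == [[0, 0, 0, 5, 6], [7, 8, 9, 10, 11]]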
69
+ class KananaVTokenizerMixin:
70
+ def mllm_setup(self, num_visual_tokens: int | str):
71
+ self.num_visual_tokens = num_visual_tokens
72
+
73
+ # Currently we only support the image modality for media modality.
74
+ self.media_tokens = {k: -int(i + 1) for i, k in enumerate(MEDIA_TOKENS["image"])}
75
+ self.media_lengths = {MEDIA_TOKENS["image"][0]: num_visual_tokens}
76
+
77
+ def repeat_image_tokens(
78
+ self, hw_tokens, with_row_separator=True, add_global_local_separator=False
79
+ ):
80
+ if len(hw_tokens) == 3:
81
+ T, H, W = hw_tokens
82
+ else:
83
+ H, W = hw_tokens
84
+
85
+ repeated_tokens = []
86
+
87
+ if add_global_local_separator:
88
+ global_local_separator = self(IMAGE_GLOBAL_LOCAL_SEPARATOR, add_special_tokens=False)[
89
+ "input_ids"
90
+ ]
91
+
92
+ repeated_tokens += global_local_separator
93
+
94
+ if with_row_separator:
95
+ row_sep = self(IMAGE_ROW_SEPARATOR, add_special_tokens=False)["input_ids"]
96
+
97
+ for h_idx in range(H):
98
+ repeated_tokens += [self.media_tokens[IMAGE]] * W
99
+ if with_row_separator and h_idx != H - 1:
100
+ repeated_tokens += row_sep
101
+
102
+ return repeated_tokens
103
+
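An illustration (a sketch, not the shipped code) of the placeholder layout that `repeat_image_tokens` produces for an `image_token_thw` of (1, 2, 3): W media placeholders per row, with the tokenized row separator between rows. `IMAGE_ID` and `ROW_SEP` below are stand-ins for `self.media_tokens[IMAGE]` and the tokenizer's ids for `IMAGE_ROW_SEPARATOR`.

    IMAGE_ID = -1    # negative media token, later replaced by projected visual embeddings
    ROW_SEP = [198]  # assumed ids for "\n"; the real ids come from the tokenizer

    def sketch_layout(h, w):
        tokens = []
        for row in range(h):
            tokens += [IMAGE_ID] * w
            if row != h - 1:
                tokens += ROW_SEP
        return tokens

    assert sketch_layout(2, 3) == [-1, -1, -1, 198, -1, -1, -1]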
104
+ def encode_text_only(self, prompt: str, add_special_tokens: bool = False) -> list:
105
+ # Text-only Data
106
+ # split prompt into chunks by role tokens
107
+ tokens_to_split = [_AI, _HUMAN]
108
+ pattern = "|".join(map(re.escape, tokens_to_split))
109
+ chunk_strs = re.split(f"({pattern})", prompt)
110
+ chunk_strs = [x for x in chunk_strs if len(x) > 0]
111
+
112
+ enc_chunk = []
113
+ for idx, chunk_str in enumerate(chunk_strs):
114
+ curr_chunk = self(chunk_str, add_special_tokens=False)["input_ids"]
115
+ enc_chunk += curr_chunk
116
+ return enc_chunk
117
+
118
+ def encode_prompt(
119
+ self, prompt: str, max_length: int | None = None, image_meta: dict | None = None
120
+ ) -> dict:
121
+ """Tokenize prompt which consists of image-text or text only, with role tokens.
122
+ Role pattern is "AI: " or "Human: ".
123
+
124
+ Args:
125
+ prompt
126
+ max_length (int or None): here, max_length is used for truncation.
127
+ If max_length is None, no truncation is applied.
128
+ """
129
+ max_length = max_length or _INFINITE # if None, set to infinite for no-truncation
130
+
131
+ # output enc_chunk
132
+ enc_chunk = []
133
+
134
+ # Text-only or Image-Text Data
135
+ # split prompt into chunks by media and role tokens
136
+ tokens_to_split = list(self.media_tokens.keys()) + [_AI, _HUMAN]
137
+ pattern = "|".join(map(re.escape, tokens_to_split))
138
+ chunk_strs = re.split(f"({pattern})", prompt)
139
+ chunk_strs = [x for x in chunk_strs if len(x) > 0]
140
+ # tokenize chunks
141
+ img_idx = 0 # for sync with image_meta
142
+ for idx, chunk_str in enumerate(chunk_strs):
143
+ if chunk_str in self.media_tokens:
144
+ if chunk_str == IMAGE:
145
+ image_token_thw = (
146
+ image_meta["image_token_thw"][img_idx]
147
+ if image_meta.get("image_token_thw")
148
+ else None
149
+ )
150
+
151
+ media_tokens = self.repeat_image_tokens(
152
+ image_token_thw,
153
+ with_row_separator=True,
154
+ add_global_local_separator=True,
155
+ )
156
+ # increment image index
157
+ img_idx += 1
158
+
159
+ else:
160
+ raise ValueError("Unknown chunk str", chunk_str)
161
+
162
+ enc_chunk += media_tokens
163
+
164
+ else:
165
+ curr_chunk = self(chunk_str, add_special_tokens=False)["input_ids"]
166
+ enc_chunk += curr_chunk
167
+
168
+ L = len(enc_chunk)
169
+
170
+ input_ids = torch.as_tensor(enc_chunk, dtype=torch.long)
171
+ attention_mask = torch.ones_like(input_ids)
172
+
173
+ assert L <= max_length, (
174
+ f"[Length exceeded] Input sequence length ({L}) is greater than "
175
+ f"the allowed max_length ({max_length}). "
176
+ "Please truncate the sequence or increase max_length."
177
+ )
178
+
179
+ return {
180
+ "input_ids": input_ids, # [L]
181
+ "seq_length": L, # int
182
+ "attention_mask": attention_mask, # [L]
183
+ }
184
+
185
+ def batch_collate_pad(
186
+ self,
187
+ batch: list,
188
+ padding: str,
189
+ padding_side: str,
190
+ max_length: int | None,
191
+ ) -> dict[str, torch.LongTensor]:
192
+ """Collate batch and pad/truncate to the same length
193
+
194
+ Args:
195
+ batch
196
+ padding ("longest" or "max_length")
197
+ padding_side ("left" or "right")
198
+ pad_value (int)
199
+ max_length (int or None): if padding == "max_length", max_length should be given
200
+ """
201
+ if padding == "max_length":
202
+ assert max_length is not None, "max_length should be given if padding == 'max_length'"
203
+ else:
204
+ # if padding == 'longest' and max_length is None, set to infinite for no-truncation
205
+ max_length = max_length or _INFINITE
206
+
207
+ input_ids = [sample["input_ids"] for sample in batch]
208
+ attention_mask = [sample["attention_mask"] for sample in batch]
209
+ seq_length = [sample["seq_length"] for sample in batch]
210
+
211
+ input_ids = _pad_trunc(input_ids, padding, padding_side, self.pad_token_id, max_length)
212
+ attention_mask = _pad_trunc(attention_mask, padding, padding_side, 0, max_length)
213
+ seq_length = torch.as_tensor(seq_length, dtype=torch.long)
214
+
215
+ return {
216
+ "input_ids": input_ids,
217
+ "attention_mask": attention_mask,
218
+ "seq_length": seq_length,
219
+ }
220
+
221
+ def get_chat_template(self) -> str:
222
+ """Method for bw-compat: old HF transformers (e.g., 4.41.0) does not have get_chat_template
223
+ """
224
+ return self.chat_template
225
+
226
+
227
+ class KananaVTokenizer(PreTrainedTokenizer, KananaVTokenizerMixin):
228
+ def __init__(self, **kwargs):
229
+ super().__init__(**kwargs)
230
+
231
+ def encode(self, text, add_special_tokens=False) -> list:
232
+ return self.encode_text_only(prompt=text, add_special_tokens=add_special_tokens)
233
+
234
+
235
+ class KananaVTokenizerFast(PreTrainedTokenizerFast, KananaVTokenizerMixin):
236
+ def __init__(self, **kwargs):
237
+ super().__init__(**kwargs)
238
+
239
+ def encode(self, text, add_special_tokens=False) -> list:
240
+ return self.encode_text_only(prompt=text, add_special_tokens=add_special_tokens)
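For reference, a small sketch of the chunking step inside `encode_prompt`: the prompt is split on the media token and on the "\nAI: " / "\nHuman: " role markers, then each text chunk is tokenized normally and each "<image>" chunk is expanded via `repeat_image_tokens`. The example prompt below is illustrative.

    import re

    tokens_to_split = ["<image>", "\nAI: ", "\nHuman: "]
    pattern = "|".join(map(re.escape, tokens_to_split))
    prompt = "intro text\nHuman: <image>\nHuman: Describe this image.\nAI: "
    chunks = [c for c in re.split(f"({pattern})", prompt) if len(c) > 0]
    assert chunks == [
        "intro text",
        "\nHuman: ",
        "<image>",
        "\nHuman: ",
        "Describe this image.",
        "\nAI: ",
    ]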
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d44e2a3cfdfa7530be35f0d72c39b37ff438d4a1e69cc285b3ee461987d0bfa7
3
+ size 17210623
tokenizer_config.json ADDED
@@ -0,0 +1,2095 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoTokenizer": ["tokenization.KananaVTokenizer", "tokenization.KananaVTokenizerFast"]
4
+ },
5
+ "added_tokens_decoder": {
6
+ "128000": {
7
+ "content": "<|begin_of_text|>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "128001": {
15
+ "content": "<|end_of_text|>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "128002": {
23
+ "content": "<|reserved_special_token_0|>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "128003": {
31
+ "content": "<|reserved_special_token_1|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "128004": {
39
+ "content": "<|reserved_special_token_2|>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "128005": {
47
+ "content": "<|reserved_special_token_3|>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "128006": {
55
+ "content": "<|start_header_id|>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "128007": {
63
+ "content": "<|end_header_id|>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "128008": {
71
+ "content": "<|reserved_special_token_4|>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "128009": {
79
+ "content": "<|eot_id|>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "128010": {
87
+ "content": "<|reserved_special_token_5|>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "128011": {
95
+ "content": "<|reserved_special_token_6|>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "128012": {
103
+ "content": "<|reserved_special_token_7|>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "128013": {
111
+ "content": "<|reserved_special_token_8|>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": true
117
+ },
118
+ "128014": {
119
+ "content": "<|reserved_special_token_9|>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": true
125
+ },
126
+ "128015": {
127
+ "content": "<|reserved_special_token_10|>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": true
133
+ },
134
+ "128016": {
135
+ "content": "<|reserved_special_token_11|>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": true
141
+ },
142
+ "128017": {
143
+ "content": "<|reserved_special_token_12|>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": true
149
+ },
150
+ "128018": {
151
+ "content": "<|reserved_special_token_13|>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": true
157
+ },
158
+ "128019": {
159
+ "content": "<|reserved_special_token_14|>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": true
165
+ },
166
+ "128020": {
167
+ "content": "<|reserved_special_token_15|>",
168
+ "lstrip": false,
169
+ "normalized": false,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": true
173
+ },
174
+ "128021": {
175
+ "content": "<|reserved_special_token_16|>",
176
+ "lstrip": false,
177
+ "normalized": false,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": true
181
+ },
182
+ "128022": {
183
+ "content": "<|reserved_special_token_17|>",
184
+ "lstrip": false,
185
+ "normalized": false,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": true
189
+ },
190
+ "128023": {
191
+ "content": "<|reserved_special_token_18|>",
192
+ "lstrip": false,
193
+ "normalized": false,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": true
197
+ },
198
+ "128024": {
199
+ "content": "<|reserved_special_token_19|>",
200
+ "lstrip": false,
201
+ "normalized": false,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": true
205
+ },
206
+ "128025": {
207
+ "content": "<|reserved_special_token_20|>",
208
+ "lstrip": false,
209
+ "normalized": false,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": true
213
+ },
214
+ "128026": {
215
+ "content": "<|reserved_special_token_21|>",
216
+ "lstrip": false,
217
+ "normalized": false,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": true
221
+ },
222
+ "128027": {
223
+ "content": "<|reserved_special_token_22|>",
224
+ "lstrip": false,
225
+ "normalized": false,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": true
229
+ },
230
+ "128028": {
231
+ "content": "<|reserved_special_token_23|>",
232
+ "lstrip": false,
233
+ "normalized": false,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": true
237
+ },
238
+ "128029": {
239
+ "content": "<|reserved_special_token_24|>",
240
+ "lstrip": false,
241
+ "normalized": false,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": true
245
+ },
246
+ "128030": {
247
+ "content": "<|reserved_special_token_25|>",
248
+ "lstrip": false,
249
+ "normalized": false,
250
+ "rstrip": false,
251
+ "single_word": false,
252
+ "special": true
253
+ },
254
+ "128031": {
255
+ "content": "<|reserved_special_token_26|>",
256
+ "lstrip": false,
257
+ "normalized": false,
258
+ "rstrip": false,
259
+ "single_word": false,
260
+ "special": true
261
+ },
262
+ "128032": {
263
+ "content": "<|reserved_special_token_27|>",
264
+ "lstrip": false,
265
+ "normalized": false,
266
+ "rstrip": false,
267
+ "single_word": false,
268
+ "special": true
269
+ },
270
+ "128033": {
271
+ "content": "<|reserved_special_token_28|>",
272
+ "lstrip": false,
273
+ "normalized": false,
274
+ "rstrip": false,
275
+ "single_word": false,
276
+ "special": true
277
+ },
278
+ "128034": {
279
+ "content": "<|reserved_special_token_29|>",
280
+ "lstrip": false,
281
+ "normalized": false,
282
+ "rstrip": false,
283
+ "single_word": false,
284
+ "special": true
285
+ },
286
+ "128035": {
287
+ "content": "<|reserved_special_token_30|>",
288
+ "lstrip": false,
289
+ "normalized": false,
290
+ "rstrip": false,
291
+ "single_word": false,
292
+ "special": true
293
+ },
294
+ "128036": {
295
+ "content": "<|reserved_special_token_31|>",
296
+ "lstrip": false,
297
+ "normalized": false,
298
+ "rstrip": false,
299
+ "single_word": false,
300
+ "special": true
301
+ },
302
+ "128037": {
303
+ "content": "<|reserved_special_token_32|>",
304
+ "lstrip": false,
305
+ "normalized": false,
306
+ "rstrip": false,
307
+ "single_word": false,
308
+ "special": true
309
+ },
310
+ "128038": {
311
+ "content": "<|reserved_special_token_33|>",
312
+ "lstrip": false,
313
+ "normalized": false,
314
+ "rstrip": false,
315
+ "single_word": false,
316
+ "special": true
317
+ },
318
+ "128039": {
319
+ "content": "<|reserved_special_token_34|>",
320
+ "lstrip": false,
321
+ "normalized": false,
322
+ "rstrip": false,
323
+ "single_word": false,
324
+ "special": true
325
+ },
326
+ "128040": {
327
+ "content": "<|reserved_special_token_35|>",
328
+ "lstrip": false,
329
+ "normalized": false,
330
+ "rstrip": false,
331
+ "single_word": false,
332
+ "special": true
333
+ },
334
+ "128041": {
335
+ "content": "<|reserved_special_token_36|>",
336
+ "lstrip": false,
337
+ "normalized": false,
338
+ "rstrip": false,
339
+ "single_word": false,
340
+ "special": true
341
+ },
342
+ "128042": {
343
+ "content": "<|reserved_special_token_37|>",
344
+ "lstrip": false,
345
+ "normalized": false,
346
+ "rstrip": false,
347
+ "single_word": false,
348
+ "special": true
349
+ },
350
+ "128043": {
351
+ "content": "<|reserved_special_token_38|>",
352
+ "lstrip": false,
353
+ "normalized": false,
354
+ "rstrip": false,
355
+ "single_word": false,
356
+ "special": true
357
+ },
358
+ "128044": {
359
+ "content": "<|reserved_special_token_39|>",
360
+ "lstrip": false,
361
+ "normalized": false,
362
+ "rstrip": false,
363
+ "single_word": false,
364
+ "special": true
365
+ },
366
+ "128045": {
367
+ "content": "<|reserved_special_token_40|>",
368
+ "lstrip": false,
369
+ "normalized": false,
370
+ "rstrip": false,
371
+ "single_word": false,
372
+ "special": true
373
+ },
374
+ "128046": {
375
+ "content": "<|reserved_special_token_41|>",
376
+ "lstrip": false,
377
+ "normalized": false,
378
+ "rstrip": false,
379
+ "single_word": false,
380
+ "special": true
381
+ },
382
+ "128047": {
383
+ "content": "<|reserved_special_token_42|>",
384
+ "lstrip": false,
385
+ "normalized": false,
386
+ "rstrip": false,
387
+ "single_word": false,
388
+ "special": true
389
+ },
390
+ "128048": {
391
+ "content": "<|reserved_special_token_43|>",
392
+ "lstrip": false,
393
+ "normalized": false,
394
+ "rstrip": false,
395
+ "single_word": false,
396
+ "special": true
397
+ },
398
+ "128049": {
399
+ "content": "<|reserved_special_token_44|>",
400
+ "lstrip": false,
401
+ "normalized": false,
402
+ "rstrip": false,
403
+ "single_word": false,
404
+ "special": true
405
+ },
406
+ "128050": {
407
+ "content": "<|reserved_special_token_45|>",
408
+ "lstrip": false,
409
+ "normalized": false,
410
+ "rstrip": false,
411
+ "single_word": false,
412
+ "special": true
413
+ },
414
+ "128051": {
415
+ "content": "<|reserved_special_token_46|>",
416
+ "lstrip": false,
417
+ "normalized": false,
418
+ "rstrip": false,
419
+ "single_word": false,
420
+ "special": true
421
+ },
422
+ "128052": {
423
+ "content": "<|reserved_special_token_47|>",
424
+ "lstrip": false,
425
+ "normalized": false,
426
+ "rstrip": false,
427
+ "single_word": false,
428
+ "special": true
429
+ },
430
+ "128053": {
431
+ "content": "<|reserved_special_token_48|>",
432
+ "lstrip": false,
433
+ "normalized": false,
434
+ "rstrip": false,
435
+ "single_word": false,
436
+ "special": true
437
+ },
438
+ "128054": {
439
+ "content": "<|reserved_special_token_49|>",
440
+ "lstrip": false,
441
+ "normalized": false,
442
+ "rstrip": false,
443
+ "single_word": false,
444
+ "special": true
445
+ },
446
+ "128055": {
447
+ "content": "<|reserved_special_token_50|>",
448
+ "lstrip": false,
449
+ "normalized": false,
450
+ "rstrip": false,
451
+ "single_word": false,
452
+ "special": true
453
+ },
454
+ "128056": {
455
+ "content": "<|reserved_special_token_51|>",
456
+ "lstrip": false,
457
+ "normalized": false,
458
+ "rstrip": false,
459
+ "single_word": false,
460
+ "special": true
461
+ },
462
+ "128057": {
463
+ "content": "<|reserved_special_token_52|>",
464
+ "lstrip": false,
465
+ "normalized": false,
466
+ "rstrip": false,
467
+ "single_word": false,
468
+ "special": true
469
+ },
470
+ "128058": {
471
+ "content": "<|reserved_special_token_53|>",
472
+ "lstrip": false,
473
+ "normalized": false,
474
+ "rstrip": false,
475
+ "single_word": false,
476
+ "special": true
477
+ },
478
+ "128059": {
479
+ "content": "<|reserved_special_token_54|>",
480
+ "lstrip": false,
481
+ "normalized": false,
482
+ "rstrip": false,
483
+ "single_word": false,
484
+ "special": true
485
+ },
486
+ "128060": {
487
+ "content": "<|reserved_special_token_55|>",
488
+ "lstrip": false,
489
+ "normalized": false,
490
+ "rstrip": false,
491
+ "single_word": false,
492
+ "special": true
493
+ },
494
+ "128061": {
495
+ "content": "<|reserved_special_token_56|>",
496
+ "lstrip": false,
497
+ "normalized": false,
498
+ "rstrip": false,
499
+ "single_word": false,
500
+ "special": true
501
+ },
502
+ "128062": {
503
+ "content": "<|reserved_special_token_57|>",
504
+ "lstrip": false,
505
+ "normalized": false,
506
+ "rstrip": false,
507
+ "single_word": false,
508
+ "special": true
509
+ },
510
+ "128063": {
511
+ "content": "<|reserved_special_token_58|>",
512
+ "lstrip": false,
513
+ "normalized": false,
514
+ "rstrip": false,
515
+ "single_word": false,
516
+ "special": true
517
+ },
518
+ "128064": {
519
+ "content": "<|reserved_special_token_59|>",
520
+ "lstrip": false,
521
+ "normalized": false,
522
+ "rstrip": false,
523
+ "single_word": false,
524
+ "special": true
525
+ },
526
+ "128065": {
527
+ "content": "<|reserved_special_token_60|>",
528
+ "lstrip": false,
529
+ "normalized": false,
530
+ "rstrip": false,
531
+ "single_word": false,
532
+ "special": true
533
+ },
534
+ "128066": {
535
+ "content": "<|reserved_special_token_61|>",
536
+ "lstrip": false,
537
+ "normalized": false,
538
+ "rstrip": false,
539
+ "single_word": false,
540
+ "special": true
541
+ },
542
+ "128067": {
543
+ "content": "<|reserved_special_token_62|>",
544
+ "lstrip": false,
545
+ "normalized": false,
546
+ "rstrip": false,
547
+ "single_word": false,
548
+ "special": true
549
+ },
550
+ "128068": {
551
+ "content": "<|reserved_special_token_63|>",
552
+ "lstrip": false,
553
+ "normalized": false,
554
+ "rstrip": false,
555
+ "single_word": false,
556
+ "special": true
557
+ },
558
+ "128069": {
559
+ "content": "<|reserved_special_token_64|>",
560
+ "lstrip": false,
561
+ "normalized": false,
562
+ "rstrip": false,
563
+ "single_word": false,
564
+ "special": true
565
+ },
566
+ "128070": {
567
+ "content": "<|reserved_special_token_65|>",
568
+ "lstrip": false,
569
+ "normalized": false,
570
+ "rstrip": false,
571
+ "single_word": false,
572
+ "special": true
573
+ },
574
+ "128071": {
575
+ "content": "<|reserved_special_token_66|>",
576
+ "lstrip": false,
577
+ "normalized": false,
578
+ "rstrip": false,
579
+ "single_word": false,
580
+ "special": true
581
+ },
582
+ "128072": {
583
+ "content": "<|reserved_special_token_67|>",
584
+ "lstrip": false,
585
+ "normalized": false,
586
+ "rstrip": false,
587
+ "single_word": false,
588
+ "special": true
589
+ },
590
+ "128073": {
591
+ "content": "<|reserved_special_token_68|>",
592
+ "lstrip": false,
593
+ "normalized": false,
594
+ "rstrip": false,
595
+ "single_word": false,
596
+ "special": true
597
+ },
598
+ "128074": {
599
+ "content": "<|reserved_special_token_69|>",
600
+ "lstrip": false,
601
+ "normalized": false,
602
+ "rstrip": false,
603
+ "single_word": false,
604
+ "special": true
605
+ },
606
+ "128075": {
607
+ "content": "<|reserved_special_token_70|>",
608
+ "lstrip": false,
609
+ "normalized": false,
610
+ "rstrip": false,
611
+ "single_word": false,
612
+ "special": true
613
+ },
614
+ "128076": {
615
+ "content": "<|reserved_special_token_71|>",
616
+ "lstrip": false,
617
+ "normalized": false,
618
+ "rstrip": false,
619
+ "single_word": false,
620
+ "special": true
621
+ },
622
+ "128077": {
623
+ "content": "<|reserved_special_token_72|>",
624
+ "lstrip": false,
625
+ "normalized": false,
626
+ "rstrip": false,
627
+ "single_word": false,
628
+ "special": true
629
+ },
630
+ "128078": {
631
+ "content": "<|reserved_special_token_73|>",
632
+ "lstrip": false,
633
+ "normalized": false,
634
+ "rstrip": false,
635
+ "single_word": false,
636
+ "special": true
637
+ },
638
+ "128079": {
639
+ "content": "<|reserved_special_token_74|>",
640
+ "lstrip": false,
641
+ "normalized": false,
642
+ "rstrip": false,
643
+ "single_word": false,
644
+ "special": true
645
+ },
646
+ "128080": {
647
+ "content": "<|reserved_special_token_75|>",
648
+ "lstrip": false,
649
+ "normalized": false,
650
+ "rstrip": false,
651
+ "single_word": false,
652
+ "special": true
653
+ },
654
+ "128081": {
655
+ "content": "<|reserved_special_token_76|>",
656
+ "lstrip": false,
657
+ "normalized": false,
658
+ "rstrip": false,
659
+ "single_word": false,
660
+ "special": true
661
+ },
662
+ "128082": {
663
+ "content": "<|reserved_special_token_77|>",
664
+ "lstrip": false,
665
+ "normalized": false,
666
+ "rstrip": false,
667
+ "single_word": false,
668
+ "special": true
669
+ },
670
+ "128083": {
671
+ "content": "<|reserved_special_token_78|>",
672
+ "lstrip": false,
673
+ "normalized": false,
674
+ "rstrip": false,
675
+ "single_word": false,
676
+ "special": true
677
+ },
678
+ "128084": {
679
+ "content": "<|reserved_special_token_79|>",
680
+ "lstrip": false,
681
+ "normalized": false,
682
+ "rstrip": false,
683
+ "single_word": false,
684
+ "special": true
685
+ },
686
+ "128085": {
687
+ "content": "<|reserved_special_token_80|>",
688
+ "lstrip": false,
689
+ "normalized": false,
690
+ "rstrip": false,
691
+ "single_word": false,
692
+ "special": true
693
+ },
694
+ "128086": {
695
+ "content": "<|reserved_special_token_81|>",
696
+ "lstrip": false,
697
+ "normalized": false,
698
+ "rstrip": false,
699
+ "single_word": false,
700
+ "special": true
701
+ },
702
+ "128087": {
703
+ "content": "<|reserved_special_token_82|>",
704
+ "lstrip": false,
705
+ "normalized": false,
706
+ "rstrip": false,
707
+ "single_word": false,
708
+ "special": true
709
+ },
710
+ "128088": {
711
+ "content": "<|reserved_special_token_83|>",
712
+ "lstrip": false,
713
+ "normalized": false,
714
+ "rstrip": false,
715
+ "single_word": false,
716
+ "special": true
717
+ },
718
+ "128089": {
719
+ "content": "<|reserved_special_token_84|>",
720
+ "lstrip": false,
721
+ "normalized": false,
722
+ "rstrip": false,
723
+ "single_word": false,
724
+ "special": true
725
+ },
726
+ "128090": {
727
+ "content": "<|reserved_special_token_85|>",
728
+ "lstrip": false,
729
+ "normalized": false,
730
+ "rstrip": false,
731
+ "single_word": false,
732
+ "special": true
733
+ },
734
+ "128091": {
735
+ "content": "<|reserved_special_token_86|>",
736
+ "lstrip": false,
737
+ "normalized": false,
738
+ "rstrip": false,
739
+ "single_word": false,
740
+ "special": true
741
+ },
742
+ "128092": {
743
+ "content": "<|reserved_special_token_87|>",
744
+ "lstrip": false,
745
+ "normalized": false,
746
+ "rstrip": false,
747
+ "single_word": false,
748
+ "special": true
749
+ },
750
+ "128093": {
751
+ "content": "<|reserved_special_token_88|>",
752
+ "lstrip": false,
753
+ "normalized": false,
754
+ "rstrip": false,
755
+ "single_word": false,
756
+ "special": true
757
+ },
758
+ "128094": {
759
+ "content": "<|reserved_special_token_89|>",
760
+ "lstrip": false,
761
+ "normalized": false,
762
+ "rstrip": false,
763
+ "single_word": false,
764
+ "special": true
765
+ },
766
+ "128095": {
767
+ "content": "<|reserved_special_token_90|>",
768
+ "lstrip": false,
769
+ "normalized": false,
770
+ "rstrip": false,
771
+ "single_word": false,
772
+ "special": true
773
+ },
774
+ "128096": {
775
+ "content": "<|reserved_special_token_91|>",
776
+ "lstrip": false,
777
+ "normalized": false,
778
+ "rstrip": false,
779
+ "single_word": false,
780
+ "special": true
781
+ },
782
+ "128097": {
783
+ "content": "<|reserved_special_token_92|>",
784
+ "lstrip": false,
785
+ "normalized": false,
786
+ "rstrip": false,
787
+ "single_word": false,
788
+ "special": true
789
+ },
790
+ "128098": {
791
+ "content": "<|reserved_special_token_93|>",
792
+ "lstrip": false,
793
+ "normalized": false,
794
+ "rstrip": false,
795
+ "single_word": false,
796
+ "special": true
797
+ },
798
+ "128099": {
799
+ "content": "<|reserved_special_token_94|>",
800
+ "lstrip": false,
801
+ "normalized": false,
802
+ "rstrip": false,
803
+ "single_word": false,
804
+ "special": true
805
+ },
806
+ "128100": {
807
+ "content": "<|reserved_special_token_95|>",
808
+ "lstrip": false,
809
+ "normalized": false,
810
+ "rstrip": false,
811
+ "single_word": false,
812
+ "special": true
813
+ },
814
+ "128101": {
815
+ "content": "<|reserved_special_token_96|>",
816
+ "lstrip": false,
817
+ "normalized": false,
818
+ "rstrip": false,
819
+ "single_word": false,
820
+ "special": true
821
+ },
822
+ "128102": {
823
+ "content": "<|reserved_special_token_97|>",
824
+ "lstrip": false,
825
+ "normalized": false,
826
+ "rstrip": false,
827
+ "single_word": false,
828
+ "special": true
829
+ },
830
+ "128103": {
831
+ "content": "<|reserved_special_token_98|>",
832
+ "lstrip": false,
833
+ "normalized": false,
834
+ "rstrip": false,
835
+ "single_word": false,
836
+ "special": true
837
+ },
838
+ "128104": {
839
+ "content": "<|reserved_special_token_99|>",
840
+ "lstrip": false,
841
+ "normalized": false,
842
+ "rstrip": false,
843
+ "single_word": false,
844
+ "special": true
845
+ },
846
+ "128105": {
847
+ "content": "<|reserved_special_token_100|>",
848
+ "lstrip": false,
849
+ "normalized": false,
850
+ "rstrip": false,
851
+ "single_word": false,
852
+ "special": true
853
+ },
854
+ "128106": {
855
+ "content": "<|reserved_special_token_101|>",
856
+ "lstrip": false,
857
+ "normalized": false,
858
+ "rstrip": false,
859
+ "single_word": false,
860
+ "special": true
861
+ },
862
+ "128107": {
863
+ "content": "<|reserved_special_token_102|>",
864
+ "lstrip": false,
865
+ "normalized": false,
866
+ "rstrip": false,
867
+ "single_word": false,
868
+ "special": true
869
+ },
870
+ "128108": {
871
+ "content": "<|reserved_special_token_103|>",
872
+ "lstrip": false,
873
+ "normalized": false,
874
+ "rstrip": false,
875
+ "single_word": false,
876
+ "special": true
877
+ },
878
+ "128109": {
879
+ "content": "<|reserved_special_token_104|>",
880
+ "lstrip": false,
881
+ "normalized": false,
882
+ "rstrip": false,
883
+ "single_word": false,
884
+ "special": true
885
+ },
886
+ "128110": {
887
+ "content": "<|reserved_special_token_105|>",
888
+ "lstrip": false,
889
+ "normalized": false,
890
+ "rstrip": false,
891
+ "single_word": false,
892
+ "special": true
893
+ },
894
+ "128111": {
895
+ "content": "<|reserved_special_token_106|>",
896
+ "lstrip": false,
897
+ "normalized": false,
898
+ "rstrip": false,
899
+ "single_word": false,
900
+ "special": true
901
+ },
902
+ "128112": {
903
+ "content": "<|reserved_special_token_107|>",
904
+ "lstrip": false,
905
+ "normalized": false,
906
+ "rstrip": false,
907
+ "single_word": false,
908
+ "special": true
909
+ },
910
+ "128113": {
911
+ "content": "<|reserved_special_token_108|>",
912
+ "lstrip": false,
913
+ "normalized": false,
914
+ "rstrip": false,
915
+ "single_word": false,
916
+ "special": true
917
+ },
918
+ "128114": {
919
+ "content": "<|reserved_special_token_109|>",
920
+ "lstrip": false,
921
+ "normalized": false,
922
+ "rstrip": false,
923
+ "single_word": false,
924
+ "special": true
925
+ },
926
+ "128115": {
927
+ "content": "<|reserved_special_token_110|>",
928
+ "lstrip": false,
929
+ "normalized": false,
930
+ "rstrip": false,
931
+ "single_word": false,
932
+ "special": true
933
+ },
934
+ "128116": {
935
+ "content": "<|reserved_special_token_111|>",
936
+ "lstrip": false,
937
+ "normalized": false,
938
+ "rstrip": false,
939
+ "single_word": false,
940
+ "special": true
941
+ },
942
+ "128117": {
943
+ "content": "<|reserved_special_token_112|>",
944
+ "lstrip": false,
945
+ "normalized": false,
946
+ "rstrip": false,
947
+ "single_word": false,
948
+ "special": true
949
+ },
950
+ "128118": {
951
+ "content": "<|reserved_special_token_113|>",
952
+ "lstrip": false,
953
+ "normalized": false,
954
+ "rstrip": false,
955
+ "single_word": false,
956
+ "special": true
957
+ },
958
+ "128119": {
959
+ "content": "<|reserved_special_token_114|>",
960
+ "lstrip": false,
961
+ "normalized": false,
962
+ "rstrip": false,
963
+ "single_word": false,
964
+ "special": true
965
+ },
966
+ "128120": {
967
+ "content": "<|reserved_special_token_115|>",
968
+ "lstrip": false,
969
+ "normalized": false,
970
+ "rstrip": false,
971
+ "single_word": false,
972
+ "special": true
973
+ },
974
+ "128121": {
975
+ "content": "<|reserved_special_token_116|>",
976
+ "lstrip": false,
977
+ "normalized": false,
978
+ "rstrip": false,
979
+ "single_word": false,
980
+ "special": true
981
+ },
982
+ "128122": {
983
+ "content": "<|reserved_special_token_117|>",
984
+ "lstrip": false,
985
+ "normalized": false,
986
+ "rstrip": false,
987
+ "single_word": false,
988
+ "special": true
989
+ },
990
+ "128123": {
991
+ "content": "<|reserved_special_token_118|>",
992
+ "lstrip": false,
993
+ "normalized": false,
994
+ "rstrip": false,
995
+ "single_word": false,
996
+ "special": true
997
+ },
998
+ "128124": {
999
+ "content": "<|reserved_special_token_119|>",
1000
+ "lstrip": false,
1001
+ "normalized": false,
1002
+ "rstrip": false,
1003
+ "single_word": false,
1004
+ "special": true
1005
+ },
1006
+ "128125": {
1007
+ "content": "<|reserved_special_token_120|>",
1008
+ "lstrip": false,
1009
+ "normalized": false,
1010
+ "rstrip": false,
1011
+ "single_word": false,
1012
+ "special": true
1013
+ },
1014
+ "128126": {
1015
+ "content": "<|reserved_special_token_121|>",
1016
+ "lstrip": false,
1017
+ "normalized": false,
1018
+ "rstrip": false,
1019
+ "single_word": false,
1020
+ "special": true
1021
+ },
1022
+ "128127": {
1023
+ "content": "<|reserved_special_token_122|>",
1024
+ "lstrip": false,
1025
+ "normalized": false,
1026
+ "rstrip": false,
1027
+ "single_word": false,
1028
+ "special": true
1029
+ },
1030
+ "128128": {
1031
+ "content": "<|reserved_special_token_123|>",
1032
+ "lstrip": false,
1033
+ "normalized": false,
1034
+ "rstrip": false,
1035
+ "single_word": false,
1036
+ "special": true
1037
+ },
1038
+ "128129": {
1039
+ "content": "<|reserved_special_token_124|>",
1040
+ "lstrip": false,
1041
+ "normalized": false,
1042
+ "rstrip": false,
1043
+ "single_word": false,
1044
+ "special": true
1045
+ },
1046
+ "128130": {
1047
+ "content": "<|reserved_special_token_125|>",
1048
+ "lstrip": false,
1049
+ "normalized": false,
1050
+ "rstrip": false,
1051
+ "single_word": false,
1052
+ "special": true
1053
+ },
1054
+ "128131": {
1055
+ "content": "<|reserved_special_token_126|>",
1056
+ "lstrip": false,
1057
+ "normalized": false,
1058
+ "rstrip": false,
1059
+ "single_word": false,
1060
+ "special": true
1061
+ },
1062
+ "128132": {
1063
+ "content": "<|reserved_special_token_127|>",
1064
+ "lstrip": false,
1065
+ "normalized": false,
1066
+ "rstrip": false,
1067
+ "single_word": false,
1068
+ "special": true
1069
+ },
1070
+ "128133": {
1071
+ "content": "<|reserved_special_token_128|>",
1072
+ "lstrip": false,
1073
+ "normalized": false,
1074
+ "rstrip": false,
1075
+ "single_word": false,
1076
+ "special": true
1077
+ },
1078
+ "128134": {
1079
+ "content": "<|reserved_special_token_129|>",
1080
+ "lstrip": false,
1081
+ "normalized": false,
1082
+ "rstrip": false,
1083
+ "single_word": false,
1084
+ "special": true
1085
+ },
1086
+ "128135": {
1087
+ "content": "<|reserved_special_token_130|>",
1088
+ "lstrip": false,
1089
+ "normalized": false,
1090
+ "rstrip": false,
1091
+ "single_word": false,
1092
+ "special": true
1093
+ },
1094
+ "128136": {
1095
+ "content": "<|reserved_special_token_131|>",
1096
+ "lstrip": false,
1097
+ "normalized": false,
1098
+ "rstrip": false,
1099
+ "single_word": false,
1100
+ "special": true
1101
+ },
1102
+ "128137": {
1103
+ "content": "<|reserved_special_token_132|>",
1104
+ "lstrip": false,
1105
+ "normalized": false,
1106
+ "rstrip": false,
1107
+ "single_word": false,
1108
+ "special": true
1109
+ },
1110
+ "128138": {
1111
+ "content": "<|reserved_special_token_133|>",
1112
+ "lstrip": false,
1113
+ "normalized": false,
1114
+ "rstrip": false,
1115
+ "single_word": false,
1116
+ "special": true
1117
+ },
1118
+ "128139": {
1119
+ "content": "<|reserved_special_token_134|>",
1120
+ "lstrip": false,
1121
+ "normalized": false,
1122
+ "rstrip": false,
1123
+ "single_word": false,
1124
+ "special": true
1125
+ },
1126
+ "128140": {
1127
+ "content": "<|reserved_special_token_135|>",
1128
+ "lstrip": false,
1129
+ "normalized": false,
1130
+ "rstrip": false,
1131
+ "single_word": false,
1132
+ "special": true
1133
+ },
1134
+ "128141": {
1135
+ "content": "<|reserved_special_token_136|>",
1136
+ "lstrip": false,
1137
+ "normalized": false,
1138
+ "rstrip": false,
1139
+ "single_word": false,
1140
+ "special": true
1141
+ },
1142
+ "128142": {
1143
+ "content": "<|reserved_special_token_137|>",
1144
+ "lstrip": false,
1145
+ "normalized": false,
1146
+ "rstrip": false,
1147
+ "single_word": false,
1148
+ "special": true
1149
+ },
1150
+ "128143": {
1151
+ "content": "<|reserved_special_token_138|>",
1152
+ "lstrip": false,
1153
+ "normalized": false,
1154
+ "rstrip": false,
1155
+ "single_word": false,
1156
+ "special": true
1157
+ },
1158
+ "128144": {
1159
+ "content": "<|reserved_special_token_139|>",
1160
+ "lstrip": false,
1161
+ "normalized": false,
1162
+ "rstrip": false,
1163
+ "single_word": false,
1164
+ "special": true
1165
+ },
1166
+ "128145": {
1167
+ "content": "<|reserved_special_token_140|>",
1168
+ "lstrip": false,
1169
+ "normalized": false,
1170
+ "rstrip": false,
1171
+ "single_word": false,
1172
+ "special": true
1173
+ },
1174
+ "128146": {
1175
+ "content": "<|reserved_special_token_141|>",
1176
+ "lstrip": false,
1177
+ "normalized": false,
1178
+ "rstrip": false,
1179
+ "single_word": false,
1180
+ "special": true
1181
+ },
1182
+ "128147": {
1183
+ "content": "<|reserved_special_token_142|>",
1184
+ "lstrip": false,
1185
+ "normalized": false,
1186
+ "rstrip": false,
1187
+ "single_word": false,
1188
+ "special": true
1189
+ },
1190
+ "128148": {
1191
+ "content": "<|reserved_special_token_143|>",
1192
+ "lstrip": false,
1193
+ "normalized": false,
1194
+ "rstrip": false,
1195
+ "single_word": false,
1196
+ "special": true
1197
+ },
1198
+ "128149": {
1199
+ "content": "<|reserved_special_token_144|>",
1200
+ "lstrip": false,
1201
+ "normalized": false,
1202
+ "rstrip": false,
1203
+ "single_word": false,
1204
+ "special": true
1205
+ },
1206
+ "128150": {
1207
+ "content": "<|reserved_special_token_145|>",
1208
+ "lstrip": false,
1209
+ "normalized": false,
1210
+ "rstrip": false,
1211
+ "single_word": false,
1212
+ "special": true
1213
+ },
1214
+ "128151": {
1215
+ "content": "<|reserved_special_token_146|>",
1216
+ "lstrip": false,
1217
+ "normalized": false,
1218
+ "rstrip": false,
1219
+ "single_word": false,
1220
+ "special": true
1221
+ },
1222
+ "128152": {
1223
+ "content": "<|reserved_special_token_147|>",
1224
+ "lstrip": false,
1225
+ "normalized": false,
1226
+ "rstrip": false,
1227
+ "single_word": false,
1228
+ "special": true
1229
+ },
1230
+ "128153": {
1231
+ "content": "<|reserved_special_token_148|>",
1232
+ "lstrip": false,
1233
+ "normalized": false,
1234
+ "rstrip": false,
1235
+ "single_word": false,
1236
+ "special": true
1237
+ },
1238
+ "128154": {
1239
+ "content": "<|reserved_special_token_149|>",
1240
+ "lstrip": false,
1241
+ "normalized": false,
1242
+ "rstrip": false,
1243
+ "single_word": false,
1244
+ "special": true
1245
+ },
1246
+ "128155": {
1247
+ "content": "<|reserved_special_token_150|>",
1248
+ "lstrip": false,
1249
+ "normalized": false,
1250
+ "rstrip": false,
1251
+ "single_word": false,
1252
+ "special": true
1253
+ },
1254
+ "128156": {
1255
+ "content": "<|reserved_special_token_151|>",
1256
+ "lstrip": false,
1257
+ "normalized": false,
1258
+ "rstrip": false,
1259
+ "single_word": false,
1260
+ "special": true
1261
+ },
1262
+ "128157": {
1263
+ "content": "<|reserved_special_token_152|>",
1264
+ "lstrip": false,
1265
+ "normalized": false,
1266
+ "rstrip": false,
1267
+ "single_word": false,
1268
+ "special": true
1269
+ },
1270
+ "128158": {
1271
+ "content": "<|reserved_special_token_153|>",
1272
+ "lstrip": false,
1273
+ "normalized": false,
1274
+ "rstrip": false,
1275
+ "single_word": false,
1276
+ "special": true
1277
+ },
1278
+ "128159": {
1279
+ "content": "<|reserved_special_token_154|>",
1280
+ "lstrip": false,
1281
+ "normalized": false,
1282
+ "rstrip": false,
1283
+ "single_word": false,
1284
+ "special": true
1285
+ },
1286
+ "128160": {
1287
+ "content": "<|reserved_special_token_155|>",
1288
+ "lstrip": false,
1289
+ "normalized": false,
1290
+ "rstrip": false,
1291
+ "single_word": false,
1292
+ "special": true
1293
+ },
1294
+ "128161": {
1295
+ "content": "<|reserved_special_token_156|>",
1296
+ "lstrip": false,
1297
+ "normalized": false,
1298
+ "rstrip": false,
1299
+ "single_word": false,
1300
+ "special": true
1301
+ },
1302
+ "128162": {
1303
+ "content": "<|reserved_special_token_157|>",
1304
+ "lstrip": false,
1305
+ "normalized": false,
1306
+ "rstrip": false,
1307
+ "single_word": false,
1308
+ "special": true
1309
+ },
1310
+ "128163": {
1311
+ "content": "<|reserved_special_token_158|>",
1312
+ "lstrip": false,
1313
+ "normalized": false,
1314
+ "rstrip": false,
1315
+ "single_word": false,
1316
+ "special": true
1317
+ },
1318
+ "128164": {
1319
+ "content": "<|reserved_special_token_159|>",
1320
+ "lstrip": false,
1321
+ "normalized": false,
1322
+ "rstrip": false,
1323
+ "single_word": false,
1324
+ "special": true
1325
+ },
1326
+ "128165": {
1327
+ "content": "<|reserved_special_token_160|>",
1328
+ "lstrip": false,
1329
+ "normalized": false,
1330
+ "rstrip": false,
1331
+ "single_word": false,
1332
+ "special": true
1333
+ },
1334
+ "128166": {
1335
+ "content": "<|reserved_special_token_161|>",
1336
+ "lstrip": false,
1337
+ "normalized": false,
1338
+ "rstrip": false,
1339
+ "single_word": false,
1340
+ "special": true
1341
+ },
1342
+ "128167": {
1343
+ "content": "<|reserved_special_token_162|>",
1344
+ "lstrip": false,
1345
+ "normalized": false,
1346
+ "rstrip": false,
1347
+ "single_word": false,
1348
+ "special": true
1349
+ },
1350
+ "128168": {
1351
+ "content": "<|reserved_special_token_163|>",
1352
+ "lstrip": false,
1353
+ "normalized": false,
1354
+ "rstrip": false,
1355
+ "single_word": false,
1356
+ "special": true
1357
+ },
1358
+ "128169": {
1359
+ "content": "<|reserved_special_token_164|>",
1360
+ "lstrip": false,
1361
+ "normalized": false,
1362
+ "rstrip": false,
1363
+ "single_word": false,
1364
+ "special": true
1365
+ },
1366
+ "128170": {
1367
+ "content": "<|reserved_special_token_165|>",
1368
+ "lstrip": false,
1369
+ "normalized": false,
1370
+ "rstrip": false,
1371
+ "single_word": false,
1372
+ "special": true
1373
+ },
1374
+ "128171": {
1375
+ "content": "<|reserved_special_token_166|>",
1376
+ "lstrip": false,
1377
+ "normalized": false,
1378
+ "rstrip": false,
1379
+ "single_word": false,
1380
+ "special": true
1381
+ },
1382
+ "128172": {
1383
+ "content": "<|reserved_special_token_167|>",
1384
+ "lstrip": false,
1385
+ "normalized": false,
1386
+ "rstrip": false,
1387
+ "single_word": false,
1388
+ "special": true
1389
+ },
1390
+ "128173": {
1391
+ "content": "<|reserved_special_token_168|>",
1392
+ "lstrip": false,
1393
+ "normalized": false,
1394
+ "rstrip": false,
1395
+ "single_word": false,
1396
+ "special": true
1397
+ },
1398
+ "128174": {
1399
+ "content": "<|reserved_special_token_169|>",
1400
+ "lstrip": false,
1401
+ "normalized": false,
1402
+ "rstrip": false,
1403
+ "single_word": false,
1404
+ "special": true
1405
+ },
1406
+ "128175": {
1407
+ "content": "<|reserved_special_token_170|>",
1408
+ "lstrip": false,
1409
+ "normalized": false,
1410
+ "rstrip": false,
1411
+ "single_word": false,
1412
+ "special": true
1413
+ },
1414
+ "128176": {
1415
+ "content": "<|reserved_special_token_171|>",
1416
+ "lstrip": false,
1417
+ "normalized": false,
1418
+ "rstrip": false,
1419
+ "single_word": false,
1420
+ "special": true
1421
+ },
1422
+ "128177": {
1423
+ "content": "<|reserved_special_token_172|>",
1424
+ "lstrip": false,
1425
+ "normalized": false,
1426
+ "rstrip": false,
1427
+ "single_word": false,
1428
+ "special": true
1429
+ },
1430
+ "128178": {
1431
+ "content": "<|reserved_special_token_173|>",
1432
+ "lstrip": false,
1433
+ "normalized": false,
1434
+ "rstrip": false,
1435
+ "single_word": false,
1436
+ "special": true
1437
+ },
1438
+ "128179": {
1439
+ "content": "<|reserved_special_token_174|>",
1440
+ "lstrip": false,
1441
+ "normalized": false,
1442
+ "rstrip": false,
1443
+ "single_word": false,
1444
+ "special": true
1445
+ },
1446
+ "128180": {
1447
+ "content": "<|reserved_special_token_175|>",
1448
+ "lstrip": false,
1449
+ "normalized": false,
1450
+ "rstrip": false,
1451
+ "single_word": false,
1452
+ "special": true
1453
+ },
1454
+ "128181": {
1455
+ "content": "<|reserved_special_token_176|>",
1456
+ "lstrip": false,
1457
+ "normalized": false,
1458
+ "rstrip": false,
1459
+ "single_word": false,
1460
+ "special": true
1461
+ },
1462
+ "128182": {
1463
+ "content": "<|reserved_special_token_177|>",
1464
+ "lstrip": false,
1465
+ "normalized": false,
1466
+ "rstrip": false,
1467
+ "single_word": false,
1468
+ "special": true
1469
+ },
1470
+ "128183": {
1471
+ "content": "<|reserved_special_token_178|>",
1472
+ "lstrip": false,
1473
+ "normalized": false,
1474
+ "rstrip": false,
1475
+ "single_word": false,
1476
+ "special": true
1477
+ },
1478
+ "128184": {
1479
+ "content": "<|reserved_special_token_179|>",
1480
+ "lstrip": false,
1481
+ "normalized": false,
1482
+ "rstrip": false,
1483
+ "single_word": false,
1484
+ "special": true
1485
+ },
1486
+ "128185": {
1487
+ "content": "<|reserved_special_token_180|>",
1488
+ "lstrip": false,
1489
+ "normalized": false,
1490
+ "rstrip": false,
1491
+ "single_word": false,
1492
+ "special": true
1493
+ },
1494
+ "128186": {
1495
+ "content": "<|reserved_special_token_181|>",
1496
+ "lstrip": false,
1497
+ "normalized": false,
1498
+ "rstrip": false,
1499
+ "single_word": false,
1500
+ "special": true
1501
+ },
1502
+ "128187": {
1503
+ "content": "<|reserved_special_token_182|>",
1504
+ "lstrip": false,
1505
+ "normalized": false,
1506
+ "rstrip": false,
1507
+ "single_word": false,
1508
+ "special": true
1509
+ },
1510
+ "128188": {
1511
+ "content": "<|reserved_special_token_183|>",
1512
+ "lstrip": false,
1513
+ "normalized": false,
1514
+ "rstrip": false,
1515
+ "single_word": false,
1516
+ "special": true
1517
+ },
1518
+ "128189": {
1519
+ "content": "<|reserved_special_token_184|>",
1520
+ "lstrip": false,
1521
+ "normalized": false,
1522
+ "rstrip": false,
1523
+ "single_word": false,
1524
+ "special": true
1525
+ },
1526
+ "128190": {
1527
+ "content": "<|reserved_special_token_185|>",
1528
+ "lstrip": false,
1529
+ "normalized": false,
1530
+ "rstrip": false,
1531
+ "single_word": false,
1532
+ "special": true
1533
+ },
1534
+ "128191": {
1535
+ "content": "<|reserved_special_token_186|>",
1536
+ "lstrip": false,
1537
+ "normalized": false,
1538
+ "rstrip": false,
1539
+ "single_word": false,
1540
+ "special": true
1541
+ },
1542
+ "128192": {
1543
+ "content": "<|reserved_special_token_187|>",
1544
+ "lstrip": false,
1545
+ "normalized": false,
1546
+ "rstrip": false,
1547
+ "single_word": false,
1548
+ "special": true
1549
+ },
1550
+ "128193": {
1551
+ "content": "<|reserved_special_token_188|>",
1552
+ "lstrip": false,
1553
+ "normalized": false,
1554
+ "rstrip": false,
1555
+ "single_word": false,
1556
+ "special": true
1557
+ },
1558
+ "128194": {
1559
+ "content": "<|reserved_special_token_189|>",
1560
+ "lstrip": false,
1561
+ "normalized": false,
1562
+ "rstrip": false,
1563
+ "single_word": false,
1564
+ "special": true
1565
+ },
1566
+ "128195": {
1567
+ "content": "<|reserved_special_token_190|>",
1568
+ "lstrip": false,
1569
+ "normalized": false,
1570
+ "rstrip": false,
1571
+ "single_word": false,
1572
+ "special": true
1573
+ },
1574
+ "128196": {
1575
+ "content": "<|reserved_special_token_191|>",
1576
+ "lstrip": false,
1577
+ "normalized": false,
1578
+ "rstrip": false,
1579
+ "single_word": false,
1580
+ "special": true
1581
+ },
1582
+ "128197": {
1583
+ "content": "<|reserved_special_token_192|>",
1584
+ "lstrip": false,
1585
+ "normalized": false,
1586
+ "rstrip": false,
1587
+ "single_word": false,
1588
+ "special": true
1589
+ },
1590
+ "128198": {
1591
+ "content": "<|reserved_special_token_193|>",
1592
+ "lstrip": false,
1593
+ "normalized": false,
1594
+ "rstrip": false,
1595
+ "single_word": false,
1596
+ "special": true
1597
+ },
1598
+ "128199": {
1599
+ "content": "<|reserved_special_token_194|>",
1600
+ "lstrip": false,
1601
+ "normalized": false,
1602
+ "rstrip": false,
1603
+ "single_word": false,
1604
+ "special": true
1605
+ },
1606
+ "128200": {
1607
+ "content": "<|reserved_special_token_195|>",
1608
+ "lstrip": false,
1609
+ "normalized": false,
1610
+ "rstrip": false,
1611
+ "single_word": false,
1612
+ "special": true
1613
+ },
1614
+ "128201": {
1615
+ "content": "<|reserved_special_token_196|>",
1616
+ "lstrip": false,
1617
+ "normalized": false,
1618
+ "rstrip": false,
1619
+ "single_word": false,
1620
+ "special": true
1621
+ },
1622
+ "128202": {
1623
+ "content": "<|reserved_special_token_197|>",
1624
+ "lstrip": false,
1625
+ "normalized": false,
1626
+ "rstrip": false,
1627
+ "single_word": false,
1628
+ "special": true
1629
+ },
1630
+ "128203": {
1631
+ "content": "<|reserved_special_token_198|>",
1632
+ "lstrip": false,
1633
+ "normalized": false,
1634
+ "rstrip": false,
1635
+ "single_word": false,
1636
+ "special": true
1637
+ },
1638
+ "128204": {
1639
+ "content": "<|reserved_special_token_199|>",
1640
+ "lstrip": false,
1641
+ "normalized": false,
1642
+ "rstrip": false,
1643
+ "single_word": false,
1644
+ "special": true
1645
+ },
1646
+ "128205": {
1647
+ "content": "<|reserved_special_token_200|>",
1648
+ "lstrip": false,
1649
+ "normalized": false,
1650
+ "rstrip": false,
1651
+ "single_word": false,
1652
+ "special": true
1653
+ },
1654
+ "128206": {
1655
+ "content": "<|reserved_special_token_201|>",
1656
+ "lstrip": false,
1657
+ "normalized": false,
1658
+ "rstrip": false,
1659
+ "single_word": false,
1660
+ "special": true
1661
+ },
1662
+ "128207": {
1663
+ "content": "<|reserved_special_token_202|>",
1664
+ "lstrip": false,
1665
+ "normalized": false,
1666
+ "rstrip": false,
1667
+ "single_word": false,
1668
+ "special": true
1669
+ },
1670
+ "128208": {
1671
+ "content": "<|reserved_special_token_203|>",
1672
+ "lstrip": false,
1673
+ "normalized": false,
1674
+ "rstrip": false,
1675
+ "single_word": false,
1676
+ "special": true
1677
+ },
1678
+ "128209": {
1679
+ "content": "<|reserved_special_token_204|>",
1680
+ "lstrip": false,
1681
+ "normalized": false,
1682
+ "rstrip": false,
1683
+ "single_word": false,
1684
+ "special": true
1685
+ },
1686
+ "128210": {
1687
+ "content": "<|reserved_special_token_205|>",
1688
+ "lstrip": false,
1689
+ "normalized": false,
1690
+ "rstrip": false,
1691
+ "single_word": false,
1692
+ "special": true
1693
+ },
1694
+ "128211": {
1695
+ "content": "<|reserved_special_token_206|>",
1696
+ "lstrip": false,
1697
+ "normalized": false,
1698
+ "rstrip": false,
1699
+ "single_word": false,
1700
+ "special": true
1701
+ },
1702
+ "128212": {
1703
+ "content": "<|reserved_special_token_207|>",
1704
+ "lstrip": false,
1705
+ "normalized": false,
1706
+ "rstrip": false,
1707
+ "single_word": false,
1708
+ "special": true
1709
+ },
1710
+ "128213": {
1711
+ "content": "<|reserved_special_token_208|>",
1712
+ "lstrip": false,
1713
+ "normalized": false,
1714
+ "rstrip": false,
1715
+ "single_word": false,
1716
+ "special": true
1717
+ },
1718
+ "128214": {
1719
+ "content": "<|reserved_special_token_209|>",
1720
+ "lstrip": false,
1721
+ "normalized": false,
1722
+ "rstrip": false,
1723
+ "single_word": false,
1724
+ "special": true
1725
+ },
1726
+ "128215": {
1727
+ "content": "<|reserved_special_token_210|>",
1728
+ "lstrip": false,
1729
+ "normalized": false,
1730
+ "rstrip": false,
1731
+ "single_word": false,
1732
+ "special": true
1733
+ },
1734
+ "128216": {
1735
+ "content": "<|reserved_special_token_211|>",
1736
+ "lstrip": false,
1737
+ "normalized": false,
1738
+ "rstrip": false,
1739
+ "single_word": false,
1740
+ "special": true
1741
+ },
1742
+ "128217": {
1743
+ "content": "<|reserved_special_token_212|>",
1744
+ "lstrip": false,
1745
+ "normalized": false,
1746
+ "rstrip": false,
1747
+ "single_word": false,
1748
+ "special": true
1749
+ },
1750
+ "128218": {
1751
+ "content": "<|reserved_special_token_213|>",
1752
+ "lstrip": false,
1753
+ "normalized": false,
1754
+ "rstrip": false,
1755
+ "single_word": false,
1756
+ "special": true
1757
+ },
1758
+ "128219": {
1759
+ "content": "<|reserved_special_token_214|>",
1760
+ "lstrip": false,
1761
+ "normalized": false,
1762
+ "rstrip": false,
1763
+ "single_word": false,
1764
+ "special": true
1765
+ },
1766
+ "128220": {
1767
+ "content": "<|reserved_special_token_215|>",
1768
+ "lstrip": false,
1769
+ "normalized": false,
1770
+ "rstrip": false,
1771
+ "single_word": false,
1772
+ "special": true
1773
+ },
1774
+ "128221": {
1775
+ "content": "<|reserved_special_token_216|>",
1776
+ "lstrip": false,
1777
+ "normalized": false,
1778
+ "rstrip": false,
1779
+ "single_word": false,
1780
+ "special": true
1781
+ },
1782
+ "128222": {
1783
+ "content": "<|reserved_special_token_217|>",
1784
+ "lstrip": false,
1785
+ "normalized": false,
1786
+ "rstrip": false,
1787
+ "single_word": false,
1788
+ "special": true
1789
+ },
1790
+ "128223": {
1791
+ "content": "<|reserved_special_token_218|>",
1792
+ "lstrip": false,
1793
+ "normalized": false,
1794
+ "rstrip": false,
1795
+ "single_word": false,
1796
+ "special": true
1797
+ },
1798
+ "128224": {
1799
+ "content": "<|reserved_special_token_219|>",
1800
+ "lstrip": false,
1801
+ "normalized": false,
1802
+ "rstrip": false,
1803
+ "single_word": false,
1804
+ "special": true
1805
+ },
1806
+ "128225": {
1807
+ "content": "<|reserved_special_token_220|>",
1808
+ "lstrip": false,
1809
+ "normalized": false,
1810
+ "rstrip": false,
1811
+ "single_word": false,
1812
+ "special": true
1813
+ },
1814
+ "128226": {
1815
+ "content": "<|reserved_special_token_221|>",
1816
+ "lstrip": false,
1817
+ "normalized": false,
1818
+ "rstrip": false,
1819
+ "single_word": false,
1820
+ "special": true
1821
+ },
1822
+ "128227": {
1823
+ "content": "<|reserved_special_token_222|>",
1824
+ "lstrip": false,
1825
+ "normalized": false,
1826
+ "rstrip": false,
1827
+ "single_word": false,
1828
+ "special": true
1829
+ },
1830
+ "128228": {
1831
+ "content": "<|reserved_special_token_223|>",
1832
+ "lstrip": false,
1833
+ "normalized": false,
1834
+ "rstrip": false,
1835
+ "single_word": false,
1836
+ "special": true
1837
+ },
1838
+ "128229": {
1839
+ "content": "<|reserved_special_token_224|>",
1840
+ "lstrip": false,
1841
+ "normalized": false,
1842
+ "rstrip": false,
1843
+ "single_word": false,
1844
+ "special": true
1845
+ },
1846
+ "128230": {
1847
+ "content": "<|reserved_special_token_225|>",
1848
+ "lstrip": false,
1849
+ "normalized": false,
1850
+ "rstrip": false,
1851
+ "single_word": false,
1852
+ "special": true
1853
+ },
1854
+ "128231": {
1855
+ "content": "<|reserved_special_token_226|>",
1856
+ "lstrip": false,
1857
+ "normalized": false,
1858
+ "rstrip": false,
1859
+ "single_word": false,
1860
+ "special": true
1861
+ },
1862
+ "128232": {
1863
+ "content": "<|reserved_special_token_227|>",
1864
+ "lstrip": false,
1865
+ "normalized": false,
1866
+ "rstrip": false,
1867
+ "single_word": false,
1868
+ "special": true
1869
+ },
1870
+ "128233": {
1871
+ "content": "<|reserved_special_token_228|>",
1872
+ "lstrip": false,
1873
+ "normalized": false,
1874
+ "rstrip": false,
1875
+ "single_word": false,
1876
+ "special": true
1877
+ },
1878
+ "128234": {
1879
+ "content": "<|reserved_special_token_229|>",
1880
+ "lstrip": false,
1881
+ "normalized": false,
1882
+ "rstrip": false,
1883
+ "single_word": false,
1884
+ "special": true
1885
+ },
1886
+ "128235": {
1887
+ "content": "<|reserved_special_token_230|>",
1888
+ "lstrip": false,
1889
+ "normalized": false,
1890
+ "rstrip": false,
1891
+ "single_word": false,
1892
+ "special": true
1893
+ },
1894
+ "128236": {
1895
+ "content": "<|reserved_special_token_231|>",
1896
+ "lstrip": false,
1897
+ "normalized": false,
1898
+ "rstrip": false,
1899
+ "single_word": false,
1900
+ "special": true
1901
+ },
1902
+ "128237": {
1903
+ "content": "<|reserved_special_token_232|>",
1904
+ "lstrip": false,
1905
+ "normalized": false,
1906
+ "rstrip": false,
1907
+ "single_word": false,
1908
+ "special": true
1909
+ },
1910
+ "128238": {
1911
+ "content": "<|reserved_special_token_233|>",
1912
+ "lstrip": false,
1913
+ "normalized": false,
1914
+ "rstrip": false,
1915
+ "single_word": false,
1916
+ "special": true
1917
+ },
1918
+ "128239": {
1919
+ "content": "<|reserved_special_token_234|>",
1920
+ "lstrip": false,
1921
+ "normalized": false,
1922
+ "rstrip": false,
1923
+ "single_word": false,
1924
+ "special": true
1925
+ },
1926
+ "128240": {
1927
+ "content": "<|reserved_special_token_235|>",
1928
+ "lstrip": false,
1929
+ "normalized": false,
1930
+ "rstrip": false,
1931
+ "single_word": false,
1932
+ "special": true
1933
+ },
1934
+ "128241": {
1935
+ "content": "<|reserved_special_token_236|>",
1936
+ "lstrip": false,
1937
+ "normalized": false,
1938
+ "rstrip": false,
1939
+ "single_word": false,
1940
+ "special": true
1941
+ },
1942
+ "128242": {
1943
+ "content": "<|reserved_special_token_237|>",
1944
+ "lstrip": false,
1945
+ "normalized": false,
1946
+ "rstrip": false,
1947
+ "single_word": false,
1948
+ "special": true
1949
+ },
1950
+ "128243": {
1951
+ "content": "<|reserved_special_token_238|>",
1952
+ "lstrip": false,
1953
+ "normalized": false,
1954
+ "rstrip": false,
1955
+ "single_word": false,
1956
+ "special": true
1957
+ },
1958
+ "128244": {
1959
+ "content": "<|reserved_special_token_239|>",
1960
+ "lstrip": false,
1961
+ "normalized": false,
1962
+ "rstrip": false,
1963
+ "single_word": false,
1964
+ "special": true
1965
+ },
1966
+ "128245": {
1967
+ "content": "<|reserved_special_token_240|>",
1968
+ "lstrip": false,
1969
+ "normalized": false,
1970
+ "rstrip": false,
1971
+ "single_word": false,
1972
+ "special": true
1973
+ },
1974
+ "128246": {
1975
+ "content": "<|reserved_special_token_241|>",
1976
+ "lstrip": false,
1977
+ "normalized": false,
1978
+ "rstrip": false,
1979
+ "single_word": false,
1980
+ "special": true
1981
+ },
1982
+ "128247": {
1983
+ "content": "<|reserved_special_token_242|>",
1984
+ "lstrip": false,
1985
+ "normalized": false,
1986
+ "rstrip": false,
1987
+ "single_word": false,
1988
+ "special": true
1989
+ },
1990
+ "128248": {
1991
+ "content": "<|reserved_special_token_243|>",
1992
+ "lstrip": false,
1993
+ "normalized": false,
1994
+ "rstrip": false,
1995
+ "single_word": false,
1996
+ "special": true
1997
+ },
1998
+ "128249": {
1999
+ "content": "<|reserved_special_token_244|>",
2000
+ "lstrip": false,
2001
+ "normalized": false,
2002
+ "rstrip": false,
2003
+ "single_word": false,
2004
+ "special": true
2005
+ },
2006
+ "128250": {
2007
+ "content": "<|reserved_special_token_245|>",
2008
+ "lstrip": false,
2009
+ "normalized": false,
2010
+ "rstrip": false,
2011
+ "single_word": false,
2012
+ "special": true
2013
+ },
2014
+ "128251": {
2015
+ "content": "<|reserved_special_token_246|>",
2016
+ "lstrip": false,
2017
+ "normalized": false,
2018
+ "rstrip": false,
2019
+ "single_word": false,
2020
+ "special": true
2021
+ },
2022
+ "128252": {
2023
+ "content": "<|reserved_special_token_247|>",
2024
+ "lstrip": false,
2025
+ "normalized": false,
2026
+ "rstrip": false,
2027
+ "single_word": false,
2028
+ "special": true
2029
+ },
2030
+ "128253": {
2031
+ "content": "<|reserved_special_token_248|>",
2032
+ "lstrip": false,
2033
+ "normalized": false,
2034
+ "rstrip": false,
2035
+ "single_word": false,
2036
+ "special": true
2037
+ },
2038
+ "128254": {
2039
+ "content": "<|reserved_special_token_249|>",
2040
+ "lstrip": false,
2041
+ "normalized": false,
2042
+ "rstrip": false,
2043
+ "single_word": false,
2044
+ "special": true
2045
+ },
2046
+ "128255": {
2047
+ "content": "<|reserved_special_token_250|>",
2048
+ "lstrip": false,
2049
+ "normalized": false,
2050
+ "rstrip": false,
2051
+ "single_word": false,
2052
+ "special": true
2053
+ },
2054
+ "128256": {
2055
+ "content": "<|eom_id|>",
2056
+ "lstrip": false,
2057
+ "normalized": false,
2058
+ "rstrip": false,
2059
+ "single_word": false,
2060
+ "special": true
2061
+ },
2062
+ "128257": {
2063
+ "content": "<|python_tag|>",
2064
+ "lstrip": false,
2065
+ "normalized": false,
2066
+ "rstrip": false,
2067
+ "single_word": false,
2068
+ "special": true
2069
+ },
2070
+ "128258": {
2071
+ "content": "<|NONE|>",
2072
+ "lstrip": false,
2073
+ "normalized": false,
2074
+ "rstrip": false,
2075
+ "single_word": false,
2076
+ "special": true
2077
+ }
2078
+ },
2079
+ "bos_token": "<|begin_of_text|>",
2080
+ "chat_template": "{# version=v3-llama3.1 #}{%- macro append_new_param_info(param_declaration, comment_info, examples_info, depth) -%}\n {%- set offset = \"\" -%}\n {%- if depth >= 1 -%}\n {%- set offset = \" \" * depth -%}\n {%- endif -%}\n {%- if comment_info != \"<|NONE|>\" -%}\n {{ \"\\n\" + offset + comment_info }}\n {%- if examples_info | length > 0 -%}\n {# Append each example info #}\n {%- for example in examples_info -%}\n {{ \"\\n\" + offset + \"// \" + example|string|replace(\"'\", '\"') }}\n {%- endfor -%}\n {%- endif -%}\n {%- endif -%}\n {{ \"\\n\" + offset + param_declaration }}\n{%- endmacro -%}\n\n{%- macro convert_data_type(param_type) -%}\n {%- if param_type == \"integer\" or param_type == \"float\" -%}\n {{ \"number\" }}\n {%- else -%}\n {{ param_type }}\n {%- endif -%}\n{%- endmacro -%}\n\n{%- macro get_param_type(param) -%}\n {%- set param_type = \"any\" -%}\n\n {%- if \"type\" in param -%}\n {%- set raw_param_type = param[\"type\"] -%}\n {%- if raw_param_type is iterable and raw_param_type is not string -%}\n {%- set param_type = raw_param_type | join(\" | \") -%}\n {%- else -%}\n {%- set param_type = raw_param_type -%}\n {%- endif -%}\n {{ convert_data_type(param_type) }}\n {%- elif \"oneOf\" in param -%}\n {%- set one_of_types = param[\"oneOf\"]|selectattr(\"type\", \"defined\")|list -%}\n {%- set one_of_types = one_of_types|map(attribute=\"type\")|unique|list -%}\n {{ convert_data_type(one_of_types | join(\" | \")) }}\n {%- endif -%}\n{%- endmacro -%}\n\n{%- macro get_format_param(param) -%}\n {%- if \"format\" in param -%}\n {{ param[\"format\"] }}\n {%- elif \"oneOf\" in param -%}\n {%- set formats = [] -%}\n {%- for item in param[\"oneOf\"] -%}\n {%- if \"format\" in item -%}\n {%- if item[\"format\"] == param[\"oneOf\"][-1][\"format\"] -%}\n {{ item[\"format\"] }}\n {%- else -%}\n {{ item[\"format\"] + \" or \"}}\n {%- endif -%}\n {%- endif -%}\n {%- endfor -%}\n {%- else -%}\n {{ \"<|NONE|>\" }}\n {%- endif -%}\n{%- endmacro -%}\n\n{%- macro get_param_info(param) -%}\n {%- set param_type = param.get(\"type\", \"any\") -%}\n {%- set format_param = get_format_param(param) -%}\n\n {%- if \"description\" in param or \"default\" in param or format_param != \"<|NONE|>\" or param[\"maximum\"] or param[\"minimum\"] or param[\"maxLength\"] or param[\"minLength\"] -%}\n {{ \"//\" }}\n {%- if \"description\" in param -%}\n {%- set desc = param[\"description\"] -%}\n {%- if not desc.endswith(\".\") -%}\n {%- set desc = desc + \".\" -%}\n {%- endif -%}\n {{ \" \" + desc }}\n {%- endif -%}\n\n {%- if \"default\" in param -%}\n {%- set default_value = param[\"default\"] -%}\n {%- if param_type == \"string\" -%}\n {%- set default_value = '\"' ~ default_value ~ '\"' -%}\n {%- endif -%}\n {{ \" Default=\" ~ default_value ~ \".\" }}\n {%- endif -%}\n\n {%- set format_param = get_format_param(param) -%}\n {%- if format_param != \"<|NONE|>\" -%}\n {{ \" Format=\" ~ format_param }}\n {%- endif -%}\n\n {%- for field, field_name in [(\"maximum\", \"Maximum\"), (\"minimum\", \"Minimum\"), (\"maxLength\", \"Maximum length\"), (\"minLength\", \"Minimum length\")] -%}\n {%- if field in param -%}\n {{ \" \" + field_name ~ \"=\" ~ param[field] }}\n {%- endif -%}\n {%- endfor -%}\n {%- else -%}\n {{ \"<|NONE|>\"}}\n {%- endif -%}\n{%- endmacro -%}\n\n{%- macro get_enum_option_str(enum_options) -%}\n {%- for v in enum_options -%}\n {%- if v is string -%}\n {{ '\"' + v + '\"' }}\n {%- else -%}\n {{ v }}\n {%- endif -%}\n {%- if enum_options|length > 0 and v != enum_options[-1] -%}\n {{ \" | 
\" }}\n {%- endif -%}\n {%- endfor -%}\n{%- endmacro -%}\n\n{%- macro get_array_typescript(param_name, param_dic, depth) -%}\n {%- set offset = '' -%}\n {%- if depth >= 1 -%}\n {%- set offset = \" \" * depth -%}\n {%- endif -%}\n {%- set items_info = param_dic.get('items', {}) -%}\n\n {%- if items_info|length == 0 -%}\n {%- if param_name -%}\n {{ \"\\n\" + offset + param_name + \": []\" }}\n {%- else -%}\n {{ \"\\n\" + offset + \"[]\" }}\n {%- endif -%}\n {%- else -%}\n {%- set array_type = get_param_type(items_info) -%}\n {%- if array_type == 'object' -%}\n {%- if param_name -%}\n {{ \"\\n\" + offset + param_name + \": {\" }}\n {%- else -%}\n {{ \"\\n\" + offset + \"{\" }}\n {%- endif -%}\n {{ get_parameter_typescript(items_info.get('properties', {}), items_info.get('required', []), depth + 1) -}}\n {{- \"\\n\" + offset + \"}[]\" }}\n {%- elif array_type == 'array' -%}\n {%- set item_info = get_array_typescript(None, items_info, depth + 1) -%}\n {%- if not param_name -%}\n {{ \"\\n\" + item_info + \"[]\" }}\n {%- else -%}\n {{ \"\\n\" + offset + param_name + \": \" + item_info|trim + \"[]\" }}\n {%- endif -%}\n {%- else -%}\n {%- if 'enum' in items_info -%}\n {%- set item_type = get_enum_option_str(items_info['enum']) -%}\n {%- if param_name is none -%}\n {{ \"(\" + item_type + \")[]\"}}\n {%- else -%}\n {{ \"\\n\" + offset + param_name + \": (\" + item_type + \")[]\" }}\n {%- endif -%}\n {%- else -%}\n {%- if param_name is none -%}\n {{ \"\\n\" + array_type + \"[]\" }}\n {%- else -%}\n {{ \"\\n\" + offset + param_name + \": \" + array_type + \"[],\" }}\n {%- endif -%}\n {%- endif -%}\n {%- endif -%}\n {%- endif -%}\n{%- endmacro -%}\n\n{%- macro get_parameter_typescript(properties, required_params, depth=0) -%}\n {%- set res = \"\" -%}\n {%- for param_name, param in properties.items() -%}\n {%- if param is mapping -%}\n {%- set comment_info = get_param_info(param) -%}\n {# Param Examples #}\n {%- set examples_info = [] -%}\n {%- if \"examples\" in param -%}\n {%- set examples_info = [\"Example \" + param_name + \":\"] -%}\n {%- set examples_info = examples_info + param[\"examples\"] -%}\n {%- endif -%}\n\n {# Param Name declaration #}\n {%- set param_declaration = param_name -%}\n {%- if required_params is iterable and param_name not in required_params -%}\n {%- set param_declaration = param_declaration + \"?\" -%}\n {%- endif -%}\n\n {%- set param_type = get_param_type(param) -%}\n\n {# Handle indentation based on depth #}\n {%- set offset = \"\" -%}\n {%- if depth >= 1 -%}\n {%- set offset = \" \" * depth -%}\n {%- endif -%}\n\n {%- if param_type == \"object\" -%}\n {%- if comment_info != \"<|NONE|>\" -%}\n {{ \"\\n\" + offset + comment_info }}\n {%- endif -%}\n {%- if examples_info|length > 0 -%}\n {%- for example in examples_info -%}\n {{ \"\\n\" + offset + \"// \" + example|string|replace(\"'\", '\"') }}\n {%- endfor -%}\n {%- endif -%}\n {%- set param_declaration = param_declaration + \": {\" -%}\n {{ \"\\n\" + offset + param_declaration -}}\n {{- get_parameter_typescript(param.get(\"properties\", {}), param.get(\"required\", []), depth + 1) -}}\n {{- \"\\n\" + offset + \"},\" }}\n {%- elif param_type == \"array\" -%}\n {%- set item_info = param.get(\"items\", {}) -%}\n {%- if \"type\" not in item_info -%}\n {%- set param_declaration = param_declaration + \": [],\" -%}\n {{ append_new_param_info(param_declaration, comment_info, examples_info, depth) }}\n {%- else -%}\n {%- if comment_info != \"<|NONE|>\" -%}\n {{ \"\\n\" + offset + comment_info }}\n {%- endif -%}\n {%- if 
examples_info|length > 0 -%}\n {%- for example in examples_info -%}\n {{ \"\\n\" + offset + \"// \" + example|string|replace(\"'\", '\"') }}\n {%- endfor -%}\n {%- endif -%}\n {%- set array_declaration = get_array_typescript(param_declaration, param, depth) -%}\n {%- if not array_declaration.endswith(\",\") -%}\n {%- set array_declaration = array_declaration + \",\" -%}\n {%- endif -%}\n {{ array_declaration}}\n {%- endif -%}\n {%- else -%}\n {%- if \"enum\" in param -%}\n {%- set param_type = get_enum_option_str(param[\"enum\"]) -%}\n {%- endif -%}\n {%- if \"nullable\" in param and param[\"nullable\"] -%}\n {%- set param_type = param_type + \" | null\" -%}\n {%- endif -%}\n {%- set param_declaration = param_declaration + \": \" + param_type + \",\" -%}\n {{ append_new_param_info(param_declaration, comment_info, examples_info, depth) }}\n {%- endif -%}\n {%- endif -%}\n {%- endfor -%}\n{%- endmacro -%}\n\n{%- macro generate_schema_from_functions(functions, namespace='functions') -%}\n {{ \"// Supported function definitions that should be called when necessary.\\n\" -}}\n {{- \"namespace \" + namespace + \" {\\n\\n\" -}}\n\n {%- for function in functions -%}\n {%- if function.get(\"function\") -%}\n {%- set function = function.get(\"function\") -%}\n {%- endif -%}\n\n {%- set function_name = function.get(\"name\") -%}\n {%- if function_name -%}\n {%- set description = function.get('description', '') -%}\n {%- set parameters = function.get('parameters', {}) -%}\n {{- \"// \" + description + \"\\n\" -}}\n {{- \"type \" + function_name -}}\n {%- if parameters and parameters.get(\"properties\") -%}\n {{- \" = (_: {\" -}}\n {%- set required_params = parameters.get(\"required\", []) -%}\n {{ get_parameter_typescript(parameters.get(\"properties\"), required_params, 0) -}}\n {{- \"\\n}) => any;\\n\\n\" }}\n {%- else -%}\n {{ \" = () => any;\\n\\n\" }}\n {%- endif -%}\n {%- endif -%}\n {%- endfor -%}\n {{ \"} // namespace \" + namespace }}\n{%- endmacro -%}\n\n{%- if not tools is defined -%}\n {%- set tools = none -%}\n{%- endif -%}\n\n{%- set has_code_interpreter = tools | selectattr(\"type\", \"equalto\", \"code_interpreter\") | list | length > 0 -%}\n{%- if has_code_interpreter -%}\n {%- set tools = tools | rejectattr(\"type\", \"equalto\", \"code_interpreter\") | list -%}\n{%- endif -%}\n\n{#- System message + builtin tools #}\n{{- bos_token + \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}\n{%- if has_code_interpreter %}\n {{- \"Environment: ipython\\n\\n\" }}\n{%- else -%}\n {{ \"\"}}\n{%- endif %}\n{%- if tools %}\n {{- \"\\nYou have access to the following functions:\\n\\n\" }}\n {%- for t in tools %}\n {%- if \"type\" in t -%}\n {{ \"Use the function '\" + t[\"function\"][\"name\"] + \"' to '\" + t[\"function\"][\"description\"] + \"'\\n\" + t[\"function\"] | tojson() }}\n {%- else -%}\n {{ \"Use the function '\" + t[\"name\"] + \"' to '\" + t[\"description\"] + \"'\\n\" + t | tojson }}\n {%- endif -%}\n {{- \"\\n\\n\" }}\n {%- endfor %}\n {{- '\\nThink very carefully before calling functions.\\nIf a you choose to call a function ONLY reply in the following format:\\n<{start_tag}={function_name}>{parameters}{end_tag}\\nwhere\\n\\nstart_tag => `<function`\\nparameters => a JSON dict with the function argument name as key and function argument value as value.\\nend_tag => `</function>`\\n\\nHere is an example,\\n<function=example_function_name>{\"example_name\": \"example_value\"}</function>\\n\\nReminder:\\n- If looking for real time information use relevant functions before falling 
back to brave_search\\n- Function calls MUST follow the specified format, start with <function= and end with </function>\\n- Required parameters MUST be specified\\n- Only call one function at a time\\n- Put the entire function call reply on one line\\n\\n' -}}\n{%- endif %}\n{{- \"<|eot_id|>\" -}}\n\n{%- for message in messages -%}\n {%- if message['role'] == 'user' or message['role'] == 'system' -%}\n {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n' + message['content'] + '<|eot_id|>' }}\n {%- elif message['role'] == 'tool' -%}\n {{ '<|start_header_id|>ipython<|end_header_id|>\\n\\n' + message['content'] + '<|eot_id|>' }}\n {%- else -%}\n {%- if (message['content'] and message['content']|length > 0) or ('tool_calls' in message and message['tool_calls'] and message['tool_calls']|length > 0) -%}\n {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'}}\n {%- endif -%}\n {%- if message['content'] and message['content']|length > 0 -%}\n {{ message['content'] }}\n {%- endif -%}\n {%- if 'tool_calls' in message and message['tool_calls'] and message['tool_calls']|length > 0 -%}\n {%- for tool_call in message['tool_calls'] -%}\n {%- if tool_call[\"function\"][\"name\"] == \"python\" -%}\n {{ '<|python_tag|>' + tool_call['function']['arguments'] }}\n {%- else -%}\n {{ '<function=' + tool_call['function']['name'] + '>' + tool_call['function']['arguments'] + '</function>' }}\n {%- endif -%}\n {%- endfor -%}\n {{ '<|eom_id|>' }}\n {%- elif message['content'] and message['content']|length > 0 -%}\n {{ '<|eot_id|>' }}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n {{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}\n{%- endif -%}\n",
2081
+ "clean_up_tokenization_spaces": true,
2082
+ "eos_token": "<|eot_id|>",
2083
+ "extra_special_tokens": {},
2084
+ "max_length": 8192,
2085
+ "model_input_names": [
2086
+ "input_ids",
2087
+ "attention_mask"
2088
+ ],
2089
+ "model_max_length": 1000000000000000019884624838656,
2090
+ "pad_token": "<|end_of_text|>",
2091
+ "stride": 0,
2092
+ "tokenizer_class": "PreTrainedTokenizer",
2093
+ "truncation_side": "right",
2094
+ "truncation_strategy": "longest_first"
2095
+ }
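
A minimal usage sketch (not part of the commit): the config above registers IDs 128069-128255 as reserved special tokens, sets `<|eot_id|>` as EOS, `<|end_of_text|>` as the pad token, and ships a Llama-3.1-style tool-calling chat template. The snippet below shows how one might load the tokenizer from a local checkout of this folder and render that template; the local path, the `get_weather` tool schema, and the example message are purely illustrative assumptions, and it presumes the `transformers` library is installed.

```python
# Sketch only: load the tokenizer shipped in this folder and render the
# tool-calling chat template defined in tokenizer_config.json above.
from transformers import AutoTokenizer

# Assumption: this repo has been checked out locally into the current directory.
tokenizer = AutoTokenizer.from_pretrained("./")

# Sanity checks against the config above.
print(tokenizer.eos_token)                                # "<|eot_id|>"
print(tokenizer.convert_ids_to_tokens([128256, 128257]))  # ["<|eom_id|>", "<|python_tag|>"]

# Hypothetical tool schema; the chat template serializes it into the system
# prompt and instructs the model to answer tool calls as
# <function=NAME>{...}</function>, terminated by <|eom_id|>.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Seoul?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # ends with <|start_header_id|>assistant<|end_header_id|>
```

With `add_generation_prompt=True` the template appends the assistant header, so generation starts directly in the assistant turn; assistant messages that carry `tool_calls` are closed with `<|eom_id|>` rather than `<|eot_id|>`, matching the special tokens declared above.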