danielhanchen committed
Commit 501c7d1 · verified · Parent: 97292bb

Add files using upload-large-folder tool

README.md CHANGED
@@ -1,52 +1,14 @@
  ---
- base_model: google/gemma-3-12b-it
- language:
- - en
- library_name: transformers
  license: gemma
- tags:
- - unsloth
- - transformers
- - gemma3
- - gemma
- - google
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ extra_gated_heading: Access Gemma on Hugging Face
+ extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
+   agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging
+   Face and click below. Requests are processed immediately.
+ extra_gated_button_content: Acknowledge license
+ base_model: google/gemma-3-12b-pt
  ---
- <div>
-   <p style="margin-bottom: 0; margin-top: 0;">
-     <strong>See <a href="https://huggingface.co/collections/unsloth/gemma-3-67d12b7e8816ec6efa7e4e5b">our collection</a> for all versions of Gemma 3 including GGUF, 4-bit & 16-bit formats.</strong>
-   </p>
-   <p style="margin-bottom: 0;">
-     <em><a href="https://docs.unsloth.ai/basics/tutorial-how-to-run-gemma-3-effectively">Read our Guide</a> to see how to Run Gemma 3 correctly.</em>
-   </p>
-   <div style="display: flex; gap: 5px; align-items: center; ">
-     <a href="https://github.com/unslothai/unsloth/">
-       <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
-     </a>
-     <a href="https://discord.gg/unsloth">
-       <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
-     </a>
-     <a href="https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-r1-on-your-own-local-device">
-       <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
-     </a>
-   </div>
-   <h1 style="margin-top: 0rem;">✨ Fine-tune Gemma 3 with Unsloth!</h1>
- </div>
-
- - Fine-tune Gemma 3 (12B) for free using our Google [Colab notebook here](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- - Read our Blog about Gemma 3 support: [unsloth.ai/blog/gemma3](https://unsloth.ai/blog/gemma3)
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- - Export your fine-tuned model to GGUF, Ollama, llama.cpp or 🤗HF.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |------------------|----------------|-------------|------------|
- | **GRPO with Gemma 3 (12B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 2x faster | 80% less |
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Phi-4 (14B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 50% less |
- | **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
-
- <br>

  # Gemma 3 model card

@@ -96,6 +58,107 @@ for everyone.
    question, analysis of image content, or a summary of a document
  - Total output context of 8192 tokens

+ ### Usage
+
+ Below are some code snippets to help you quickly get started running the model. First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
+
+ ```sh
+ $ pip install -U transformers
+ ```
+
+ Then, copy the snippet from the section that is relevant for your use case.
+
+ #### Running with the `pipeline` API
+
+ You can initialize the model and processor for inference with `pipeline` as follows.
+
+ ```python
+ from transformers import pipeline
+ import torch
+
+ pipe = pipeline(
+     "image-text-to-text",
+     model="google/gemma-3-12b-it",
+     device="cuda",
+     torch_dtype=torch.bfloat16
+ )
+ ```
+
+ With instruction-tuned models, you first need to process your inputs with the chat template; then you can pass them to the pipeline.
+
+ ```python
+ messages = [
+     {
+         "role": "system",
+         "content": [{"type": "text", "text": "You are a helpful assistant."}]
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
+             {"type": "text", "text": "What animal is on the candy?"}
+         ]
+     }
+ ]
+
+ output = pipe(text=messages, max_new_tokens=200)
+ print(output[0]["generated_text"][-1]["content"])
+ # Okay, let's take a look!
+ # Based on the image, the animal on the candy is a **turtle**.
+ # You can see the shell shape and the head and legs.
+ ```
+
+ #### Running the model on a single / multi GPU
+
+ ```python
+ # pip install accelerate
+
+ from transformers import AutoProcessor, Gemma3ForConditionalGeneration
+ from PIL import Image
+ import requests
+ import torch
+
+ model_id = "google/gemma-3-12b-it"
+
+ model = Gemma3ForConditionalGeneration.from_pretrained(
+     model_id, device_map="auto"
+ ).eval()
+
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ messages = [
+     {
+         "role": "system",
+         "content": [{"type": "text", "text": "You are a helpful assistant."}]
+     },
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
+             {"type": "text", "text": "Describe this image in detail."}
+         ]
+     }
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages, add_generation_prompt=True, tokenize=True,
+     return_dict=True, return_tensors="pt"
+ ).to(model.device, dtype=torch.bfloat16)
+
+ input_len = inputs["input_ids"].shape[-1]
+
+ with torch.inference_mode():
+     generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+     generation = generation[0][input_len:]
+
+ decoded = processor.decode(generation, skip_special_tokens=True)
+ print(decoded)
+
+ # **Overall Impression:** The image is a close-up shot of a vibrant garden scene,
+ # focusing on a cluster of pink cosmos flowers and a busy bumblebee.
+ # It has a slightly soft, natural feel, likely captured in daylight.
+ ```
+
  ### Citation

  ```none
config.json CHANGED
@@ -3,60 +3,38 @@
      "Gemma3ForConditionalGeneration"
    ],
    "boi_token_index": 255999,
-   "bos_token_id": 2,
    "eoi_token_index": 256000,
-   "eos_token_id": 106,
+   "eos_token_id": [
+     1,
+     106
+   ],
    "image_token_index": 262144,
    "initializer_range": 0.02,
    "mm_tokens_per_image": 256,
    "model_type": "gemma3",
-   "pad_token_id": 0,
    "text_config": {
-     "attention_bias": false,
-     "attention_dropout": 0.0,
-     "attn_logit_softcapping": null,
-     "cache_implementation": "hybrid",
-     "final_logit_softcapping": null,
-     "head_dim": 256,
-     "hidden_activation": "gelu_pytorch_tanh",
      "hidden_size": 3840,
-     "initializer_range": 0.02,
      "intermediate_size": 15360,
-     "max_position_embeddings": 131072,
      "model_type": "gemma3_text",
      "num_attention_heads": 16,
      "num_hidden_layers": 48,
      "num_key_value_heads": 8,
-     "query_pre_attn_scalar": 256,
-     "rms_norm_eps": 1e-06,
-     "rope_local_base_freq": 10000.0,
      "rope_scaling": {
        "factor": 8.0,
        "rope_type": "linear"
      },
-     "rope_theta": 1000000.0,
-     "sliding_window": 1024,
-     "sliding_window_pattern": 6,
-     "torch_dtype": "bfloat16",
-     "use_cache": true,
-     "vocab_size": 262208
+     "sliding_window": 1024
    },
    "torch_dtype": "bfloat16",
-   "transformers_version": "4.51.0",
-   "unsloth_fixed": true,
+   "transformers_version": "4.50.0.dev0",
    "vision_config": {
-     "attention_dropout": 0.0,
-     "hidden_act": "gelu_pytorch_tanh",
      "hidden_size": 1152,
      "image_size": 896,
      "intermediate_size": 4304,
-     "layer_norm_eps": 1e-06,
      "model_type": "siglip_vision_model",
      "num_attention_heads": 16,
-     "num_channels": 3,
      "num_hidden_layers": 27,
      "patch_size": 14,
-     "torch_dtype": "bfloat16",
      "vision_use_head": false
    }
  }
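
Note on the `eos_token_id` change above: the single id 106 (`<end_of_turn>`) becomes the list `[1, 106]`, so decoding stops when either `<eos>` or `<end_of_turn>` is generated. A minimal sketch to verify the value on the loaded config (assumes network access to the Hugging Face Hub and an accepted Gemma license; the repo id is the model this card describes):

```python
# Minimal sketch: inspect the stop-token ids carried by the updated config.
# Assumes Hub access and an accepted Gemma license.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-12b-it")

# generate() treats any id in this list as end-of-sequence.
print(config.eos_token_id)  # expected after this commit: [1, 106]
```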
generation_config.json CHANGED
@@ -9,5 +9,5 @@
    "pad_token_id": 0,
    "top_k": 64,
    "top_p": 0.95,
-   "transformers_version": "4.51.0"
+   "transformers_version": "4.50.0.dev0"
  }
special_tokens_map.json CHANGED
@@ -9,7 +9,7 @@
    },
    "eoi_token": "<end_of_image>",
    "eos_token": {
-     "content": "<end_of_turn>",
+     "content": "<eos>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
tokenizer_config.json CHANGED
@@ -51328,16 +51328,15 @@
    "chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n    {%- if messages[0]['content'] is string -%}\n        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n    {%- else -%}\n        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n    {%- endif -%}\n    {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n    {%- set first_user_prefix = \"\" -%}\n    {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n        {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n    {%- endif -%}\n    {%- if (message['role'] == 'assistant') -%}\n        {%- set role = \"model\" -%}\n    {%- else -%}\n        {%- set role = message['role'] -%}\n    {%- endif -%}\n    {{ '<start_of_turn>' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n    {%- if message['content'] is string -%}\n        {{ message['content'] | trim }}\n    {%- elif message['content'] is iterable -%}\n        {%- for item in message['content'] -%}\n            {%- if item['type'] == 'image' -%}\n                {{ '<start_of_image>' }}\n            {%- elif item['type'] == 'text' -%}\n                {{ item['text'] | trim }}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- else -%}\n        {{ raise_exception(\"Invalid content type\") }}\n    {%- endif -%}\n    {{ '<end_of_turn>\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{'<start_of_turn>model\n'}}\n{%- endif -%}\n",
    "clean_up_tokenization_spaces": false,
    "eoi_token": "<end_of_image>",
-   "eos_token": "<end_of_turn>",
+   "eos_token": "<eos>",
    "extra_special_tokens": {
      "boi_token": "<start_of_image>",
      "eoi_token": "<end_of_image>",
      "image_token": "<image_soft_token>"
    },
    "image_token": "<image_soft_token>",
-   "model_max_length": 131072,
+   "model_max_length": 1000000000000000019884624838656,
    "pad_token": "<pad>",
-   "padding_side": "left",
    "processor_class": "Gemma3Processor",
    "sp_model_kwargs": null,
    "spaces_between_special_tokens": false,