---
license: gemma
base_model: google/gemma-3n-e4b-it
tags:
- gemma3n
- gguf
- quantized
- llama.cpp
- ollama
- inference
- conversational
- multilingual
language:
- en
- multilingual
pipeline_tag: text-generation
model_type: gemma3n
library_name: transformers
---

# Gemma 3n E4B IT - Complete GGUF Collection

This repository contains a comprehensive collection of **Gemma 3n E4B Instruction-Tuned** models quantized to various GGUF formats for efficient inference on different hardware configurations.

## 📊 **Quantization Overview**

| **Format** | **Size (GB)** | **Compression** | **Quality** | **Use Case** |
|------------|---------------|-----------------|-------------|--------------|
| **F16** | 13.0 | 1.0x | 🌟🌟🌟🌟🌟 | Research, maximum quality |
| **Q8_0** | 6.8 | 1.9x | 🌟🌟🌟🌟🌟 | Production, near-original quality |
| **Q6_K** | 5.3 | 2.5x | 🌟🌟🌟🌟 | High-quality inference |
| **Q5_1** | 4.9 | 2.6x | 🌟🌟🌟🌟 | Balanced quality/speed |
| **Q5_K_M** | 4.6 | 2.8x | 🌟🌟🌟🌟 | **Recommended for most users** |
| **Q5_K_S** | 4.5 | 2.8x | 🌟🌟🌟🌟 | Slightly faster Q5 |
| **Q5_0** | 4.5 | 2.8x | 🌟🌟🌟🌟 | Good balance |
| **Q4_1** | 4.2 | 3.1x | 🌟🌟🌟 | Better 4-bit quality |
| **Q4_K_M** | 3.9 | 3.3x | 🌟🌟🌟 | **Popular choice** |
| **Q4_K_S** | 3.8 | 3.4x | 🌟🌟🌟 | Efficient 4-bit |
| **Q4_0** | 3.8 | 3.4x | 🌟🌟🌟 | **Most compatible** |
| **Q3_K_L** | 3.4 | 3.8x | 🌟🌟 | High compression |
| **Q3_K_M** | 3.2 | 4.0x | 🌟🌟 | Aggressive compression |
| **Q3_K_S** | 3.0 | 4.3x | 🌟🌟 | Maximum speed |
| **Q2_K** | 2.6 | 5.0x | 🌟 | Extreme compression |

## 🚀 **Quick Start**

### Using with Ollama
```bash
# Option 1: Use the pre-built model (requires Ollama ≥ 0.10.0)
ollama run gemma3n:e4b

# Option 2: Import a custom quantization
ollama create my-gemma3n -f Modelfile
```
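Option 2 assumes a `Modelfile` next to the downloaded GGUF. The sketch below is illustrative, not a file shipped with this repository; recent Ollama builds can usually pick up the chat template embedded in the GGUF, so only the stop token and sampling defaults are set here, and the paths and values are assumptions to adjust:

```bash
# Write a minimal Modelfile pointing at the local GGUF (sketch; adjust path and parameters)
cat > Modelfile <<'EOF'
FROM ./gemma-3n-e4b-it-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<end_of_turn>"
EOF

ollama create my-gemma3n -f Modelfile
ollama run my-gemma3n
```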

### Using with llama.cpp
```bash
# Serve your preferred quantization over an HTTP API
./llama-server -m gemma-3n-e4b-it-q4_k_m.gguf -c 4096

# Or chat interactively in the terminal
./llama-cli -m gemma-3n-e4b-it-q4_k_m.gguf --color -i
```
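`llama-server` exposes an OpenAI-compatible HTTP API (port 8080 by default), so once it is running you can query it as sketched below; the prompt and sampling values are just examples:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Summarize what Gemma 3n is in one sentence."}
        ],
        "max_tokens": 256,
        "temperature": 0.7
      }'
```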

### Using with Python (llama-cpp-python)
```python
from llama_cpp import Llama

# Load model
llm = Llama(
    model_path="gemma-3n-e4b-it-q4_k_m.gguf",
    n_ctx=4096,
    n_threads=8,
    verbose=False
)

# Generate response
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Can you introduce yourself?"}
    ],
    max_tokens=512,
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
```

## 💬 **Chat Template**

Gemma 3n uses the following chat template:
```
<bos><start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{assistant_response}<end_of_turn>
```
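If you use the plain completion API instead of `create_chat_completion`, you apply this template yourself. A small sketch reusing the `llm` object from the snippet above (the prompt text is an example, and it assumes llama-cpp-python's default of adding `<bos>` during tokenization):

```python
# Manually formatted Gemma 3n turn; the model reply is generated after "<start_of_turn>model"
prompt = (
    "<start_of_turn>user\n"
    "Explain GGUF quantization in two sentences.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Stop at the end-of-turn marker so the model does not start a new turn
out = llm(prompt, max_tokens=256, temperature=0.7, stop=["<end_of_turn>"])
print(out["choices"][0]["text"])
```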

### Recommended Parameters
```yaml
temperature: 0.7
top_p: 0.9
top_k: 40
repeat_penalty: 1.1
max_tokens: 4096
stop_tokens: ["<end_of_turn>"]
```
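With llama.cpp these settings map directly onto command-line flags; a sketch (the context size and file name are assumptions):

```bash
./llama-cli -m gemma-3n-e4b-it-q4_k_m.gguf -c 4096 \
  --temp 0.7 --top-p 0.9 --top-k 40 --repeat-penalty 1.1 \
  -n 4096 --color -i
```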

## 📋 **Model Details**

- **Base Model**: [google/gemma-3n-e4b-it](https://huggingface.co/google/gemma-3n-e4b-it)
- **Architecture**: Gemma 3n (effective 4B parameters)
- **Context Length**: 32,768 tokens
- **Vocabulary Size**: 256,000 tokens
- **Training Data**: Multilingual (140+ languages)
- **License**: Gemma License

### Key Features
- ✅ **Efficient Architecture**: Uses selective parameter activation
- ✅ **Multilingual Support**: Trained on 140+ spoken languages
- ✅ **Instruction-Tuned**: Optimized for conversational AI
- ✅ **Extended Context**: 32K token context window
- ✅ **Hardware Optimized**: Designed for everyday devices

## 🔧 **Quantization Details**

All models were quantized using a recent **llama.cpp** build with full Gemma 3n architecture support:

### Quantization Process
```bash
# Example quantization command
./llama-quantize input.gguf output.gguf q4_k_m
```
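For reference, the full path from the original Hugging Face checkpoint to a quantized GGUF looks roughly like the sketch below, using the conversion script and quantizer that ship with llama.cpp (the input path and output file names are illustrative):

```bash
# 1. Convert the HF checkpoint to a full-precision GGUF
python convert_hf_to_gguf.py /path/to/gemma-3n-e4b-it \
  --outfile gemma-3n-e4b-it-f16.gguf --outtype f16

# 2. Quantize the F16 GGUF down to the desired format
./llama-quantize gemma-3n-e4b-it-f16.gguf gemma-3n-e4b-it-q4_k_m.gguf q4_k_m
```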

### Architecture-Specific Tensors
The quantization properly handles Gemma 3n-specific components:
- `altup_correct_coef`, `altup_correct_scale`, `altup_predict_coef`
- `altup_router`, `altup_router_norm`
- `laurel_l`, `laurel_r`, `laurel_post_norm`
- Standard attention, FFN, and normalization layers

## 📈 **Performance Benchmarks**

Based on the original Gemma 3n E4B IT model:

| **Benchmark** | **Score** | **Category** |
|---------------|-----------|--------------|
| MMLU | 64.9% | General Knowledge |
| HumanEval | 75.0% | Code Generation |
| HellaSwag | 78.6% | Commonsense Reasoning |
| ARC-E | 81.6% | Elementary Science |
| TriviaQA | 70.2% | Factual Knowledge |
| MBPP | 63.6% | Programming |

## 🎯 **Recommended Usage**

### For Different Hardware:
- **High-end GPU (24GB+ VRAM)**: Q8_0 or F16
- **Mid-range GPU (12-16GB VRAM)**: Q6_K or Q5_K_M
- **Consumer GPU (8-12GB VRAM)**: Q4_K_M or Q5_K_S
- **CPU-only or low VRAM**: Q4_0 or Q3_K_M
- **Mobile/Edge devices**: Q3_K_S or Q2_K

### For Different Use Cases:
- **Production chatbots**: Q5_K_M or Q4_K_M
- **Research/evaluation**: Q8_0 or F16
- **Rapid prototyping**: Q4_0
- **Resource-constrained**: Q3_K_M
- **Edge deployment**: Q2_K

## 🛠️ **Technical Requirements**

### Minimum System Requirements:
- **RAM**: 4GB (for Q2_K) to 16GB (for F16)
- **Storage**: 3GB to 13GB depending on quantization
- **CPU**: Modern x64 processor with AVX support
- **GPU** (optional): CUDA, OpenCL, or Metal support

### Software Dependencies:
- **llama.cpp**: Latest version with Gemma 3n support
- **Python**: 3.8+ for llama-cpp-python
- **Ollama**: 0.10.0+ for official support
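A minimal environment setup covering those dependencies, assuming you build llama.cpp from source (repository URL and build commands follow the upstream instructions; the CUDA flag is only needed for NVIDIA GPUs):

```bash
# Build llama.cpp (CPU build; add -DGGML_CUDA=ON to the first cmake call for NVIDIA GPUs)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Python bindings used in the examples above
pip install llama-cpp-python
```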

## 📚 **Usage Examples**

### Example Conversations:
```
User: What is Gemma 3n and how does it differ from other models?

Gemma 3n: I'm Gemma 3n, a helpful AI assistant created by Google. I'm part of the Gemma 3n family, which is designed for efficient execution on everyday devices like laptops, tablets, and phones.

What makes me different from other models:
- **Selective Parameter Activation**: I use an innovative technology that allows me to operate effectively with 4B parameters while containing more total parameters
- **Multilingual Capabilities**: I was trained on data in over 140 spoken languages
- **Hardware Efficiency**: I'm optimized to run well on consumer hardware, not just high-end servers
- **Extended Context**: I can handle up to 32K tokens in context
- **Instruction-Tuned**: I'm specifically fine-tuned for conversational interactions

User: Can you help me write a Python function to calculate Fibonacci numbers?

Gemma 3n: Certainly! Here are a few different approaches to calculate Fibonacci numbers:

def fibonacci_recursive(n):
    """Recursive approach - simple but inefficient for large n"""
    if n <= 1:
        return n
    return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)

def fibonacci_iterative(n):
    """Iterative approach - efficient and readable"""
    if n <= 1:
        return n

    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

def fibonacci_generator(limit):
    """Generator for the Fibonacci sequence up to limit"""
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

# Examples:
print(fibonacci_iterative(10))         # Output: 55
print(list(fibonacci_generator(100)))  # All Fibonacci numbers < 100
```

## 🔧 **Troubleshooting**

### Common Issues:

**1. "unknown model architecture" error:**
- Ensure you're using the latest llama.cpp with Gemma 3n support
- Update to llama.cpp commit `8846aace` or later

**2. Ollama compatibility issues:**
- Update Ollama to version 0.10.0 or later for native Gemma 3n support
- For older Ollama versions, use the custom Modelfile approach

**3. Out-of-memory errors:**
- Try a smaller quantization (Q4_0, Q3_K_M, or Q2_K)
- Reduce the context length with the `-c` parameter
- Use CPU inference instead of GPU

**4. Slow inference:**
- Use GPU acceleration if available
- Try quantizations like Q4_K_M or Q4_0 for better speed
- Adjust the thread count with the `-t` parameter

### Performance Tips:
- **GPU users**: Q4_K_M or Q5_K_M offer the best speed/quality balance
- **CPU users**: Q4_0 provides good compatibility and reasonable speed
- **Memory-constrained**: Q3_K_M or Q2_K can run on limited hardware
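For GPU users, offloading layers to the GPU usually matters more than thread count; a typical invocation is sketched below (the layer count, thread count, and context size are assumptions to tune for your hardware):

```bash
# -ngl sets the number of layers to offload to the GPU (a large value offloads as many as fit)
./llama-server -m gemma-3n-e4b-it-q4_k_m.gguf -ngl 99 -t 8 -c 8192
```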

## ⚠️ **Limitations**

- **Quantization Quality**: Lower-bit quantizations (Q2_K, Q3_K) may have reduced quality
- **Context Length**: While 32K tokens are supported, performance may degrade with very long contexts
- **Language Coverage**: While multilingual, performance may vary across languages
- **Knowledge Cutoff**: Training data has a cutoff date (June 2024)
- **Factual Accuracy**: May generate plausible but incorrect information
- **Code Generation**: While capable, outputs may require verification for production use

## 🔒 **Ethical Considerations**

- **Bias**: The model may reflect biases present in its training data
- **Safety**: Content filtering is not included in these GGUF versions
- **Verification**: Always verify outputs, especially for critical applications
- **Responsible Use**: Follow the Gemma License and ethical AI guidelines

## 📄 **Files Included**

```
gemma-3n-e4b-it-f16.gguf      # 13.0 GB - Full precision
gemma-3n-e4b-it-q8_0.gguf     #  6.8 GB - 8-bit quantization
gemma-3n-e4b-it-q6_k.gguf     #  5.3 GB - 6-bit K-quant
gemma-3n-e4b-it-q5_1.gguf     #  4.9 GB - 5-bit (high quality)
gemma-3n-e4b-it-q5_k_m.gguf   #  4.6 GB - 5-bit K-quant medium
gemma-3n-e4b-it-q5_k_s.gguf   #  4.5 GB - 5-bit K-quant small
gemma-3n-e4b-it-q5_0.gguf     #  4.5 GB - 5-bit standard
gemma-3n-e4b-it-q4_1.gguf     #  4.2 GB - 4-bit (improved)
gemma-3n-e4b-it-q4_k_m.gguf   #  3.9 GB - 4-bit K-quant medium
gemma-3n-e4b-it-q4_k_s.gguf   #  3.8 GB - 4-bit K-quant small
gemma-3n-e4b-it-q4_0.gguf     #  3.8 GB - 4-bit standard
gemma-3n-e4b-it-q3_k_l.gguf   #  3.4 GB - 3-bit K-quant large
gemma-3n-e4b-it-q3_k_m.gguf   #  3.2 GB - 3-bit K-quant medium
gemma-3n-e4b-it-q3_k_s.gguf   #  3.0 GB - 3-bit K-quant small
gemma-3n-e4b-it-q2_k.gguf     #  2.6 GB - 2-bit K-quant
```
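You only need one of these files for inference. A sketch of fetching a single quantization with the Hugging Face CLI (`<repo_id>` is a placeholder for this repository's id on the Hub):

```bash
# Download a single quantization into the current directory
huggingface-cli download <repo_id> gemma-3n-e4b-it-q4_k_m.gguf --local-dir .
```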

## 🙏 **Acknowledgments**

- **Google DeepMind**: For developing and releasing Gemma 3n
- **llama.cpp community**: For implementing Gemma 3n architecture support
- **Hugging Face**: For providing the model hosting platform
- **Quantization**: Performed using the latest llama.cpp tools

## 📞 **Support & Community**

- **Issues**: Report problems in the repository issues
- **Discussions**: Join the community discussions
- **Updates**: Follow for model updates and improvements

## 📜 **Citation**

If you use these models in your research, please cite:

```bibtex
@article{gemma_3n_2025,
    title={Gemma 3n},
    url={https://ai.google.dev/gemma/docs/gemma-3n},
    publisher={Google DeepMind},
    author={Gemma Team},
    year={2025}
}
```

---

**License**: This model is released under the Gemma License. Please review the license terms before use.

**Disclaimer**: These quantized models are provided for research and educational purposes. Users are responsible for ensuring compliance with applicable laws and ethical guidelines.

# Gemma 3n model card

**Model Page**: [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n)

**Resources and Technical Documentation**:

- [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
- [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma-3n)
- [Gemma on HuggingFace](https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4)
- [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma3n)

**Terms of Use**: [Terms](https://ai.google.dev/gemma/terms)\
**Authors**: Google DeepMind

## Model Information

Summary description and brief definition of inputs and outputs.

### Description

Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
Gemma 3n models are designed for efficient execution on low-resource devices.
They are capable of multimodal input, handling text, image, video, and audio
input, and generating text outputs, with open weights for pre-trained and
instruction-tuned variants. These models were trained with data in over 140
spoken languages.

Gemma 3n models use selective parameter activation technology to reduce resource
requirements. This technique allows the models to operate at an effective size
of 2B and 4B parameters, which is lower than the total number of parameters they
contain. For more information on Gemma 3n's efficient parameter management
technology, see the [Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n#parameters) page.

### Inputs and outputs

- **Input:**
    - Text string, such as a question, a prompt, or a document to be summarized
    - Images, normalized to 256x256, 512x512, or 768x768 resolution and encoded to 256 tokens each
    - Audio data encoded to 6.25 tokens per second from a single channel
    - Total input context of 32K tokens
- **Output:**
    - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
    - Total output length up to 32K tokens, subtracting the request input tokens

### Usage

Below are some code snippets to help you get started running the model. First,
install the Transformers library; Gemma 3n is supported starting from
transformers 4.53.0.

```sh
$ pip install -U transformers
```

Then, copy the snippet from the section that is relevant for your use case.

#### Running with the `pipeline` API

You can initialize the model and processor for inference with `pipeline` as
follows.

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)
```

With instruction-tuned models, you need to use chat templates to process your
inputs first. Then, you can pass the result to the pipeline.

```python
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
# Okay, let's take a look!
# Based on the image, the animal on the candy is a **turtle**.
# You can see the shell shape and the head and legs.
```

#### Running the model on a single GPU

```python
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/gemma-3n-e4b-it"

model = Gemma3nForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16,
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

# **Overall Impression:** The image is a close-up shot of a vibrant garden scene,
# focusing on a cluster of pink cosmos flowers and a busy bumblebee.
# It has a slightly soft, natural feel, likely captured in daylight.
```

### Citation

```bibtex
@article{gemma_3n_2025,
    title={Gemma 3n},
    url={https://ai.google.dev/gemma/docs/gemma-3n},
    publisher={Google DeepMind},
    author={Gemma Team},
    year={2025}
}
```

## Model Data

Data used for model training and how the data was processed.

### Training Dataset

These models were trained on a dataset that includes a wide variety of sources
totalling approximately 11 trillion tokens. The knowledge cutoff date for the
training data was June 2024. Here are the key components:

- **Web Documents**: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
- **Code**: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
- **Mathematics**: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
- **Images**: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
- **Audio**: A diverse set of sound samples enables the model to recognize speech, transcribe text from recordings, and identify information in audio data.

The combination of these diverse data sources is crucial for training a
powerful multimodal model that can handle a wide variety of different tasks and
data formats.

### Data Preprocessing

Here are the key data cleaning and filtering methods applied to the training
data:

- **CSAM Filtering**: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
- **Sensitive Data Filtering**: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
- **Additional methods**: Filtering based on content quality and safety in line with [our policies](https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf).

## Implementation Information

Details about the model internals.

### Hardware

Gemma was trained using [Tensor Processing Unit
(TPU)](https://cloud.google.com/tpu/docs/intro-to-tpu) hardware (TPUv4p, TPUv5p
and TPUv5e). Training generative models requires significant computational
power. TPUs, designed specifically for matrix operations common in machine
learning, offer several advantages in this domain:

- **Performance**: TPUs are specifically designed to handle the massive computations involved in training generative models. They can speed up training considerably compared to CPUs.
- **Memory**: TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training. This can lead to better model quality.
- **Scalability**: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. You can distribute training across multiple TPU devices for faster and more efficient processing.
- **Cost-effectiveness**: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially when considering the time and resources saved due to faster training.

These advantages are aligned with
[Google's commitments to operate sustainably](https://sustainability.google/operating-sustainably/).

### Software

Training was done using [JAX](https://github.com/jax-ml/jax) and
[ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/).
JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models. ML
Pathways is Google's latest effort to build artificially intelligent systems
capable of generalizing across multiple tasks. This is especially suitable for
foundation models, including large language models like these ones.

Together, JAX and ML Pathways are used as described in the
[paper about the Gemini family of models](https://goo.gle/gemma2report):
*"the 'single controller' programming model of Jax and Pathways allows a single
Python process to orchestrate the entire training run, dramatically simplifying
the development workflow."*

## Evaluation

Model evaluation metrics and results.

### Benchmark Results

These models were evaluated at full precision (float32) against a large
collection of different datasets and metrics to cover different aspects of
content generation. Evaluation results marked with **IT** are for
instruction-tuned models. Evaluation results marked with **PT** are for
pre-trained models.

#### Reasoning and factuality

| Benchmark | Metric | n-shot | E2B PT | E4B PT |
| ------------------------------ |----------------|----------|:--------:|:--------:|
| [HellaSwag][hellaswag] | Accuracy | 10-shot | 72.2 | 78.6 |
| [BoolQ][boolq] | Accuracy | 0-shot | 76.4 | 81.6 |
| [PIQA][piqa] | Accuracy | 0-shot | 78.9 | 81.0 |
| [SocialIQA][socialiqa] | Accuracy | 0-shot | 48.8 | 50.0 |
| [TriviaQA][triviaqa] | Accuracy | 5-shot | 60.8 | 70.2 |
| [Natural Questions][naturalq] | Accuracy | 5-shot | 15.5 | 20.9 |
| [ARC-c][arc] | Accuracy | 25-shot | 51.7 | 61.6 |
| [ARC-e][arc] | Accuracy | 0-shot | 75.8 | 81.6 |
| [WinoGrande][winogrande] | Accuracy | 5-shot | 66.8 | 71.7 |
| [BIG-Bench Hard][bbh] | Accuracy | few-shot | 44.3 | 52.9 |
| [DROP][drop] | Token F1 score | 1-shot | 53.9 | 60.8 |

[hellaswag]: https://arxiv.org/abs/1905.07830
[boolq]: https://arxiv.org/abs/1905.10044
[piqa]: https://arxiv.org/abs/1911.11641
[socialiqa]: https://arxiv.org/abs/1904.09728
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161

#### Multilingual

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|-------------------------|----------|:--------:|:--------:|
| [MGSM][mgsm] | Accuracy | 0-shot | 53.1 | 60.7 |
| [WMT24++][wmt24pp] (ChrF) | Character-level F-score | 0-shot | 42.7 | 50.1 |
| [Include][include] | Accuracy | 0-shot | 38.6 | 57.2 |
| [MMLU][mmlu] (ProX) | Accuracy | 0-shot | 8.1 | 19.9 |
| [OpenAI MMLU][openai-mmlu] | Accuracy | 0-shot | 22.3 | 35.6 |
| [Global-MMLU][global-mmlu] | Accuracy | 0-shot | 55.1 | 60.3 |
| [ECLeKTic][eclektic] | ECLeKTic score | 0-shot | 2.5 | 1.9 |

[mgsm]: https://arxiv.org/abs/2210.03057
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[include]: https://arxiv.org/abs/2411.19799
[mmlu]: https://arxiv.org/abs/2009.03300
[openai-mmlu]: https://huggingface.co/datasets/openai/MMMLU
[global-mmlu]: https://huggingface.co/datasets/CohereLabs/Global-MMLU
[eclektic]: https://arxiv.org/abs/2502.21228

#### STEM and code

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------|--------------------------|----------|:--------:|:--------:|
| [GPQA][gpqa] Diamond | RelaxedAccuracy/accuracy | 0-shot | 24.8 | 23.7 |
| [LiveCodeBench][lcb] v5 | pass@1 | 0-shot | 18.6 | 25.7 |
| Codegolf v2.2 | pass@1 | 0-shot | 11.0 | 16.8 |
| [AIME 2025][aime-2025] | Accuracy | 0-shot | 6.7 | 11.6 |

[gpqa]: https://arxiv.org/abs/2311.12022
[lcb]: https://arxiv.org/abs/2403.07974
[aime-2025]: https://www.vals.ai/benchmarks/aime-2025-05-09

#### Additional benchmarks

| Benchmark | Metric | n-shot | E2B IT | E4B IT |
| ------------------------------------ |------------|----------|:--------:|:--------:|
| [MMLU][mmlu] | Accuracy | 0-shot | 60.1 | 64.9 |
| [MBPP][mbpp] | pass@1 | 3-shot | 56.6 | 63.6 |
| [HumanEval][humaneval] | pass@1 | 0-shot | 66.5 | 75.0 |
| [LiveCodeBench][lcb] | pass@1 | 0-shot | 13.2 | 13.2 |
| HiddenMath | Accuracy | 0-shot | 27.7 | 37.7 |
| [Global-MMLU-Lite][global-mmlu-lite] | Accuracy | 0-shot | 59.0 | 64.5 |
| [MMLU][mmlu] (Pro) | Accuracy | 0-shot | 40.5 | 50.6 |

[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite

## Ethics and Safety

Ethics and safety evaluation approach and results.

### Evaluation Approach

Our evaluation methods include structured evaluations and internal red-teaming
testing of relevant content policies. Red-teaming was conducted by a number of
different teams, each with different goals and human evaluation metrics. These
models were evaluated against a number of different categories relevant to
ethics and safety, including:

- **Child Safety**: Evaluation of text-to-text and image-to-text prompts covering child safety policies, including child sexual abuse and exploitation.
- **Content Safety**: Evaluation of text-to-text and image-to-text prompts covering safety policies including harassment, violence and gore, and hate speech.
- **Representational Harms**: Evaluation of text-to-text and image-to-text prompts covering safety policies including bias, stereotyping, and harmful associations or inaccuracies.

In addition to development-level evaluations, we conduct "assurance
evaluations" which are our 'arms-length' internal evaluations for responsibility
governance decision making. They are conducted separately from the model
development team, to inform decision making about release. High-level findings
are fed back to the model team, but prompt sets are held out to prevent
overfitting and preserve the results' ability to inform decision making. Notable
assurance evaluation results are reported to our Responsibility & Safety Council
as part of release review.

### Evaluation Results

For all areas of safety testing, we saw safe levels of performance across the
categories of child safety, content safety, and representational harms relative
to previous Gemma models. All testing was conducted without safety filters to
evaluate the model's capabilities and behaviors. For text-to-text, image-to-text,
and audio-to-text, and across all model sizes, the model produced minimal policy
violations and showed significant improvements over previous Gemma models'
performance with respect to high-severity violations. A limitation of our
evaluations was that they included primarily English-language prompts.

## Usage and Limitations

These models have certain limitations that users should be aware of.

### Intended Usage

Open generative models have a wide range of applications across various
industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use-cases that the model creators considered as part of model
training and development.

- Content Creation and Communication
    - **Text Generation**: Generate creative text formats such as poems, scripts, code, marketing copy, and email drafts.
    - **Chatbots and Conversational AI**: Power conversational interfaces for customer service, virtual assistants, or interactive applications.
    - **Text Summarization**: Generate concise summaries of a text corpus, research papers, or reports.
    - **Image Data Extraction**: Extract, interpret, and summarize visual data for text communications.
    - **Audio Data Extraction**: Transcribe spoken language, translate speech to text in other languages, and analyze sound-based data.
- Research and Education
    - **Natural Language Processing (NLP) and Generative Model Research**: These models can serve as a foundation for researchers to experiment with generative models and NLP techniques, develop algorithms, and contribute to the advancement of the field.
    - **Language Learning Tools**: Support interactive language learning experiences, aiding in grammar correction or providing writing practice.
    - **Knowledge Exploration**: Assist researchers in exploring large bodies of data by generating summaries or answering questions about specific topics.

### Limitations

- Training Data
    - The quality and diversity of the training data significantly influence the model's capabilities. Biases or gaps in the training data can lead to limitations in the model's responses.
    - The scope of the training dataset determines the subject areas the model can handle effectively.
- Context and Task Complexity
    - Models are better at tasks that can be framed with clear prompts and instructions. Open-ended or highly complex tasks might be challenging.
    - A model's performance can be influenced by the amount of context provided (longer context generally leads to better outputs, up to a certain point).
- Language Ambiguity and Nuance
    - Natural language is inherently complex. Models might struggle to grasp subtle nuances, sarcasm, or figurative language.
- Factual Accuracy
    - Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements.
- Common Sense
    - Models rely on statistical patterns in language. They might lack the ability to apply common sense reasoning in certain situations.

### Ethical Considerations and Risks

The development of generative models raises several ethical concerns. In
creating an open model, we have carefully considered the following:

- Bias and Fairness
    - Generative models trained on large-scale, real-world text and image data can reflect socio-cultural biases embedded in the training material. These models underwent careful scrutiny, with input data pre-processing described and posterior evaluations reported in this card.
- Misinformation and Misuse
    - Generative models can be misused to generate text that is false, misleading, or harmful.
    - Guidelines are provided for responsible use with the model; see the [Responsible Generative AI Toolkit](https://ai.google.dev/responsible).
- Transparency and Accountability
    - This model card summarizes details on the models' architecture, capabilities, limitations, and evaluation processes.
    - A responsibly developed open model offers the opportunity to share innovation by making generative model technology accessible to developers and researchers across the AI ecosystem.

Risks identified and mitigations:

- **Perpetuation of biases**: It's encouraged to perform continuous monitoring (using evaluation metrics, human review) and the exploration of de-biasing techniques during model training, fine-tuning, and other use cases.
- **Generation of harmful content**: Mechanisms and guidelines for content safety are essential. Developers are encouraged to exercise caution and implement appropriate content safety safeguards based on their specific product policies and application use cases.
- **Misuse for malicious purposes**: Technical limitations and developer and end-user education can help mitigate against malicious applications of generative models. Educational resources and reporting mechanisms for users to flag misuse are provided. Prohibited uses of Gemma models are outlined in the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
- **Privacy violations**: Models were trained on data filtered for removal of certain personal information and other sensitive data. Developers are encouraged to adhere to privacy regulations with privacy-preserving techniques.

### Benefits

At the time of release, this family of models provides high-performance open
generative model implementations designed from the ground up for responsible AI
development compared to similarly sized models.

Using the benchmark evaluation metrics described in this document, these models
have been shown to provide superior performance to other comparably sized open
model alternatives.