Gemma2-2B-Swahili-Preview

Gemma2-2B-Swahili-Preview is a Swahili adaptation of the Gemma 2 2B base model, fine-tuned on the Inkuba-Mono Swahili dataset to strengthen Swahili language understanding through monolingual training.

Model Details

  • Developer: Alfaxad Eyembe
  • Base Model: google/gemma-2-2b
  • Model Type: Decoder-only transformer
  • Parameters: 2.61B (BF16)
  • Language: Swahili
  • License: Apache 2.0
  • Fine-tuning Approach: Low-Rank Adaptation (LoRA)

Training Data

The model was fine-tuned on a focused subset of the Inkuba-Mono dataset:

  • 1,000,000 randomly selected examples
  • Total tokens: 60,831,073
  • Average text length: 101.33 characters
  • Diverse Swahili text drawn from news, social media, and other domains
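
For reference, the subset statistics above can be reproduced with a short script. The sketch below rests on several assumptions not stated in this card: the Hub dataset id (lelapa/Inkuba-Mono), its Swahili configuration name, the text column name, and the use of the base Gemma 2 tokenizer for token counts.

from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed dataset id and configuration; the card only names "Inkuba-Mono".
dataset = load_dataset("lelapa/Inkuba-Mono", "swa", split="train")

# Randomly select 1,000,000 examples, as described above.
subset = dataset.shuffle(seed=42).select(range(1_000_000))

# Token counts measured with the base model's tokenizer (an assumption).
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

total_tokens = 0
total_chars = 0
for example in subset:
    text = example["text"]  # assumed column name
    total_tokens += len(tokenizer(text)["input_ids"])
    total_chars += len(text)

print(f"Total tokens: {total_tokens:,}")
print(f"Average text length: {total_chars / len(subset):.2f} characters")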

Training Details

  • Fine-tuning Method: LoRA
  • Training Steps: 2,500
  • Batch Size: 2
  • Gradient Accumulation Steps: 32
  • Learning Rate: 2e-4
  • Training Time: ~7.5 hours
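
The configuration above maps fairly directly onto a peft + transformers training script. The following is a minimal sketch, not the author's actual code: the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, as are the dataset id and tokenization settings; only the step count, batch size, gradient accumulation, and learning rate come from the list above.

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b", torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA hyperparameters are assumptions; the card does not publish them.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Assumed dataset id and preprocessing; see Training Data above.
dataset = load_dataset("lelapa/Inkuba-Mono", "swa", split="train")
subset = dataset.shuffle(seed=42).select(range(1_000_000))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_dataset = subset.map(tokenize, batched=True, remove_columns=subset.column_names)

# These values mirror the Training Details list.
args = TrainingArguments(
    output_dir="gemma2-2b-swahili-lora",
    max_steps=2500,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=32,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()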

Model Capabilities

This model is designed for:

  • Swahili text continuation
  • Natural language understanding
  • Contextual text generation
  • Base language modeling for Swahili

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Alfaxad/gemma2-2b-swahili-preview")
model = AutoModelForCausalLM.from_pretrained(
    "Alfaxad/gemma2-2b-swahili-preview",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Set to evaluation mode
model.eval()

# Example usage: move inputs to the model's device before generating
prompt = "Katika soko la Kariakoo, teknolojia mpya imewezesha"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
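
For quick experiments, the same generation can also be run through the transformers pipeline API. This is simply a convenience wrapper around the loading and generation calls above, not a method prescribed by this card.

from transformers import pipeline
import torch

generator = pipeline(
    "text-generation",
    model="Alfaxad/gemma2-2b-swahili-preview",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

result = generator(
    "Katika soko la Kariakoo, teknolojia mpya imewezesha",
    max_new_tokens=500,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(result[0]["generated_text"])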

Key Features

  • Natural Swahili text continuation
  • Strong cultural context understanding
  • Efficient parameter updates through LoRA (see the sketch after this list)
  • Diverse domain knowledge integration
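
To make the parameter-efficiency claim concrete, peft can report how many weights a LoRA adapter actually trains. The configuration below reuses the illustrative rank and target modules from the training sketch above; the card does not publish the real values.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")

# Illustrative LoRA configuration (assumed, not from the card).
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, config)

# Prints the trainable vs. total parameter counts; with settings like
# these, well under 1% of the 2.61B weights are updated.
peft_model.print_trainable_parameters()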

Limitations

  • Not instruction-tuned
  • Provides base language modeling only (text continuation rather than chat or question answering)
  • Performance varies across different text domains

Citation

@misc{gemma2-2b-swahili-preview,
  author = {Alfaxad Eyembe},
  title = {Gemma2-2B-Swahili-Preview: Swahili Variation of Gemma2 2B},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
}

Contact

For questions or feedback, please reach out through the model's Hugging Face page.
