PAWA: Swahili SLM for Various Tasks
Overview
PAWA is a Swahili-specialized language model designed for tasks that require nuanced understanding and interaction in Swahili and English. It uses supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to improve performance and consistency. The sections below cover model specifications, installation steps, usage examples, and intended applications.
Model Details
- Model Name: Pawa-Gemma-Swahili-2B
- Model Type: PAWA
- Architecture:
  - 2B Parameter Gemma-2 Base Model
  - Enhanced with Swahili SFT and DPO datasets.
- Languages Supported:
  - Swahili
  - English
  - Custom tokenizer for multi-language flexibility.
- Primary Use Cases:
  - Contextually rich Swahili-focused tasks.
  - General assistance and chat-based interactions.
- License: Custom/Contact Author for terms of use.
Installation and Setup
Ensure the necessary libraries are installed and up to date. The leading "!" runs each line as a shell command from a notebook cell (for example, in Colab):
!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install datasets
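After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the default package list are illustrative assumptions, not part of the model's tooling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "unsloth", "datasets", "torch")):
    """Return {package_name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a short report for the packages used in this card
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If any entry shows "not installed", re-run the corresponding pip command before loading the model.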
Model Loading
You can load the model using the following code snippet:
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-kaggle-gemma-2b"
max_seq_length = 2048  # Maximum context length in tokens
dtype = None  # None lets Unsloth auto-detect the dtype
load_in_4bit = False  # Set to True for 4-bit quantization and lower memory use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
Chat Template Configuration
For a seamless conversational experience, configure the tokenizer with the appropriate chat template:
from unsloth.chat_templates import get_chat_template
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
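To make the effect of the ShareGPT mapping and the ChatML template more concrete, here is a pure-Python sketch of roughly what the rendered prompt text looks like. It is an approximation for illustration only: `render_chatml` and `ROLE_MAP` are hypothetical names, and the string actually produced by `tokenizer.apply_chat_template` may differ (for example in BOS handling, or in the end-of-turn token when `map_eos_token=True`).

```python
# ShareGPT speaker names mapped to ChatML role names (mirrors the mapping above)
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}


def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML text layout for ShareGPT-style messages."""
    parts = []
    for msg in messages:
        role = ROLE_MAP.get(msg["from"], msg["from"])
        parts.append(f"<|im_start|>{role}\n{msg['value']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"from": "human", "value": "Habari"}]))
```

Printing the real prompt with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the reliable way to see what the model receives.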
Usage Example
Generate a short story in Swahili:
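from transformers import TextStreamer

# ShareGPT-style message, matching the mapping passed to get_chat_template
messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]  # "Create a short story"
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Append the assistant turn header so the model starts its reply
    return_tensors="pt",
).to("cuda")  # Requires a CUDA-capable GPU

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)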
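If you would rather get the reply back as a string than stream it to stdout, one option is to slice off the prompt tokens and decode only the newly generated ones. This is a sketch that assumes `model` and `tokenizer` were set up as in the sections above; `generate_reply` is a hypothetical helper name, and the default device is an assumption.

```python
def generate_reply(model, tokenizer, user_text, max_new_tokens=128, device="cuda"):
    """Return only the newly generated text for a single user message."""
    messages = [{"from": "human", "value": user_text}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(device)
    outputs = model.generate(
        input_ids=inputs, max_new_tokens=max_new_tokens, use_cache=True
    )
    # generate() returns prompt + completion; keep only the completion tokens
    new_tokens = outputs[0][inputs.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (requires the loaded model and tokenizer):
# print(generate_reply(model, tokenizer, "Tengeneza hadithi fupi"))
```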
Training and Fine-Tuning Details
- Base Model: Gemma-2-2B
- Continued Pre-Training: 3B Swahili tokens
- Fine-tuning: Enhanced with Swahili SFT datasets for improved contextual understanding.
- Optimization: Includes DPO to align responses with preferred outputs and improve consistency.
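The card does not give the DPO hyperparameters or training code. As a rough illustration of what DPO optimizes, the sketch below computes the standard DPO loss for a single preference pair, assuming you already have the summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. The function name, argument names, and `beta=0.1` are illustrative assumptions, not values from PAWA's training.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(logit)).

    logit = beta * [(policy - reference) log-ratio of the chosen response
                    - (policy - reference) log-ratio of the rejected response]
    """
    logit = beta * ((policy_chosen_logp - ref_chosen_logp)
                    - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable -log(sigmoid(logit)), i.e. softplus(-logit)
    if logit >= 0:
        return math.log1p(math.exp(-logit))
    return -logit + math.log1p(math.exp(logit))
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the chosen response than the reference does.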
Intended Use Cases
- General Assistance: Provides structured answers for general-purpose use.
- Interactive Q&A: Designed for general-purpose chat environments.
- RAG (Retrieval-Augmented Generation): Works best for RAG and specific use cases.
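For the RAG use case, the retrieved passages have to be placed into the user message before generation. The sketch below shows one possible way to assemble such a message in the ShareGPT format used in this card. The helper name `build_rag_message` and the Swahili prompt wording are illustrative assumptions, and the retrieval step itself is not shown.

```python
def build_rag_message(question, passages):
    """Combine retrieved passages and a question into one ShareGPT-style message."""
    # Number the passages so the model can tell them apart
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    prompt = (
        # "Answer the question using the following context."
        "Jibu swali kwa kutumia muktadha ufuatao.\n\n"
        f"Muktadha:\n{context}\n\n"
        f"Swali: {question.strip()}"
    )
    return [{"from": "human", "value": prompt}]


messages = build_rag_message(
    "Mji mkuu wa Tanzania ni upi?",
    ["Dodoma ni mji mkuu wa Tanzania."],
)
print(messages[0]["value"])
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in the same way as a plain chat message.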
Limitations
- Biases: The model may exhibit biases inherent in its fine-tuning datasets.
- Generalization: May struggle with tasks outside the trained domain.
- Hardware Requirements:
  - Optimal performance requires GPUs with high memory (e.g., Tesla V100 or T4).
  - Supports 4-bit quantization for reduced memory usage.
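To get a feel for why 4-bit quantization reduces memory use, the back-of-the-envelope sketch below estimates the memory taken by the weights alone. It assumes the nominal 2B parameter count from this card (the exact count may differ) and ignores activations, the KV cache, and quantization overhead, so real usage will be higher. The helper name is hypothetical.

```python
def estimate_weight_memory_gb(n_params, bits_per_param):
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 2e9  # Nominal "2B" figure from the model card (assumption)

for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: ~{estimate_weight_memory_gb(N_PARAMS, bits):.1f} GB for weights")
```

Under these assumptions the weights take roughly 4 GB at 16-bit and roughly 1 GB at 4-bit, which is why setting `load_in_4bit=True` can help on smaller GPUs.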
Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!
Model tree for sartifyllc/Pawa-Gemma-Swahili-2B
Base model
google/gemma-2-2b