Model Description
This model is a continued-pretraining (CPT) adaptation of Mistral-7B v0.3 for the Malagasy language.
The original Mistral-7B does not support Malagasy, so this model demonstrates how continued pretraining can extend a large language model to a low-resource language.
Compared with the base model, it generates more fluent and coherent Malagasy text and is intended as a starting point for downstream Malagasy NLP tasks.
Intended Uses & Limitations
Use cases:
- Generating text in Malagasy
- Research on low-resource language adaptation
- Data augmentation for Malagasy NLP tasks
Limitations:
- Not instruction-tuned: responses may not always follow task instructions.
- May hallucinate or generate factually inaccurate information.
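Because the model is not instruction-tuned, a request usually works better when phrased as a document for the model to continue. The sketch below wraps the article-style template from the usage example in this card in a small helper. The names `ARTICLE_TEMPLATE` and `build_prompt` are hypothetical and introduced only for illustration; seeding the article with an opening sentence is one possible way to steer the continuation, not a documented feature of the model.

```python
# Article-style template copied from the usage example in this card.
ARTICLE_TEMPLATE = """Lahatsoratra
### Lohateny: {}
### Lahatsoratra:
{}"""


def build_prompt(title: str, opening: str = "") -> str:
    """Return a continuation-style prompt (hypothetical helper).

    title   -- the article title placed after "### Lohateny:"
    opening -- optional first words of the article body; the model
               is expected to continue from wherever this text ends
    """
    return ARTICLE_TEMPLATE.format(title.strip(), opening)


# An empty body lets the model write the whole article.
print(build_prompt("Madagasikara"))

# A seeded body nudges the continuation toward a chosen topic.
print(build_prompt("Madagasikara", "Ny renivohitr'i Madagasikara dia"))
```

The returned string can be passed to the tokenizer in place of `prompt.format(...)` in the usage example.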
Training Details
- Base Model: Mistral-7B v0.3
- Method: Continued Pretraining with LoRA adapters
- Hardware: 1 × Tesla T4 (14.7 GB VRAM)
- Number of Epochs: 1
- Trainable parameters: ~604M (7.7% of 7.85B total)
- Approximate Training Time: ~44 hours
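The trainable-parameter figure above can be sanity-checked with a little arithmetic. A LoRA adapter of rank r on a weight matrix of shape d_in × d_out adds r · (d_in + d_out) parameters. The card does not state the LoRA rank or the target modules, so the configuration below is an assumption: rank 128 on all attention and MLP projections, with the input embeddings and output head trained in full. It is shown only because it happens to reproduce the reported ~604M. The layer shapes are those of the public Mistral-7B v0.3 configuration.

```python
# Mistral-7B v0.3 shapes (public model configuration).
HIDDEN = 4096          # hidden size
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336   # MLP inner size
LAYERS = 32
VOCAB = 32768


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_params(rank: int, train_embeddings: bool) -> int:
    """Trainable parameters for LoRA on every attention and MLP projection."""
    per_layer = (
        2 * lora_params(HIDDEN, HIDDEN, rank)          # q_proj, o_proj
        + 2 * lora_params(HIDDEN, KV_DIM, rank)        # k_proj, v_proj
        + 3 * lora_params(HIDDEN, INTERMEDIATE, rank)  # gate_proj, up_proj, down_proj
    )
    total = LAYERS * per_layer
    if train_embeddings:
        # Input embeddings and output head trained in full (assumption).
        total += 2 * VOCAB * HIDDEN
    return total


ASSUMED_RANK = 128      # assumption, not stated in the card
REPORTED_TOTAL = 7.85e9  # total parameter count reported in the card

count = trainable_params(ASSUMED_RANK, train_embeddings=True)
print(f"trainable: {count:,}")
print(f"share of total: {100 * count / REPORTED_TOTAL:.1f}%")
```

Other combinations of rank and target modules could give a similar total, so this should be read as a consistency check rather than the actual training configuration.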
Inference Example
code:
# Import required libraries for model loading and text generation
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

# Load the pretrained Malagasy LoRA model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Lo-Renz-O/Mistral-7B-CPT-Malagasy-v0.1-LoRA",
    max_seq_length=1024,
    dtype=None,
    load_in_4bit=True,
)

# Enable optimized inference
FastLanguageModel.for_inference(model)

# Define the prompt template for text generation
prompt = """Lahatsoratra
### Lohateny: {}
### Lahatsoratra:
{}"""

# Tokenize the prompt and move tensors to GPU
inputs = tokenizer(
    [prompt.format("Madagasikara", "")],
    return_tensors="pt",
).to("cuda")

# Initialize a streamer to display generated tokens in real-time
text_streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Generate text using the model with specific generation parameters
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.0,
    do_sample=True,
    streamer=text_streamer,
)
output:
Lahatsoratra
### Lohateny: Madagasikara
### Lahatsoratra: I Madagasikara na Repoblikan' i Madagasikara dia firenena any amin' ny faritra atsimon' i Afrika,
voafaritr' i Maorisy ao avaratra-andrefana. Izy no lemaka fahaefatra indrindra eto an-tany (1 244 350 km²). Anisan’ ny
nosy lehibe indrindra eran-tany izy sady malaza amin’ny fisian’ny biby sy zavamaniry mampiavaka azy manokana ary manambatran’ny
ala trôpikaly. Firenen’ ny mponina maromaro isaky ny velaran-taniny ity firenena ity. Mizara roa lehibe ny vahoakan’ i Madagasikara
ka ny iray Malagasy avokoa (Malaio-Pôlineziana), fa ny faharoa Banto avy any amin’ ny morontsiraka atsinanan' i Afrika.
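The generated text repeats the prompt header before the article itself. For uses such as data augmentation, it can help to keep only the article body. The sketch below does this with plain string handling; `extract_article` and `BODY_MARKER` are hypothetical names introduced for illustration. The input is assumed to be a decoded string, for example one obtained with `tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]`.

```python
# Marker that separates the prompt header from the article body
# in the template used by this card.
BODY_MARKER = "### Lahatsoratra:"


def extract_article(generated: str) -> str:
    """Return the text after the first body marker (hypothetical helper).

    If the marker is missing, the whole string is returned stripped,
    so the helper never discards text silently.
    """
    _, found, body = generated.partition(BODY_MARKER)
    return body.strip() if found else generated.strip()


sample = (
    "Lahatsoratra\n"
    "### Lohateny: Madagasikara\n"
    "### Lahatsoratra:\n"
    "I Madagasikara dia firenena any amin' ny faritra atsimon' i Afrika."
)
print(extract_article(sample))
```

Splitting on the first occurrence of the marker keeps the body intact even if the model happens to emit the marker again later in its output.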
This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for Lo-Renz-O/Mistral-7B-CPT-Malagasy-v0.1-LoRA
- Base model: mistralai/Mistral-7B-v0.3
- Quantized: unsloth/mistral-7b-v0.3-bnb-4bit