---
license: apache-2.0
datasets:
- flores200
- opensubtitles
- ai4bharat/indictrans2-en-my
language:
- en
- my
library_name: peft
tags:
- translation
- myanmar
- lora
- bloomz
- english-to-myanmar
- QLoRA
- transformers
model_type: bloom
base_model: bigscience/bloomz-1b1
---
# 🌸 BloomZ-1.1B LoRA Fine-tuned for English → Myanmar (Burmese) Translation
**Model Name**: `LinoM/bloomz-1b1MM`
**Base Model**: [`bigscience/bloomz-1b1`](https://huggingface.co/bigscience/bloomz-1b1)
**Fine-Tuning Method**: QLoRA-style fine-tuning (LoRA adapters trained on an 8-bit quantized base model)
**Frameworks**: Hugging Face Transformers + PEFT + BitsAndBytes
**Task**: English to Myanmar Instruction-style Translation
---
## 🧠 Model Details
| Detail | Value |
|--------------------|-----------------------------------------------|
| Model Architecture | BLOOMZ |
| Base Model Size | 1.1 Billion Parameters |
| Fine-tuning Method | QLoRA-style LoRA adapters                      |
| Optimizer          | `paged_adamw_8bit`                             |
| Precision          | 8-bit quantized base + LoRA adapters           |
| Epochs | 3–5 (variable per run) |
| Batch Size | 32 |
| Language Pair      | English → Burmese (မြန်မာ)                      |
| Tokenizer | Bloom tokenizer (`bigscience/tokenizer`) |
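For orientation, the table above maps onto a standard PEFT + BitsAndBytes setup roughly like the sketch below. The optimizer, batch size, epoch range, and 8-bit base loading come from the table; the LoRA rank, alpha, target modules, and learning rate are illustrative assumptions, not the exact values used for this checkpoint.

```python
# Illustrative QLoRA-style setup matching the table above.
# LoRA rank/alpha/target_modules and the learning rate are assumptions.
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-1b1",
    load_in_8bit=True,        # 8-bit base model, as in the table
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["query_key_value"],   # BLOOM attention projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="bloomz-1b1MM-lora",
    per_device_train_batch_size=32,       # batch size from the table
    num_train_epochs=3,                   # 3-5 epochs per the table
    optim="paged_adamw_8bit",             # optimizer from the table
    learning_rate=2e-4,                   # assumed
    logging_steps=50,
)
```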
---
## 📚 Training Data
The model was fine-tuned on a curated mix of open datasets including:
- 🌍 **FLORES200** (en–my)
- 🎬 **OpenSubtitles** (Movie subtitles in Myanmar)
- 📖 **Custom instruction-style translation datasets** (8 use cases, 200+ pairs per use case; prompt format sketched below)
- 🗣️ **ai4bharat/indictrans2-en-my** (additional Burmese corpora)
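The custom instruction-style pairs follow a short prompt/completion format. The exact template is not published with this card; a minimal sketch, assuming the same wording as the usage example below, would be:

```python
# Hypothetical prompt template for the instruction-style pairs;
# the exact wording used during fine-tuning may differ.
def build_example(english: str, burmese: str) -> dict:
    prompt = f"Translate into Burmese: {english}\n"
    return {"text": prompt + burmese}

# Placeholder pair (the Burmese string is simply a common greeting)
print(build_example("Hello", "မင်္ဂလာပါ"))
```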
---
## 📈 Evaluation
| Metric | Score |
|------------------|---------|
| BLEU | 35–40 |
| Translation Style | Instructional, formal |
| Human Evaluation | ✓ Grammar and tone judged correct in 85% of samples |
> ✅ The model excels at translating English prompts into formal Burmese suitable for education, scripts, and user guides.
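The evaluation data behind the BLEU range is not distributed with this repository. To score the model against your own references, a generic sacreBLEU snippet (the sentences below are placeholders) looks like:

```python
# Generic corpus-level BLEU with sacrebleu; replace the placeholders
# with real model outputs and reference translations.
import sacrebleu

hypotheses = ["model output sentence 1", "model output sentence 2"]
references = [["reference translation 1", "reference translation 2"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```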
---
## 🔧 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

# Load the 8-bit quantized base model and attach the LoRA adapter weights
base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-1b1", load_in_8bit=True, device_map="auto"
)
lora = PeftModel.from_pretrained(base, "LinoM/bloomz-1b1MM")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-1b1")

# Wrap the adapted model in a text-generation pipeline
translator = pipeline("text-generation", model=lora, tokenizer=tokenizer)

# The model expects an instruction-style prompt
text = "Translate into Burmese: What is your favorite subject?"
output = translator(text, max_new_tokens=100)
print(output[0]["generated_text"])
```
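If your `transformers` version warns that `load_in_8bit` is deprecated, the base model can equally be loaded through a `BitsAndBytesConfig`; this is an optional variant, and the adapter loading and pipeline calls above stay the same.

```python
# Optional: quantization via BitsAndBytesConfig instead of the load_in_8bit flag
# (requires a recent transformers + bitsandbytes; the rest of the usage is unchanged).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-1b1",
    quantization_config=bnb_config,
    device_map="auto",
)
```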