
🧠 Lao Summarization Model (Lao: ສະຫລຸບເນື້ອຫາສຳລັບພາສາລາວ, "Content Summarization for the Lao Language") - Fine-tuned Gemma 3 4B IT (10,000 Lao Input-Output Pairs)

This is a Lao language summarization model fine-tuned on the Phonepadith/laos_word_dataset, using the base model google/gemma-3-4b-it. The model is designed to generate concise summaries from Lao language text.


🧠 Lao AIDC-10K Fine-tuned Gemma-3-4B-IT

Model ID: Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it
Base Model: google/gemma-3-4b-it
Fine-tuned By: Phonepadith Phoummavong


📌 Model Description

This model is a fine-tuned version of Gemma-3-4B-IT, specifically adapted to understand and generate responses in the Lao language 🇱🇦. It was trained on a curated dataset of 10,000 high-quality Lao input-output pairs, primarily focused on AIDC (Artificial Intelligence and Digital Content) topics.

Key Features:

  • 🗣️ Fine-tuned for Lao language generation
  • 📚 Suitable for summarization, question answering, and general chat
  • 🧠 Based on Google's powerful Gemma 3-4B Instruct model

🧾 Training Details

| Detail | Value |
|---|---|
| Base Model | Gemma 3-4B Instruct |
| Fine-tuning Method | LoRA with PEFT (Unsloth) |
| Dataset | 10,000 Lao supervised samples |
| Sequence Length | 2048 tokens |
| Batch Size | 2 (with gradient accumulation) |
| Optimizer | AdamW |
| Epochs | 3–5 (early stopping enabled) |
| Format | GGUF (F32, F16, Q8_0 available) |
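
For readers who want to reproduce a comparable setup, below is a minimal sketch of the LoRA-with-PEFT (Unsloth) recipe from the table above. Only the base model and the 2048-token sequence length come from this card; the 4-bit loading, LoRA rank, alpha, and target modules are illustrative assumptions.

from unsloth import FastLanguageModel

# Load the base model at the card's 2048-token sequence length.
# load_in_4bit is an assumption to fit the 4B model on modest GPUs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters via PEFT; r, lora_alpha, and target_modules
# below are common defaults, not values reported by this card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)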

📥 How to Use (LM Studio)

  1. Install LM Studio: https://lmstudio.ai
  2. Import the Model:
    • Via Hugging Face: Search for Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it
    • Or drag the .gguf file into LM Studio
  3. Set a system prompt suited to Lao summarization, then chat with the model, or serve it over LM Studio's local API as sketched below.
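
Once a model is loaded, LM Studio can also serve it through its built-in OpenAI-compatible local server (default address http://localhost:1234/v1). Below is a minimal sketch using the openai Python client; the model name and system prompt are placeholders, not values from this card.

from openai import OpenAI

# LM Studio's local server ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="aidc-llm-laos-10k-gemma-3-4b-it",  # placeholder; use the name LM Studio displays
    messages=[
        {"role": "system", "content": "Summarize the following Lao text concisely."},  # hypothetical prompt
        {"role": "user", "content": "<Lao text to summarize>"},
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)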

📊 Metrics

  • Evaluation Metric: BLEU score
    BLEU is used to evaluate the quality of generated summaries against reference summaries in the dataset.
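
The card does not ship evaluation code, but BLEU can be computed with the sacrebleu library, as in the sketch below; the hypothesis and reference strings are placeholders. Because Lao is written without spaces between words, character-level tokenization (an assumption, not stated by the card) is often more informative than the default:

import sacrebleu  # pip install sacrebleu

# Placeholder generated summaries and one set of reference summaries.
hypotheses = ["generated summary 1", "generated summary 2"]
references = [["reference summary 1", "reference summary 2"]]

# tokenize="char" scores at character level, which suits unsegmented scripts like Lao.
bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="char")
print(f"BLEU: {bleu.score:.2f}")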

🛠️ How to Use

You can load and run the model with Hugging Face Transformers (Gemma 3 requires a recent transformers release):


from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 plus device_map="auto" keeps the 4B model within typical GPU memory
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

input_text = "ປັດຈຸບັນ ກອງທັບປະຊາຊົນລາວ ມີການປະກອບວັດຖຸເຕັກນິກທັນສະໄໝສົມຄວນ, ສາມາດຕອບສະໜອງ ໃຫ້ແກ່ວຽກງານປ້ອງກັນຊາດ ໃນໄລຍະໃໝ່ ໄດ້ໂດຍພື້ນຖານ; ໄດ້ປະກອບສ່ວນຢ່າງຕັ້ງໜ້າເຂົ້າໃນການປ້ອງກັນ, ຄວບຄຸມໄພພິບັດ ແລະ ຊ່ວຍເຫລືອປະຊາຊົນ ຜູ້ປະສົບໄພພິບັດທຳມະຊາດຕ່າງໆທີ່ເກີດຂຶ້ນໃນຂອບເຂດທົ່ວປະເທດ. ພ້ອມນັ້ນ, ກໍໄດ້ເປັນເຈົ້າການປະກອບສ່ວນປັບປຸງກໍ່ສ້າງພື້ນ ຖານການເມືອງ, ກໍ່ສ້າງທ່າສະໜາມສົງຄາມປະຊາຊົນ 3 ຂັ້ນ ຕິດພັນກັບວຽກງານ 3 ສ້າງ ຢູ່ທ້ອງຖິ່ນຕາມ 4 ເນື້ອໃນ 4 ຄາດໝາຍ ແລະ ສືບທອດມູນເຊື້ອຄວາມສາມັກຄີ ກັບກອງທັບປະເທດເພື່ອນມິດ ສາກົນ, ປະຕິບັດນະໂຍບາຍເພີ່ມມິດຫລຸດຜ່ອນສັດຕູ, ຮັບປະກັນສະຖຽນລະພາບ ຂອງລະບອບການ ເມືອງ, ຮັກສາຄວາມສະຫງົບປອດໄພຕາມຊາຍແດນ"

# Gemma instruction-tuned models expect their chat template
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

summary_ids = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens, not the echoed prompt
summary = tokenizer.decode(summary_ids[0][inputs.shape[-1]:], skip_special_tokens=True)
print(summary)
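
Because GGUF builds (F32, F16, Q8_0) are listed in the training details, the model can also run without a GPU stack via llama.cpp bindings. A minimal sketch follows; the file name is a placeholder for whichever quantization you download.

from llama_cpp import Llama  # pip install llama-cpp-python

# Path is a placeholder for the downloaded GGUF file.
llm = Llama(model_path="aidc-llm-laos-10k-gemma-3-4b-it.Q8_0.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "<Lao text to summarize>"}],
    max_tokens=100,
)
print(out["choices"][0]["message"]["content"])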