UgannA Siyabasa — FastText Sinhala Embedding Model 🇱🇰

Note: This is a demo version of the model; the final model will be released soon.

UgannA Siyabasa (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa’s timeless quote:

“උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්” ("Learn Sinhala – be intoxicated with its beauty.")

Just as Munidasa envisioned nurturing the Sinhala language, this model is an effort to teach it to machines.

📌 Key Features

Type: FastText (official library)
Vector size: 100 dimensions
File size: ~1.56GB
Training data: 6.2GB processed Sinhala text

🔧 Usage

import fasttext

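# Load the model. load_model expects a path to a local .bin file,
# so download the file first and adjust this path to where it is stored.
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get the 100-dimensional vector for a word
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbors as (similarity, word) pairs
neighbors = model.get_nearest_neighbors("අම්මා", k=10)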
print(neighbors)
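
Beyond nearest-neighbor lookup, two word vectors can be compared directly. The following is a minimal sketch of cosine similarity in plain Python; the helper name `cosine_similarity` is hypothetical and not part of the fastText API, and the toy 3-dimensional vectors stand in for the model's 100-dimensional ones.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector has similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors stand in for real embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

With a loaded model, the same helper should accept the arrays returned by `model.get_word_vector`, for example `cosine_similarity(model.get_word_vector("අම්මා"), model.get_word_vector("තාත්තා"))`.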

📂 Training Data

Processed and cleaned training corpus: ~6.2GB
Preprocessing: tokenization, normalization, deduplication
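
The exact preprocessing pipeline is not published here, so the following is only an illustrative sketch of what tokenization, normalization, and deduplication can look like for a line-oriented corpus. Every choice in it is an assumption: Unicode NFC normalization, whitespace tokenization, and exact-match line deduplication. The function names `normalize`, `tokenize`, and `preprocess` are hypothetical.

```python
import unicodedata


def normalize(line):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    line = unicodedata.normalize("NFC", line)
    return " ".join(line.split())


def tokenize(line):
    """Split on whitespace; a simplification, not a full Sinhala tokenizer."""
    return line.split()


def preprocess(lines):
    """Normalize and tokenize each line, dropping empty lines and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in lines:
        tokens = tokenize(normalize(raw))
        if not tokens:
            continue  # skip empty or whitespace-only lines
        key = " ".join(tokens)
        if key in seen:
            continue  # keep only the first occurrence of each line
        seen.add(key)
        cleaned.append(key)
    return cleaned


sample = ["අම්මා  ගෙදර   ඉන්නවා", "අම්මා ගෙදර ඉන්නවා", "", "තාත්තා වැඩට ගියා"]
for line in preprocess(sample):
    print(line)
```

The output format, one space-separated text per line, matches what fastText's unsupervised training expects as input.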

🗜️ License

This model is released under CC BY-NC 4.0 (non-commercial use).
🔗 For commercial usage, please contact: [email protected]

⚠️ Limitations

Vocabulary coverage is limited to words seen in the training dataset.
May reflect cultural or linguistic biases present in the source texts.
Optimized for Sinhala; not multilingual (future versions will expand coverage).
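
To see how the vocabulary limitation affects a given word list, one option is to check each word against the model's dictionary. This is a minimal sketch that relies on `get_word_id` from the fastText Python binding, which returns -1 for words missing from the dictionary. The helper `split_by_vocabulary` and the `StubModel` class are hypothetical; the stub only exists so the example runs without the real .bin file.

```python
def split_by_vocabulary(model, words):
    """Partition words into (in_vocab, out_of_vocab) using model.get_word_id."""
    in_vocab, out_of_vocab = [], []
    for word in words:
        # get_word_id returns -1 when the word is not in the model's dictionary.
        if model.get_word_id(word) == -1:
            out_of_vocab.append(word)
        else:
            in_vocab.append(word)
    return in_vocab, out_of_vocab


class StubModel:
    """Hypothetical stand-in exposing only get_word_id, for demonstration."""

    def __init__(self, vocab):
        self._ids = {word: index for index, word in enumerate(vocab)}

    def get_word_id(self, word):
        return self._ids.get(word, -1)


known, unknown = split_by_vocabulary(StubModel(["අම්මා"]), ["අම්මා", "hello"])
print("in vocabulary:", known)
print("out of vocabulary:", unknown)
```

Passing the object returned by `fasttext.load_model` in place of `StubModel` should work the same way. fastText can typically still build a vector for an out-of-vocabulary word from character n-grams when subword information was trained, but whether this model was trained that way is not stated here, so such vectors should be treated with caution.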

🤝 Collaboration

You are welcome to:

Use this model for research & personal projects
Share improvements, benchmarks, or downstream applications

Contact: 📧 [email protected]
