---
license: cc-by-nc-4.0
language:
- si
pipeline_tag: feature-extraction
library_name: fasttext
tags:
- sinhala
- embeddings
- fasttext
- low-resource-languages
- word-vector
- remeinium
- nlp
---
# UgannA Siyabasa — FastText Sinhala Embedding Model 🇱🇰

**Note : This is a demo version of the model and we will drop the final Model soon.**

**UgannA Siyabasa** (උගන්නැ සියබස) is the first public FastText embedding model released by **Remeinium Corp**.
The name comes from Kumaratunga Munidasa’s timeless quote:

> “උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්”
> *Learn Sinhala – be intoxicated with its beauty.*

Just as Munidasa envisioned nurturing the Sinhala language, this model represents teaching it to machines.

---

## 📌 Key Features

* **Type:** FastText (official library)
* **Vector size:** 100 dimensions
* **File size:** \~1.56GB
* **Training data:** 6.2GB processed Sinhala text

---

## 🔧 Usage

```python
import fasttext

# Load the model
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get vector for a word
vector = model.get_word_vector("අම්මා")

# Get nearest neighbors
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)
```

---

## 📂 Training Data

* **Processed and cleaned training corpus:** \~6.2GB
* Preprocessing: tokenization, normalization, deduplication

---

## 🗜️ License

This model is released under **CC BY-NC 4.0** (non-commercial use).
🔗 For commercial usage, please contact: **[support@remeinium.com](mailto:support@remeinium.com)**

---

## ⚠️ Limitations

* Vocabulary coverage limited to training dataset.
* May reflect cultural/linguistic biases from sources.
* Optimized for Sinhala; not multilingual (future versions will expand).

---

## 🤝 Collaboration

You are welcome to:

* Use this model for **research & personal projects**
* Share improvements, benchmarks, or downstream applications

Contact :
📧 [support@remeinium.com](mailto:support@remeinium.com)