UgannA Siyabasa: FastText Sinhala Embedding Model 🇱🇰

Note: This is a demo version of the model; the final model will be released soon.

UgannA Siyabasa (උගන්නා සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa’s timeless quote:

“උගන්නා සියබස – මත් වන්නා එහි රසයෙන්” (Learn Sinhala – be intoxicated with its beauty.)

Just as Munidasa sought to nurture the Sinhala language, this model is a step toward teaching it to machines.


📌 Key Features

  • Type: FastText (official library)
  • Vector size: 100 dimensions
  • File size: ~1.56 GB
  • Training data: ~6.2 GB of processed Sinhala text

🔧 Usage

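import fasttext

# Load the model from a local copy of the .bin file.
# load_model expects a path on disk, so download UgannA_Siyabasa.bin
# from the repository first and adjust the path below if needed.
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get the 100-dimensional vector for a word
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbors as (similarity, word) pairs
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)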
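
Beyond nearest-neighbor lookup, two word vectors are usually compared with cosine similarity. The following is a minimal plain-Python sketch of that comparison. `cosine_similarity` is a helper name introduced here for illustration and is not part of the fasttext library; the commented lines assume `model` has been loaded as in the usage snippet, and the second word is only an example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# With a loaded model (assumed to be the `model` object from the usage snippet):
# a = model.get_word_vector("අම්මා")
# b = model.get_word_vector("තාත්තා")
# print(cosine_similarity(a, b))
```

Values close to 1 indicate that the two words occur in similar contexts in the training corpus; values near 0 indicate little relation.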

📂 Training Data

  • Processed and cleaned training corpus: ~6.2 GB
  • Preprocessing: tokenization, normalization, deduplication
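
The three preprocessing steps listed above could look roughly like the sketch below. This is an illustration only, not the pipeline actually used to build the corpus: the function names, the choice of Unicode NFC normalization, whitespace tokenization, and exact-match line deduplication are all assumptions made for the example.

```python
import unicodedata

def normalize(line):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", line).split())

def tokenize(line):
    """Split on whitespace; a real pipeline would likely treat punctuation too."""
    return line.split()

def preprocess(lines):
    """Normalize each line, drop empty lines, and remove exact duplicates.

    Returns one string of space-separated tokens per kept line,
    in the order the lines first appeared.
    """
    seen = set()
    cleaned = []
    for raw in lines:
        text = normalize(raw)
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(" ".join(tokenize(text)))
    return cleaned

sample = ["අම්මා  ගෙදර   ඉන්නවා", "අම්මා ගෙදර ඉන්නවා", "", "සිංහල භාෂාව"]
print(preprocess(sample))
```

Normalizing before deduplicating matters here: the first two sample lines differ only in spacing, so they collapse to a single entry.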

🗜️ License

This model is released under CC BY-NC 4.0 (non-commercial use). 🔗 For commercial use, please contact: [email protected]


⚠️ Limitations

  • Vocabulary coverage is limited to the training dataset.
  • The model may reflect cultural and linguistic biases present in its sources.
  • It is optimized for Sinhala and is not multilingual (future versions will expand).
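
To gauge how the vocabulary limitation affects a given text, one can measure what fraction of its tokens appear in the model's vocabulary. The sketch below is a hypothetical helper, not part of the fasttext library; the commented usage assumes a loaded `model` whose `words` property lists the vocabulary, and `my_tokens` is a placeholder for your own token list.

```python
def vocabulary_coverage(tokens, vocabulary):
    """Return (fraction of tokens found in vocabulary, sorted missing tokens)."""
    if not tokens:
        return 0.0, []
    vocab = set(vocabulary)
    missing = sorted({t for t in tokens if t not in vocab})
    covered = sum(1 for t in tokens if t in vocab)
    return covered / len(tokens), missing

# With a loaded model (assumption: `model` from the usage snippet):
# coverage, missing = vocabulary_coverage(my_tokens, model.words)
# print(f"{coverage:.1%} of tokens are in the vocabulary")
```

FastText models trained with subword n-grams can still return a vector for a missing word, but whether this model was trained that way is not stated here, so treat vectors for out-of-vocabulary words with caution.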

🤝 Collaboration

You are welcome to:

  • Use this model for research & personal projects
  • Share improvements, benchmarks, or downstream applications

Contact: 📧 [email protected]
