---
license: apache-2.0
datasets:
- tasal9/Pashto_Dataset
language:
- ps
- en
library_name: sentence-transformers
tags:
- multilingual
- embeddings
- semantic-search
- pashto
- chromadb
- llamaindex
pipeline_tag: feature-extraction
model-index:
- name: Multilingula-ZamAI-Embeddings
  results: []
---

# ZamAI Multilingual Embeddings

This directory contains tools and utilities for working with multilingual embedding models, with a focus on Pashto language support. The embeddings enable semantic search, document retrieval, and other natural language processing tasks across multiple languages.

## Model Information

- **Base Model**: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Languages Supported**: 50+ including Pashto, English, Arabic, Urdu, Farsi, and more
- **Vector Database**: ChromaDB
- **Integration Framework**: LlamaIndex

## Directory Structure

```
embeddings/
├── setup.py           # Setup script for the embeddings model and vector store
├── demo.py            # Demo application with Gradio web UI
├── indexer.py         # Utility for indexing new documents
├── requirements.txt   # Dependencies for the embeddings components
└── chroma_db/         # Directory for the vector database (created on first run)
```

## Getting Started

1. Install the dependencies:

   ```bash
   pip install -r models/embeddings/requirements.txt
   ```

2. Add documents to index:

   ```bash
   # Place your text files in the data/text_corpus directory
   python models/embeddings/indexer.py --corpus data/text_corpus/
   ```

3. Run the demo application:

   ```bash
   python models/embeddings/demo.py
   ```

## Using the Embeddings in Your Code

```python
from models.embeddings.setup import setup_embedding_model

# Initialize the model and related components
embedding_components = setup_embedding_model()

# Get the query engine
query_engine = embedding_components["query_engine"]

# Query in any language
result = query_engine.query("What is the capital of Afghanistan?")

# Or in Pashto
result = query_engine.query("د افغانستان پلازمېنه څه ده؟")
print(result)
```
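If you only need raw sentence embeddings rather than the full retrieval pipeline, the base model can be used directly with `sentence-transformers`. This is a minimal sketch using the standard `SentenceTransformer.encode` / `util.cos_sim` API; the example sentences are illustrative only.

```python
from sentence_transformers import SentenceTransformer, util

# Load the multilingual base model listed above
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Cross-lingual similarity: an English question and a Pashto answer
sentences = [
    "What is the capital of Afghanistan?",
    "د افغانستان پلازمېنه کابل ده.",  # "The capital of Afghanistan is Kabul."
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two embeddings
print(util.cos_sim(embeddings[0], embeddings[1]).item())
```

For reference, the sketch below shows one way the stack described above (LlamaIndex + ChromaDB + the MiniLM base model) can be wired together for indexing and semantic retrieval. It is an assumption-laden illustration, not the contents of `setup.py` or `indexer.py`: the ChromaDB path, the collection name `zamai_docs`, and the `similarity_top_k` value are placeholders chosen for this example.

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Multilingual embedding model used throughout this repo
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

# Persistent ChromaDB collection (path and collection name are placeholders)
chroma_client = chromadb.PersistentClient(path="models/embeddings/chroma_db")
collection = chroma_client.get_or_create_collection("zamai_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index plain-text documents from the corpus directory
documents = SimpleDirectoryReader("data/text_corpus").load_data()
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

# Pure semantic retrieval (no LLM required); a query engine, as used in the
# snippet above, additionally needs an LLM configured for answer synthesis
retriever = index.as_retriever(similarity_top_k=3)
for hit in retriever.retrieve("د افغانستان پلازمېنه څه ده؟"):
    print(hit.score, hit.node.get_content()[:100])
```

Because the embeddings and the index are persisted in `chroma_db/`, indexing only needs to be re-run when new documents are added; queries can then be served from the existing collection.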