---
license: apache-2.0
datasets:
- tasal9/Pashto_Dataset
language:
- ps
- en
library_name: sentence-transformers
tags:
- multilingual
- embeddings
- semantic-search
- pashto
- chromadb
- llamaindex
pipeline_tag: feature-extraction
model-index:
- name: Multilingula-ZamAI-Embeddings
  results: []
---
# ZamAI Multilingual Embeddings
This directory contains tools and utilities for working with multilingual embedding models, with a focus on Pashto language support. The embeddings enable semantic search, document retrieval, and other natural language processing tasks across multiple languages.
## Model Information
- **Base Model**: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Languages Supported**: 50+, including Pashto, English, Arabic, Urdu, and Farsi
- **Vector Database**: ChromaDB
- **Integration Framework**: LlamaIndex
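As a rough illustration of how embeddings from the base model support semantic search, the sketch below encodes text with `sentence-transformers` and ranks documents by cosine similarity. The helper names (`embed`, `cosine_similarity`, `rank_by_similarity`) are hypothetical and are not part of this repository; the model is loaded on every call only to keep the example short.

```python
import math
from typing import List, Sequence, Tuple

# Base model named in this README; it produces 384-dimensional vectors.
BASE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def embed(texts: Sequence[str], model_name: str = BASE_MODEL) -> List[List[float]]:
    """Encode texts into vectors (downloads the model on first use)."""
    # Imported lazily so the pure-Python helpers below work without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(texts)).tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With the library installed, `rank_by_similarity(embed([query])[0], embed(documents))` would order `documents` by closeness to `query`, whichever of the supported languages each is written in.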
## Directory Structure
```
embeddings/
├── setup.py          # Setup script for the embeddings model and vector store
├── demo.py           # Demo application with Gradio web UI
├── indexer.py        # Utility for indexing new documents
├── requirements.txt  # Dependencies for the embeddings components
└── chroma_db/        # Directory for the vector database (created on first run)
```
## Getting Started
1. Install the dependencies:
   ```bash
   pip install -r models/embeddings/requirements.txt
   ```
2. Add documents to the index:
   ```bash
   # Place your text files in the data/text_corpus directory
   python models/embeddings/indexer.py --corpus data/text_corpus/
   ```
3. Run the demo application:
   ```bash
   python models/embeddings/demo.py
   ```
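The internals of `indexer.py` are not shown in this README, so the following is only a sketch of what indexing a text corpus into ChromaDB could look like. The function names, the `documents` collection name, and the `*.txt` pattern are assumptions made for illustration, not the script's actual behavior.

```python
from pathlib import Path
from typing import Dict, List, Optional


def load_corpus(corpus_dir: str, pattern: str = "*.txt") -> List[Dict[str, str]]:
    """Read every matching file under corpus_dir (recursively) as UTF-8 text."""
    records = []
    for path in sorted(Path(corpus_dir).rglob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files
            doc_id = path.relative_to(corpus_dir).as_posix()
            records.append({"id": doc_id, "text": text})
    return records


def index_corpus(
    records: List[Dict[str, str]],
    embeddings: Optional[List[List[float]]] = None,
    db_path: str = "chroma_db",           # hypothetical default location
    collection_name: str = "documents",   # hypothetical collection name
) -> int:
    """Add records to a persistent ChromaDB collection; return its document count."""
    import chromadb  # imported lazily; requires `pip install chromadb`

    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name=collection_name)
    # If `embeddings` is None, ChromaDB falls back to its own default embedding
    # function, which is NOT the multilingual model described in this README.
    # Pass vectors computed with the multilingual model to avoid that.
    collection.add(
        ids=[r["id"] for r in records],
        documents=[r["text"] for r in records],
        embeddings=embeddings,
    )
    return collection.count()
```

Using the relative file path as the document ID keeps IDs unique and human-readable, though ChromaDB's `add` will not overwrite an ID that already exists when a file is indexed again.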
## Using the Embeddings in Your Code
```python
from models.embeddings.setup import setup_embedding_model

# Initialize the model and related components
embedding_components = setup_embedding_model()

# Get the query engine
query_engine = embedding_components["query_engine"]

# Query in any language
result = query_engine.query("What is the capital of Afghanistan?")
print(result)

# Or in Pashto
result = query_engine.query("د افغانستان پلازمېنه څه ده؟")
print(result)
```
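Beyond printing the answer, it can help to see which indexed passages it was drawn from. The sketch below assumes the query engine returns a LlamaIndex-style response object whose `source_nodes` entries expose `.score` and `.node.get_content()`; `summarize_response` is a hypothetical helper, not something provided by `setup.py`.

```python
from typing import Any, List


def summarize_response(response: Any, max_chars: int = 80) -> List[str]:
    """Return the answer text followed by one line per retrieved source.

    Assumes a LlamaIndex-style response; objects without `source_nodes`
    simply yield their string form.
    """
    lines = [str(response)]
    sources = getattr(response, "source_nodes", None) or []
    for rank, item in enumerate(sources, start=1):
        # Flatten newlines so each source fits on one line.
        text = item.node.get_content().replace("\n", " ")
        snippet = text if len(text) <= max_chars else text[: max_chars - 3] + "..."
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} {snippet}")
    return lines
```

Calling `print("\n".join(summarize_response(result)))` after a query would then list the answer together with its supporting passages, which is useful for checking whether a Pashto query retrieved Pashto or English sources.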