license: apache-2.0
datasets:
  - tasal9/Pashto_Dataset
language:
  - ps
  - en
library_name: sentence-transformers
tags:
  - multilingual
  - embeddings
  - semantic-search
  - pashto
  - chromadb
  - llamaindex
pipeline_tag: feature-extraction
model-index:
  - name: Multilingual-ZamAI-Embeddings
    results: []

ZamAI Multilingual Embeddings

This directory contains tools and utilities for working with multilingual embedding models, with a focus on Pashto language support. The embeddings enable semantic search, document retrieval, and other natural language processing tasks across multiple languages.

Model Information

Base Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Languages Supported: 50+, including Pashto, English, Arabic, Urdu, and Farsi
Vector Database: ChromaDB
Integration Framework: LlamaIndex
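
To make the semantic search idea concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. The toy three-dimensional vectors stand in for real embeddings (the base model produces 384-dimensional vectors). The helper names `cosine_similarity`, `rank_documents`, and `embed_texts` are illustrative and are not part of this repository; `embed_texts` assumes the `sentence-transformers` package is installed and the base model can be downloaded.

```python
import math
from typing import List, Sequence, Tuple

BASE_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed_texts(texts: List[str]):
    """Embed texts with the base model (assumes sentence-transformers is installed)."""
    from sentence_transformers import SentenceTransformer  # imported lazily on purpose
    model = SentenceTransformer(BASE_MODEL)
    return model.encode(texts)


if __name__ == "__main__":
    # Toy vectors only; real vectors would come from embed_texts(...).
    docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
    query = [0.9, 0.1, 0.0]
    for index, score in rank_documents(query, docs):
        print(index, round(score, 3))
```

In the actual pipeline the vector store handles this ranking; the sketch only shows the comparison that a similarity search is built on.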

Directory Structure

embeddings/
├── setup.py # Setup script for the embeddings model and vector store
├── demo.py # Demo application with Gradio web UI
├── indexer.py # Utility for indexing new documents
├── requirements.txt # Dependencies for the embeddings components
└── chroma_db/ # Directory for the vector database (created on first run)

Getting Started

Install the dependencies:

pip install -r models/embeddings/requirements.txt

Add documents to the index:

Place your text files in the data/text_corpus directory, then run the indexer:

python models/embeddings/indexer.py --corpus data/text_corpus/

Run the demo application:

python models/embeddings/demo.py
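
The internals of indexer.py are not shown here, so the following is only a sketch of the kind of preprocessing an indexing step typically needs: collecting UTF-8 text files from a corpus directory and splitting them into overlapping chunks before embedding. The function names `load_corpus` and `chunk_text`, the `.txt` filter, and the chunk sizes are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List


def load_corpus(corpus_dir: str) -> Dict[str, str]:
    """Read every .txt file under corpus_dir (recursively) as UTF-8 text."""
    documents = {}
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        documents[str(path)] = path.read_text(encoding="utf-8")
    return documents


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    # Path taken from the indexing command above.
    for name, text in load_corpus("data/text_corpus").items():
        print(name, len(chunk_text(text)))
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.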

Using the Embeddings in Your Code

from models.embeddings.setup import setup_embedding_model

# Initialize the model and related components
embedding_components = setup_embedding_model()

# Get the query engine
query_engine = embedding_components["query_engine"]

# Query in any language
result = query_engine.query("What is the capital of Afghanistan?")

# Or in Pashto
result = query_engine.query("د افغانستان پلازمېنه څه ده؟")

print(result)
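
When trying the same question in several languages, a small helper keeps the answers together. The sketch below assumes only that the object stored under "query_engine" exposes a `query(text)` method, as in the example above. `run_queries` and the `EchoEngine` stand-in are hypothetical names, used so the snippet runs without the model or the vector store.

```python
from typing import Dict, Iterable


def run_queries(query_engine, questions: Iterable[str]) -> Dict[str, str]:
    """Send each question to query_engine.query and collect the answers as strings."""
    return {question: str(query_engine.query(question)) for question in questions}


class EchoEngine:
    """Stand-in for a real query engine; it only echoes the question back."""

    def query(self, text: str) -> str:
        return f"echo: {text}"


if __name__ == "__main__":
    questions = [
        "What is the capital of Afghanistan?",
        "د افغانستان پلازمېنه څه ده؟",
    ]
    # Swap EchoEngine() for embedding_components["query_engine"] in real use.
    for question, answer in run_queries(EchoEngine(), questions).items():
        print(question, "->", answer)
```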