Mixture-of-Experts Hindi Language Model (Base-1)

This is a causal language model for Hindi built on a Mixture-of-Experts (MoE) architecture and trained on a Hindi text corpus.

Model Details

Model Type: Causal Language Model with Mixture-of-Experts
Language: Hindi
License: MIT
Training Data: Hindi corpus
Model Size: ~1.8GB
Training Steps: 7000
Format: SafeTensors

Usage

This model uses a custom implementation and is not directly compatible with Hugging Face's AutoModelForCausalLM and AutoTokenizer. The simplest way to use it is with the provided generate.py script:

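# Install the dependencies if you don't have them
# (torch and safetensors are imported by the code below; sentencepiece is assumed from the tokenizer.model format)
pip install torch safetensors sentencepiece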

# Run the script with your Hindi text prompt
python generate.py "आज का दिन बहुत अच्छा है"

For more advanced usage, you can integrate the model into your own code:

# Import the required modules
import os
from hindi_embeddings import SentencePieceTokenizerWrapper
from convaicausallm_model_with_moe_rope import ConvaiCausalLMConfig, ConvaiCausalLM
import torch
import json
from safetensors.torch import load_file

# Load the config
with open("config.json", "r", encoding="utf-8") as f:
    config_dict = json.load(f)

config = ConvaiCausalLMConfig(**config_dict)

# Load the tokenizer
tokenizer = SentencePieceTokenizerWrapper("tokenizer.model")

# Load the model (supports both SafeTensors and PyTorch formats)
model = ConvaiCausalLM(config)

# Check which format is available and load accordingly
if os.path.exists("model.safetensors"):
    model.load_state_dict(load_file("model.safetensors"))
else:
    model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))

model.eval()

# Generate text
prompt = "आज का दिन बहुत अच्छा है"
input_ids = tokenizer.sp_model.EncodeAsIds(prompt)
input_tensor = torch.tensor([input_ids], dtype=torch.long)

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_tensor,
        max_length=50,
        temperature=0.7,
        top_k=10,
        top_p=0.9,
        do_sample=True
    )

# Print the generated text
generated_text = tokenizer.sp_model.DecodeIds(output_ids[0].tolist())
print(generated_text)

Generation Parameters

For best results with this model, we recommend:

temperature: 0.2-0.7 (lower is more conservative)
top_k: 5-10 (lower gives more predictable outputs)
top_p: 0.85-0.92
max_length: 50-100 tokens
repetition_penalty: 1.1-1.2 (helps prevent repetition)
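
To make the effect of these settings concrete, here is a minimal, dependency-free sketch of how temperature, top_k, top_p and repetition_penalty are commonly applied to a vector of logits before sampling. It is an illustration only: the function names are hypothetical, the order of the filters (top-k first, then top-p) is an assumption, and the generate() method shipped with this model may implement these steps differently.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty=1.1):
    """Make already-generated tokens less likely (a common convention, assumed here)."""
    adjusted = list(logits)
    for i in set(previous_ids):
        # Positive logits are divided, negative ones multiplied, so both move down.
        adjusted[i] = adjusted[i] / penalty if adjusted[i] > 0 else adjusted[i] * penalty
    return adjusted


def filter_probs(logits, temperature=0.7, top_k=10, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-k: keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalise over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

Lowering temperature, top_k or top_p shrinks the set of candidate tokens, which is why the lower ends of the recommended ranges give more predictable output.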

Model Architecture

The model features:

Mixture-of-Experts (MoE) layers
RoPE (Rotary Position Embeddings)
Grouped Query Attention (GQA)
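
As a rough illustration of the first item, the sketch below shows generic top-k expert routing for a single token in plain Python. It is not taken from convaicausallm_model_with_moe_rope.py: the number of experts, the value of top_k, the renormalisation of the selected weights, and the function names are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and return [(expert_index, weight)]."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise so the selected experts' weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(token, router_logits, experts, top_k=2):
    """Weighted sum of the selected experts' outputs for one token vector."""
    output = [0.0] * len(token)
    for index, weight in route_token(router_logits, top_k):
        expert_out = experts[index](token)  # only the chosen experts are evaluated
        output = [acc + weight * value for acc, value in zip(output, expert_out)]
    return output
```

Because only top_k experts run per token, an MoE layer can hold many more parameters than it uses for any single token.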

Limitations

This is an early experimental version of the model, trained for only 7000 steps. It may not yet generate fluent or coherent Hindi text, so expect limited output quality.

Files Included

model.safetensors: The model weights in SafeTensors format
config.json: Model configuration
tokenizer.model: SentencePiece tokenizer model
hindi_embeddings.py: Tokenizer implementation
convaicausallm_model_with_moe_rope.py: Model implementation
generate.py: Example inference script
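
To check what model.safetensors contains without loading the weights into memory, you can read just the file header. The sketch below relies on the SafeTensors layout (an 8-byte little-endian header length followed by a JSON header) and uses only the standard library; the helper names are hypothetical and not part of this repository.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Count parameters per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + n
    return counts
```

Calling summarize(read_safetensors_header("model.safetensors")) returns a dictionary mapping each dtype string to the number of parameters stored in that dtype.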

Training

Training was performed using a custom PyTorch implementation with mixed precision and data parallelism.

Citation

If you use this model, please cite:

@misc{convaiinnovations2025hindi,
  author = {ConvAI Innovations},
  title = {Mixture-of-Experts Hindi Language Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {https://huggingface.co/convaiinnovations/moe-hindi-fm-base-1}
}