Hinglish TTS 3B Model

This is a fine-tuned version of canopylabs/3b-hi-pretrain-research_release specialized for Hinglish (Hindi-English mixed) text-to-speech generation.

Model Details

Base Model: canopylabs/3b-hi-ft-research_release
Fine-tuning Method: LoRA with Unsloth (merged)
Languages: Hindi, English, Hinglish
Task: Text-to-Speech via audio token generation
Model Size: ~3B parameters

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Aaryan39/hinglish-tts-3b-ft-synthetic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

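# Generate audio tokens (not text) from a Hinglish prompt
prompt = "Hello doston, main aapka dost hun"
# Move the inputs to the same device as the model before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)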

Fine-tuning Details

LoRA Rank: 64
LoRA Alpha: 64
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Framework: Unsloth
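
As a rough illustration of what these settings imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters the adapters add per transformer block. It assumes standard LoRA scaling (alpha / rank, not the rank-stabilized variant), and the layer shapes in EXAMPLE_SHAPES are hypothetical values chosen for illustration, not figures taken from this model card.

```python
# Values taken from the "Fine-tuning Details" list above.
LORA_RANK = 64
LORA_ALPHA = 64


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# HYPOTHETICAL (in_features, out_features) shapes for one transformer block.
# These are assumptions for illustration; check the model's config.json.
EXAMPLE_SHAPES = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
    "gate_proj": (3072, 8192),
    "up_proj": (3072, 8192),
    "down_proj": (8192, 3072),
}

scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
per_block = sum(
    lora_param_count(n_in, n_out, LORA_RANK)
    for n_in, n_out in EXAMPLE_SHAPES.values()
)

print(f"LoRA scaling factor: {scaling}")
print(f"Adapter parameters per block (hypothetical shapes): {per_block:,}")
```

With rank equal to alpha, the scaling factor is 1, so the low-rank update is added to the frozen weights without extra rescaling.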

Audio Generation

This model generates audio tokens, which must be decoded into a waveform with a SNAC (Multi-Scale Neural Audio Codec) model:

from snac import SNAC

# Load SNAC decoder
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

# Process generated tokens to audio codes and decode
# (See full implementation in the original training code)
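
The token-to-code step is not spelled out in this card, so the following is only a sketch of one plausible layout. It assumes the generated audio codes arrive as a flat list with seven codes per frame, interleaved across SNAC's three codebook levels, where the code at position i in a frame is offset by i * 4096. Both the layout and the helper name redistribute_codes are assumptions. Verify them against the original training code before relying on this.

```python
from typing import List, Tuple

CODES_PER_FRAME = 7    # ASSUMPTION: 1 coarse + 2 mid + 4 fine codes per frame
CODEBOOK_SIZE = 4096   # ASSUMPTION: per-position offset used when flattening


def redistribute_codes(flat_codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat code sequence into three SNAC levels (hypothetical layout)."""
    layer_1: List[int] = []
    layer_2: List[int] = []
    layer_3: List[int] = []
    # Ignore any incomplete trailing frame.
    n_frames = len(flat_codes) // CODES_PER_FRAME
    for f in range(n_frames):
        frame = flat_codes[f * CODES_PER_FRAME:(f + 1) * CODES_PER_FRAME]
        # Undo the per-position offset so every code falls in [0, CODEBOOK_SIZE).
        c = [code - pos * CODEBOOK_SIZE for pos, code in enumerate(frame)]
        layer_1.append(c[0])
        layer_2.extend([c[1], c[4]])
        layer_3.extend([c[2], c[3], c[5], c[6]])
    return layer_1, layer_2, layer_3


# Tiny worked example: one complete frame plus two leftover codes.
example = [5, 4096 + 6, 2 * 4096 + 7, 3 * 4096 + 8,
           4 * 4096 + 9, 5 * 4096 + 10, 6 * 4096 + 11, 1, 2]
layers = redistribute_codes(example)
print(layers)
```

Each resulting list would then be wrapped in a tensor of shape (1, T) and the three tensors passed together to snac_model.decode to obtain the waveform.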

Limitations

Requires a SNAC decoder to turn the generated tokens into audio
Optimized for Hinglish (code-mixed Hindi-English) content
May perform worse on pure English or pure Hindi input

Citation

If you use this model, please cite the original base model:

@misc{canopylabs-3b-hi,
  title={3B Hindi Pretrained Model},
  author={Canopy Labs},
  year={2024},
  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
}
