Dheegpt-Qwen3-Malayalam

Model Description

Dheegpt-Qwen3-Malayalam is a large language model designed for high-quality natural language understanding and generation in Malayalam. It is a 2.03B-parameter model (BF16, safetensors) based on the Qwen3 architecture and optimized for both dialogue and reasoning tasks.

The model supports fluent conversational responses and reasoning-style outputs, making it suitable for applications like chatbots, virtual assistants, and step-by-step question answering.


Key Features

  • Fluent and context-aware Malayalam text generation
  • Optimized for assistant-style conversations
  • Handles summarization, question answering, and open-ended text generation
  • Fully compatible with Hugging Face transformers
  • Supports integration with vLLM for high-performance batched inference (see the minimal sketch after this list)
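
As a quick illustration of the vLLM integration, here is a minimal offline batched-inference sketch. It assumes vLLM is installed (see the serving requirements below); the Malayalam prompts are illustrative only.

from vllm import LLM, SamplingParams

# Load the model for offline batched inference
llm = LLM(model="dheeyantra/dheegpt-qwen3-malayalam")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate completions for a batch of prompts in a single call
prompts = [
    "കേരളത്തിലെ പ്രധാന നദികൾ ഏതൊക്കെ?",
    "ഒരു ചെറിയ കഥ പറയൂ.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)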

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dheegpt-qwen3-malayalam"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Regular conversation
prompt = "നമസ്കാരം! ഇന്നത്തെ കാലാവസ്ഥ എങ്ങനെയാണ്?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
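
Since the model is tuned for assistant-style dialogue, conversational prompts can also be built with the tokenizer's chat template. A minimal sketch, assuming the repository ships a Qwen3-style chat template (the question is an illustrative example):

# Chat-style conversation via the tokenizer's chat template
messages = [
    {"role": "user", "content": "കേരളത്തിന്റെ തലസ്ഥാനം ഏതാണ്?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))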

Intended Uses & Limitations

Intended Uses

  • Chatbots and conversational agents in Malayalam
  • Story generation, narratives, and creative text
  • Domain-specific natural language generation tasks in Malayalam

Limitations

  • Outputs reflect the patterns and information in the training data; the model may generate incorrect or biased content.
  • Performance may vary with input complexity.
  • Primarily designed for Malayalam; code-mixed or non-Malayalam input may yield lower-quality output.
  • May occasionally produce irrelevant or hallucinated content.

vLLM / High-Performance Serving Requirements

To serve this model using vLLM, ensure the following:

  • GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100).
  • PyTorch 2.1+ with the CUDA toolkit installed.
  • Tesla V100 (sm_70) is not supported for vLLM GPU inference; a CPU-only fallback is possible but slow.
  • Python dependencies:

    pip install torch transformers vllm sentencepiece

Example vLLM command:

vllm serve dheeyantra/dheegpt-qwen3-malayalam \
  --host 0.0.0.0 \
  --port 8000
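
Once the server is running, it exposes vLLM's OpenAI-compatible HTTP API. A minimal client sketch using requests (the host, port, and prompt match the command above and are illustrative):

import requests

# Query the OpenAI-compatible chat endpoint started by `vllm serve`
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "dheeyantra/dheegpt-qwen3-malayalam",
        "messages": [{"role": "user", "content": "നമസ്കാരം! സുഖമാണോ?"}],
        "max_tokens": 100,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])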

License

Released under the Apache 2.0 License.
