Dheegpt-Qwen3-Malayalam

Model Description

Dheegpt-Qwen3-Malayalam is a large language model designed for high-quality natural language understanding and generation in Malayalam. It is a 2.03B-parameter model (BF16, safetensors) based on the Qwen3 architecture and optimized for both dialogue and reasoning tasks.

The model supports fluent conversational responses and reasoning-style outputs, making it suitable for applications like chatbots, virtual assistants, and step-by-step question answering.


Key Features

  • Fluent and context-aware Malayalam text generation
  • Optimized for assistant-style conversations
  • Handles summarization, question answering, and open-ended text generation
  • Fully compatible with Hugging Face transformers
  • Supports integration with vLLM for high-performance batched inference (see the minimal sketch after this list)
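
As a quick illustration of the vLLM integration, here is a minimal offline batched-inference sketch. It assumes vLLM is installed (see the serving requirements below); the Malayalam prompts are illustrative only.

from vllm import LLM, SamplingParams

# Load the model for offline batched inference
llm = LLM(model="dheeyantra/dheegpt-qwen3-malayalam")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate completions for a batch of prompts in a single call
prompts = [
    "കേരളത്തിലെ പ്രധാന നദികൾ ഏതൊക്കെ?",
    "ഒരു ചെറിയ കഥ പറയൂ.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)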

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dheegpt-qwen3-malayalam"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Regular conversation
prompt = "നമസ്കാരം! ഇന്നത്തെ കാലാവസ്ഥ എങ്ങനെയാണ്?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
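
Since the model is tuned for assistant-style dialogue, conversational prompts can also be built with the tokenizer's chat template. A minimal sketch, assuming the repository ships a Qwen3-style chat template (the question is an illustrative example):

# Chat-style conversation via the tokenizer's chat template
messages = [
    {"role": "user", "content": "കേരളത്തിന്റെ തലസ്ഥാനം ഏതാണ്?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))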

Intended Uses & Limitations

Intended Uses

  • Chatbots and conversational agents in Malayalam
  • Story generation, narratives, and creative text
  • Domain-specific natural language generation tasks in Malayalam

Limitations

  • Outputs reflect the patterns and information in the training data; the model may generate incorrect or biased content.
  • Performance may vary with input complexity.
  • Primarily designed for Malayalam; code-mixed or non-Malayalam input may yield lower-quality output.
  • May occasionally produce irrelevant or hallucinated content.

vLLM / High-Performance Serving Requirements

To serve this model using vLLM, ensure the following:

  • GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100).
  • PyTorch 2.1+ with the CUDA toolkit installed.
  • Tesla V100 (sm_70) is not supported for vLLM GPU inference; a CPU-only fallback is possible but slow.
  • Python dependencies:

    pip install torch transformers vllm sentencepiece

Example vLLM command:

vllm serve dheeyantra/dheegpt-qwen3-malayalam \
  --host 0.0.0.0 \
  --port 8000
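
Once the server is running, it exposes vLLM's OpenAI-compatible HTTP API. A minimal client sketch using requests (the host, port, and prompt match the command above and are illustrative):

import requests

# Query the OpenAI-compatible chat endpoint started by `vllm serve`
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "dheeyantra/dheegpt-qwen3-malayalam",
        "messages": [{"role": "user", "content": "നമസ്കാരം! സുഖമാണോ?"}],
        "max_tokens": 100,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])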

License

Released under the Apache 2.0 License.
