---
license: apache-2.0
language:
  - en
  - hi
tags:
  - multilingual
  - instruction-tuning
  - phi4
  - efficiency
  - hindi
datasets:
  - 1024m/PHI-4-Hindi-Instruct-Data
model-index:
  - name: Mantra-14B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU Pro (5-Shot)
          type: mmlu_pro
          config: MMLU Pro
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 52.39
            name: accuracy
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-Shot)
          type: gpqa
          config: GPQA
          split: test
          args:
            num_few_shot: 0
        metrics:
          - type: acc
            value: 39.77
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-Shot)
          type: musr
          config: MuSR
          split: test
          args:
            num_few_shot: 0
        metrics:
          - type: acc
            value: 49.07
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Big Bench Hard (3-Shot)
          type: bbh
          config: Big Bench Hard
          split: test
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 66.97
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Hard (4-Shot)
          type: math_hard
          config: MATH Hard
          split: test
          args:
            num_few_shot: 4
        metrics:
          - type: acc
            value: 23.11
            name: accuracy (exact match)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
---

# Mantra-14B

Mantra-14B is a 14.7B-parameter, instruction-tuned bilingual large language model for Hindi and English, trained on a mixed-language instruction dataset.

- ~0.7% better average benchmark performance on English tasks than the original Phi-4
- ~2.8% better average benchmark performance on Hindi tasks than the original Phi-4
- ~4.4% better performance on tougher English benchmarks (Open LLM Leaderboard evals)
- ~8.5% lower emissions than the original Phi-4 (as reported in benchmark evaluations such as the Open LLM Leaderboard)
- Reduced bias from the ordering of answer choices when answering MCQs

### Model Details:

- **Developed by:** [Traversaal.ai](https://huggingface.co/large-traversaal), [1-800-LLMs](https://huggingface.co/1-800-LLMs)
- **Language(s) (NLP):** Optimized for Hindi and English
- **License:** Apache 2.0
- **Paper:** TBA (April 15)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c60dd7d655680b57ddbff/WW4s1lCUvhC6G-4v9-No2.png)

### Prompt Formats

| Task | Input Format |
|-----------------------------------|---------------------------------------------------|
| Natural Language Inference | "`Text1 ### Text2 ### NLI ###`" |
| Multiple Choice Questions | "`Question ### A) a, B) b, ... ### MCQ ###`" |
| Numeric Questions | "`Question ### NUMERIC ###`" |
| Boolean Questions | "`Question ### BOOLEAN ###`" |
| Questions seeking long responses | "`Question ### LONG RESPONSE ###`" |
| Short responses (few words) | "`Input ### DIRECT RESPONSE ###`" |
| Coding | "`Input ### CODE ###`" |
| Text Summarization | "`Input ### SUMMARIZE ###`" |
| Paraphrasing/Rephrasing | "`Input ### PARAPHRASE ###`" |
| Translation to a specified language | "`Input ### TRANSLATION [lang] ###`" |
| Text Simplification/ELI5 | "`Input ### SIMPLIFY ###`" |

The prompt formats above were used during training and are the recommended way to query the model, although it also works without such formatting. A minimal prompt-building sketch follows, and a full inference example appears in the Usage section at the end of this card.
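For convenience, here is a minimal sketch of how these formats can be assembled programmatically. The helper names (`format_mcq`, `format_translation`) are illustrative, not part of any released API; only the `### ... ###` markers come from the table above.

```python
# Illustrative helpers for building prompts in the trained formats.
# Function names are hypothetical; the markers follow the table above.

def format_mcq(question: str, choices: list[str]) -> str:
    """Build an MCQ prompt: 'Question ### A) a, B) b, ... ### MCQ ###'."""
    labels = "ABCDEFGH"
    options = ", ".join(f"{labels[i]}) {c}" for i, c in enumerate(choices))
    return f"{question} ### {options} ### MCQ ###"

def format_translation(text: str, lang: str) -> str:
    """Build a translation prompt: 'Input ### TRANSLATION [lang] ###'."""
    return f"{text} ### TRANSLATION [{lang}] ###"

print(format_mcq("What is the capital of India?",
                 ["Mumbai", "New Delhi", "Kolkata", "Chennai"]))
# -> What is the capital of India? ### A) Mumbai, B) New Delhi, C) Kolkata, D) Chennai ### MCQ ###
```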
## Evaluation:

We evaluated our model on multiple well-known benchmarks to measure its effectiveness against other leading models, and the results are as follows:

| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH-Hard |
|-----------------------------------|-------|-------|-------|-------|-------|----------|----------|-------|-------|-------|-----------|
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.04 | 22.95 | 62.23 | 23.70 | 31.32 | 22.66 | 25.34 | 42.72 | 41.12 | 2.95 |
| Airavata-7B | 25.09 | 30.47 | 25.31 | 62.17 | 33.20 | 35.25 | 16.35 | 27.43 | 37.57 | 36.00 | 13.60 |
| sarvam-1-2B | 30.03 | 33.25 | 62.17 | 42.80 | 27.90 | 39.23 | - | - | - | - | - |
| Nemotron-4-Mini-Hindi-4B-Instruct | 55.80 | 71.63 | 62.11 | 68.10 | 43.20 | 60.17 | 25.95 | 30.87 | 41.53 | 40.11 | 2.04 |
| Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 | 31.57 | 30.12 | 43.52 | 49.38 | 5.59 |
| Krutrim-2-12b-instruct | 67.32 | 81.10 | 84.74 | 76.30 | 56.10 | 73.11 | - | - | - | - | - |
| aya-expanse-8b | 74.06 | 87.08 | 86.45 | 83.30 | 56.89 | 77.56 | 30.04 | 30.29 | 37.17 | 49.42 | 7.02 |
| aya-expanse-32B | 85.41 | **95.08** | **90.43** | **89.80** | 69.71 | 86.08 | 41.30 | 32.55 | 38.62 | 56.29 | 13.37 |
| **Mantra-14B** | **97.39** | 92.24 | 87.65 | 87.40 | **75.59** | **88.05** | **52.39** | **39.77** | **49.07** | **66.97** | **23.11** |

**Table 1: Scores (to two decimal places) of our model and other LLMs on several English benchmarks.** *Average is taken over ARC-C, ARC-E, BoolQ, CMCQ, and MMLU.

| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average |
|------------------------------------|-------|-------|-------|-------|-------|---------|
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.08 | 22.95 | 62.17 | 23.80 | 31.34 |
| Airavata-7B | 22.87 | 25.13 | 23.28 | 62.17 | 33.20 | 33.33 |
| sarvam-1-2B | 32.76 | 35.06 | 62.16 | 47.10 | 24.22 | 40.26 |
| Llama-3-Nanda-10B-Chat | 45.99 | 60.56 | 71.96 | 54.70 | 36.35 | 53.91 |
| Nemotron-4-Mini-Hindi-4B-Instruct | 50.68 | 63.72 | 68.74 | 51.30 | 37.18 | 54.32 |
| Krutrim-2-12b-instruct | 56.83 | 70.66 | 78.86 | 64.10 | 46.51 | 63.39 |
| aya-expanse-8b | 57.42 | 72.90 | 80.42 | 69.00 | 43.39 | 64.63 |
| aya-expanse-32B | 73.29 | 85.48 | **87.73** | **79.70** | **56.96** | 76.63 |
| **Mantra-14B** | **81.74** | **89.06** | 86.02 | 78.70 | 56.39 | **78.38** |

**Table 2: Scores (to two decimal places) of our model and other LLMs on several Hindi benchmarks.**
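## Usage

Below is a minimal inference sketch using 🤗 Transformers with one of the documented prompt formats. The repository id `large-traversaal/Mantra-14B` is an assumption; substitute the actual checkpoint id, and adjust `torch_dtype` and `device_map` to your hardware.

```python
# Minimal inference sketch. The repo id is a placeholder assumption;
# replace it with the actual published checkpoint id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "large-traversaal/Mantra-14B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust to your hardware
    device_map="auto",
)

# One of the documented prompt formats: a Hindi question expecting a
# long answer ("What is the capital of India?").
prompt = "भारत की राजधानी क्या है? ### LONG RESPONSE ###"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```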