---
license: apache-2.0
language:
  - en
  - hi
tags:
  - multilingual
  - instruction-tuning
  - phi4
  - efficiency
  - hindi
datasets:
  - 1024m/PHI-4-Hindi-Instruct-Data
model-index:
  - name: Mantra-14B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU Pro (5-Shot)
          type: mmlu_pro
          config: MMLU Pro
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 52.39
            name: accuracy
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-Shot)
          type: gpqa
          config: GPQA
          split: test
          args:
            num_few_shot: 0
        metrics:
          - type: acc
            value: 39.77
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-Shot)
          type: musr
          config: MuSR
          split: test
          args:
            num_few_shot: 0
        metrics:
          - type: acc
            value: 49.07
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Big Bench Hard (3-Shot)
          type: bbh
          config: Big Bench Hard
          split: test
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 66.97
            name: accuracy (normalized)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Hard (4-Shot)
          type: math_hard
          config: MATH Hard
          split: test
          args:
            num_few_shot: 4
        metrics:
          - type: acc
            value: 23.11
            name: accuracy (exact match)
        source:
          url: >-
            https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/1024m/PHI-4-Hindi/results_2025-02-06T05-43-08.878637.json
          name: Open LLM Leaderboard
---

# Mantra-14B

Mantra-14B is a 14.7B-parameter, instruction-tuned bilingual large language model for Hindi and English, trained on a mixed-language instruction dataset.

- ~0.7% better average benchmark performance on English tasks than the original Phi-4
- ~2.8% better average benchmark performance on Hindi tasks than the original Phi-4
- ~4.4% better performance on tougher English benchmarks (Open LLM Leaderboard evals)
- ~8.5% lower emissions than the original Phi-4 (as reported in benchmark evaluations such as the Open LLM Leaderboard)
- Reduced bias from the ordering of answer choices when answering MCQs

### Model Details:

- **Developed by:** [Traversaal.ai](https://huggingface.co/large-traversaal), [1-800-LLMs](https://huggingface.co/1-800-LLMs)
- **Language(s) (NLP):** Optimized for Hindi and English
- **License:** Apache 2.0
- **Paper:** TBA (April 15)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c60dd7d655680b57ddbff/WW4s1lCUvhC6G-4v9-No2.png)

### Prompt Formats

| Task | Input Format |
|-----------------------------------|---------------------------------------------------|
| Natural Language Inference | "`Text1 ### Text2 ### NLI ###`" |
| Multiple Choice Questions | "`Question ### A) a, B) b, ... ### MCQ ###`" |
| Numeric Questions | "`Question ### NUMERIC ###`" |
| Boolean Questions | "`Question ### BOOLEAN ###`" |
| Questions seeking long responses | "`Question ### LONG RESPONSE ###`" |
| Short responses (few words) | "`Input ### DIRECT RESPONSE ###`" |
| Coding | "`Input ### CODE ###`" |
| Text Summarization | "`Input ### SUMMARIZE ###`" |
| Paraphrasing/Rephrasing | "`Input ### PARAPHRASE ###`" |
| Translation to a specified language | "`Input ### TRANSLATION [lang] ###`" |
| Text Simplification/ELI5 | "`Input ### SIMPLIFY ###`" |

The prompt formats above were used during training and are the recommended way to query the model, although it also works without such formatting. A minimal prompt-building sketch follows, and a full inference example appears in the Usage section at the end of this card.
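For convenience, here is a minimal sketch of how these formats can be assembled programmatically. The helper names (`format_mcq`, `format_translation`) are illustrative, not part of any released API; only the `### ... ###` markers come from the table above.

```python
# Illustrative helpers for building prompts in the trained formats.
# Function names are hypothetical; the markers follow the table above.

def format_mcq(question: str, choices: list[str]) -> str:
    """Build an MCQ prompt: 'Question ### A) a, B) b, ... ### MCQ ###'."""
    labels = "ABCDEFGH"
    options = ", ".join(f"{labels[i]}) {c}" for i, c in enumerate(choices))
    return f"{question} ### {options} ### MCQ ###"

def format_translation(text: str, lang: str) -> str:
    """Build a translation prompt: 'Input ### TRANSLATION [lang] ###'."""
    return f"{text} ### TRANSLATION [{lang}] ###"

print(format_mcq("What is the capital of India?",
                 ["Mumbai", "New Delhi", "Kolkata", "Chennai"]))
# -> What is the capital of India? ### A) Mumbai, B) New Delhi, C) Kolkata, D) Chennai ### MCQ ###
```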
## Evaluation:

We evaluated our model on multiple well-known benchmarks to measure its effectiveness against other leading models, and the results are as follows:

| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average* | MMLU-Pro | GPQA | MuSR | BBH | MATH-Hard |
|-----------------------------------|-------|-------|-------|-------|-------|----------|----------|-------|-------|-------|-----------|
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.04 | 22.95 | 62.23 | 23.70 | 31.32 | 22.66 | 25.34 | 42.72 | 41.12 | 2.95 |
| Airavata-7B | 25.09 | 30.47 | 25.31 | 62.17 | 33.20 | 35.25 | 16.35 | 27.43 | 37.57 | 36.00 | 13.60 |
| sarvam-1-2B | 30.03 | 33.25 | 62.17 | 42.80 | 27.90 | 39.23 | - | - | - | - | - |
| Nemotron-4-Mini-Hindi-4B-Instruct | 55.80 | 71.63 | 62.11 | 68.10 | 43.20 | 60.17 | 25.95 | 30.87 | 41.53 | 40.11 | 2.04 |
| Llama-3-Nanda-10B-Chat | 65.36 | 80.64 | 82.29 | 67.60 | 50.61 | 69.30 | 31.57 | 30.12 | 43.52 | 49.38 | 5.59 |
| Krutrim-2-12b-instruct | 67.32 | 81.10 | 84.74 | 76.30 | 56.10 | 73.11 | - | - | - | - | - |
| aya-expanse-8b | 74.06 | 87.08 | 86.45 | 83.30 | 56.89 | 77.56 | 30.04 | 30.29 | 37.17 | 49.42 | 7.02 |
| aya-expanse-32B | 85.41 | **95.08** | **90.43** | **89.80** | 69.71 | 86.08 | 41.30 | 32.55 | 38.62 | 56.29 | 13.37 |
| **Mantra-14B** | **97.39** | 92.24 | 87.65 | 87.40 | **75.59** | **88.05** | **52.39** | **39.77** | **49.07** | **66.97** | **23.11** |

**Table 1: Scores (to two decimal places) of our model and other LLMs on several English benchmarks.** *Average is taken over ARC-C, ARC-E, BoolQ, CMCQ, and MMLU.

| Model | ARC-C | ARC-E | BoolQ | CMCQ | MMLU | Average |
|------------------------------------|-------|-------|-------|-------|-------|---------|
| AryaBhatta-GemmaUltra-8.5B | 22.70 | 25.08 | 22.95 | 62.17 | 23.80 | 31.34 |
| Airavata-7B | 22.87 | 25.13 | 23.28 | 62.17 | 33.20 | 33.33 |
| sarvam-1-2B | 32.76 | 35.06 | 62.16 | 47.10 | 24.22 | 40.26 |
| Llama-3-Nanda-10B-Chat | 45.99 | 60.56 | 71.96 | 54.70 | 36.35 | 53.91 |
| Nemotron-4-Mini-Hindi-4B-Instruct | 50.68 | 63.72 | 68.74 | 51.30 | 37.18 | 54.32 |
| Krutrim-2-12b-instruct | 56.83 | 70.66 | 78.86 | 64.10 | 46.51 | 63.39 |
| aya-expanse-8b | 57.42 | 72.90 | 80.42 | 69.00 | 43.39 | 64.63 |
| aya-expanse-32B | 73.29 | 85.48 | **87.73** | **79.70** | **56.96** | 76.63 |
| **Mantra-14B** | **81.74** | **89.06** | 86.02 | 78.70 | 56.39 | **78.38** |

**Table 2: Scores (to two decimal places) of our model and other LLMs on several Hindi benchmarks.**
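## Usage

Below is a minimal inference sketch using 🤗 Transformers with one of the documented prompt formats. The repository id `large-traversaal/Mantra-14B` is an assumption; substitute the actual checkpoint id, and adjust `torch_dtype` and `device_map` to your hardware.

```python
# Minimal inference sketch. The repo id is a placeholder assumption;
# replace it with the actual published checkpoint id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "large-traversaal/Mantra-14B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust to your hardware
    device_map="auto",
)

# One of the documented prompt formats: a Hindi question expecting a
# long answer ("What is the capital of India?").
prompt = "भारत की राजधानी क्या है? ### LONG RESPONSE ###"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```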