Update README.md
README.md
CHANGED

@@ -1,30 +1,31 @@
---
library_name: transformers
language:
- pt
license: cc-by-4.0
tags:
- text-generation
- pytorch
- LLM
- Portuguese
- mamba
datasets:
- nicholasKluge/Pt-Corpus-Instruct-tokenized-large
track_downloads: true
inference:
  parameters:
    repetition_penalty: 1.2
    temperature: 0.8
    top_k: 50
    top_p: 0.85
    max_new_tokens: 150
widget:
- text: "O Natal é uma"
  example_title: Exemplo
- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
  example_title: Exemplo
- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
  example_title: Exemplo
pipeline_tag: text-generation
---

@@ -36,40 +37,54 @@ pipeline_tag: text-generation

<br>

## Model Summary

**Mambarim-110M** is a pioneering 110-million-parameter language model for Portuguese, built upon the **Mamba architecture**. Unlike traditional Transformer models that rely on quadratic self-attention, Mamba is a **State-Space Model (SSM)** that processes sequences with linear complexity.

This design choice leads to significantly faster inference and reduced memory consumption, especially for long sequences. Mamba employs a selection mechanism that allows it to effectively focus on relevant information in the context, making it a powerful and efficient alternative to Transformers. Mambarim-110M is one of the first Mamba-based models developed specifically for the Portuguese language.
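
To make the contrast with attention concrete, the snippet below sketches the selective state-space recurrence that Mamba builds on: the hidden state is updated once per token, so compute and memory grow linearly with sequence length. This is a deliberately simplified NumPy illustration (diagonal state matrix, plain Python loop), not the model's actual fused implementation.

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """x: (T, D) inputs; A: (D, N) state matrix; B, C: (T, N) input/output
    projections; delta: (T, D) per-token step sizes (the "selection")."""
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                    # one N-dimensional state per channel
    y = np.zeros((T, D))
    for t in range(T):                      # linear in T: no T x T attention matrix
        A_bar = np.exp(delta[t][:, None] * A)        # discretize the state matrix
        B_bar = delta[t][:, None] * B[t][None, :]    # discretize the input map
        h = A_bar * h + B_bar * x[t][:, None]        # recurrent state update
        y[t] = (h * C[t][None, :]).sum(axis=-1)      # readout
    return y

# Tiny smoke test with random parameters.
rng = np.random.default_rng(0)
T, D, N = 8, 4, 16
y = selective_ssm_scan(
    rng.normal(size=(T, D)),
    -np.abs(rng.normal(size=(D, N))),   # negative A keeps the recurrence stable
    rng.normal(size=(T, N)),
    rng.normal(size=(T, N)),
    np.abs(rng.normal(size=(T, D))),
)
print(y.shape)  # (8, 4)
```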

## Details

- **Architecture:** a Mamba model pre-trained via causal language modeling
- **Size:** 119,930,880 parameters
- **Context length:** 2048 tokens
- **Dataset:** [Pt-Corpus-Instruct-tokenized-large](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct-tokenized-large) (6.2B tokens)
- **Language:** Portuguese
- **Number of steps:** 758,423
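
If you want to verify the reported size locally, a quick check (assuming the checkpoint loads through the standard `transformers` auto classes) is:

```python
from transformers import AutoModelForCausalLM

# Load the checkpoint and count trainable plus frozen parameters.
model = AutoModelForCausalLM.from_pretrained("dominguesm/mambarim-110m")
print(f"{sum(p.numel() for p in model.parameters()):,}")  # expected: 119,930,880
```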

### Training & Reproducibility

This model's training is fully open and reproducible. You can find all the resources used below:

- **Source Code:** [GitHub Repository](https://github.com/DominguesM/mambarim-110M/)
- **Training Notebook:** [Open in Colab](https://githubtocolab.com/DominguesM/mambarim-110M/blob/main/MAMBARIM_110M.ipynb)
- **Training Metrics:** [View on Weights & Biases](https://wandb.ai/dominguesm/canarim-mamba-110m?nw=nwuserdominguesm)

## Intended Uses

This model is intended for a variety of text generation tasks in Portuguese. Given its size, it is particularly well-suited for environments with limited computational resources.

- **General-Purpose Text Generation:** The model can be used for creative writing, continuing a story, or generating text based on a prompt.
- **Research and Education:** As one of the first Portuguese Mamba models, it serves as an excellent resource for researchers studying State-Space Models, computational efficiency in LLMs, and NLP for non-English languages. Its smaller size also makes it an accessible tool for educational purposes.
- **Fine-tuning Base:** The model can be fine-tuned on specific datasets to create more specialized models for tasks like simple chatbots, content creation aids, or domain-specific text generation; see the sketch after this list.
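
The fine-tuning path mentioned above can be sketched with the standard `transformers` `Trainer`. This is an illustrative outline, not the recipe used to train Mambarim-110M: the corpus file `meu_corpus.txt`, the hyperparameters, and the output directory are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "dominguesm/mambarim-110m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any Portuguese corpus with one document per line works for this sketch.
dataset = load_dataset("text", data_files={"train": "meu_corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to the model's 2048-token context length.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mambarim-110m-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False -> standard causal language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```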

## Out-of-scope Use

The model is not intended for use in critical applications without comprehensive testing and fine-tuning. Users should be aware of the following limitations:

- **Factual Accuracy:** This model is not a knowledge base and can generate incorrect or fabricated information ("hallucinate"). It should not be used as a source of truth.
- **High-Stakes Decisions:** Do not use this model for making important decisions in domains such as medical, legal, or financial advice, as its outputs may be unreliable.
- **Bias and Safety:** The model was trained on a large corpus of public data from the internet and may reflect societal biases present in that data. It can generate content that is biased, offensive, or otherwise harmful.

## Basic usage

You need to install `transformers` from `main` until `transformers>=4.39.0` is released.

```bash
pip install git+https://github.com/huggingface/transformers@main
```

We also recommend installing both `causal_conv_1d` and `mamba-ssm` using:

```bash
pip install causal-conv1d>=1.2.0
pip install mamba-ssm
```
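
A minimal generation sketch is shown below. It assumes the checkpoint loads through the standard `AutoTokenizer`/`AutoModelForCausalLM` classes and reuses the sampling parameters declared in the inference widget above; adjust them as needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dominguesm/mambarim-110m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# One of the widget prompts from the model card.
inputs = tokenizer("O Natal é uma", return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,        # same values as the inference widget
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
    repetition_penalty=1.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```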

@@ -103,13 +118,11 @@ Evaluations on Brazilian Portuguese benchmarks were performed using a [Portugues

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/mambarim-110m).

| Model | **Average** | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FAQNAD NLI | HateBR | PT Hate Speech | tweetSentBR | **Architecture** |
| ----------------------------------------------------------------------------------------- | ----------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | -------------------- |
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1) | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| [**Mambarim-110M**](https://huggingface.co/dominguesm/mambarim-110m) | **14.16** | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | **MambaForCausalLM** |
| [GlorIA-1.3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B) | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |