dominguesm committed (verified)
Commit 8b177d7 · Parent: e66d873

Update README.md

Files changed (1):
  1. README.md (+51, -38)
README.md CHANGED
@@ -1,30 +1,31 @@
 ---
 library_name: transformers
 language:
-- pt
+- pt
 license: cc-by-4.0
 tags:
-- text-generation
-- pytorch
-- LLM
-- Portuguese
-- mamba
+- text-generation
+- pytorch
+- LLM
+- Portuguese
+- mamba
 datasets:
-- nicholasKluge/Pt-Corpus-Instruct
+- nicholasKluge/Pt-Corpus-Instruct-tokenized-large
+track_downloads: true
 inference:
-  parameters:
-    repetition_penalty: 1.2
-    temperature: 0.8
-    top_k: 50
-    top_p: 0.85
-    max_new_tokens: 150
+  parameters:
+    repetition_penalty: 1.2
+    temperature: 0.8
+    top_k: 50
+    top_p: 0.85
+    max_new_tokens: 150
 widget:
-- text: "O Natal é uma"
-  example_title: Exemplo
-- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
-  example_title: Exemplo
-- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
-  example_title: Exemplo
+- text: "O Natal é uma"
+  example_title: Exemplo
+- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
+  example_title: Exemplo
+- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
+  example_title: Exemplo
 pipeline_tag: text-generation
 ---
 
@@ -36,40 +37,54 @@ pipeline_tag: text-generation
 
 </br>
 
-## Model Summary
+## Model Summary
 
-**Mambarim-110M** is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.
+**Mambarim-110M** is a pioneering 110-million-parameter language model for Portuguese, built upon the **Mamba architecture**. Unlike traditional Transformer models that rely on quadratic self-attention, Mamba is a **State-Space Model (SSM)** that processes sequences with linear complexity.
 
-WIP
+This design choice leads to significantly faster inference and reduced memory consumption, especially for long sequences. Mamba employs a selection mechanism that allows it to effectively focus on relevant information in the context, making it a powerful and efficient alternative to Transformers. Mambarim-110M is one of the first Mamba-based models developed specifically for the Portuguese language.
 
 ## Details
 
 - **Architecture:** a Mamba model pre-trained via causal language modeling
 - **Size:** 119,930,880 parameters
 - **Context length:** 2048 tokens
-- **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
+- **Dataset:** [Pt-Corpus-Instruct-tokenized-large](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct-tokenized-large) (6.2B tokens)
 - **Language:** Portuguese
 - **Number of steps:** 758,423
 
-This repository has the [source code](https://github.com/DominguesM/mambarim-110M/) used to train this model.
+### Training & Reproducibility
+
+This model was trained to be fully open and reproducible. You can find all the resources used below:
+
+- **Source Code:** [GitHub Repository](https://github.com/DominguesM/mambarim-110M/)
+- **Training Notebook:** [Open in Colab](https://githubtocolab.com/DominguesM/mambarim-110M/blob/main/MAMBARIM_110M.ipynb)
+- **Training Metrics:** [View on Weights & Biases](https://wandb.ai/dominguesm/canarim-mamba-110m?nw=nwuserdominguesm)
 
 ## Intended Uses
 
-WIP
+This model is intended for a variety of text generation tasks in Portuguese. Given its size, it is particularly well-suited for environments with limited computational resources.
+
+- **General-Purpose Text Generation:** The model can be used for creative writing, continuing a story, or generating text based on a prompt.
+- **Research and Education:** As one of the first Portuguese Mamba models, it serves as an excellent resource for researchers studying State-Space Models, computational efficiency in LLMs, and NLP for non-English languages. Its smaller size also makes it an accessible tool for educational purposes.
+- **Fine-tuning Base:** The model can be fine-tuned on specific datasets to create more specialized models for tasks like simple chatbots, content creation aids, or domain-specific text generation.
 
 ## Out-of-scope Use
 
-WIP
+The model is not intended for use in critical applications without comprehensive testing and fine-tuning. Users should be aware of the following limitations:
+
+- **Factual Accuracy:** This model is not a knowledge base and can generate incorrect or fabricated information ("hallucinate"). It should not be used as a source of truth.
+- **High-Stakes Decisions:** Do not use this model for making important decisions in domains such as medical, legal, or financial advice, as its outputs may be unreliable.
+- **Bias and Safety:** The model was trained on a large corpus of public data from the internet and may reflect societal biases present in that data. It can generate content that is biased, offensive, or otherwise harmful.
 
 ## Basic usage
 
-You need to install `transformers` from `main` until `transformers=4.39.0` is released.
+You need to install `transformers` from `main` until `transformers>=4.39.0` is released.
 
 ```bash
 pip install git+https://github.com/huggingface/transformers@main
 ```
 
-We also recommend you to install both `causal_conv_1d` and `mamba-ssm` using:
+We also recommend you to install both `causal_conv_1d` and `mamba-ssm` using:
 
 ```bash
 pip install causal-conv1d>=1.2.0
@@ -103,13 +118,11 @@
 
 Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/mambarim-110m)
 
-| Model | **Average** | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FAQNAD NLI | HateBR | PT Hate Speech | tweetSentBR | **Architecture** |
-| -------------------------------------- | ----------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | ------------------ |
-| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
-| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
-| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1) | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
-| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
-| [**manbarim-110m**](https://huggingface.co/dominguesm/mambarim-110m) | **14.16** | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | **MambaForCausalLM** |
-| [GloriaTA-3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B) | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |
-
-
+| Model | **Average** | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FAQNAD NLI | HateBR | PT Hate Speech | tweetSentBR | **Architecture** |
+| ----------------------------------------------------------------------------------------- | ----------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | -------------------- |
+| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
+| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
+| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1) | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
+| [**Mambarim-110M**](https://huggingface.co/dominguesm/mambarim-110m) | **14.16** | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | **MambaForCausalLM** |
+| [GloriaTA-3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B) | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |
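
The hunk above ends at the install commands, before the README's own Python example. For reference, a minimal generation sketch consistent with the card metadata might look like the following; it assumes `AutoModelForCausalLM`/`AutoTokenizer` can load this checkpoint under a Mamba-capable `transformers` (>= 4.39.0), reuses the `inference.parameters` values from the front matter, borrows its prompt from the widget examples, and adds `do_sample=True` so the sampling parameters take effect.

```python
# Illustrative sketch (not part of the commit): load dominguesm/mambarim-110m and
# sample with the generation settings declared in the card's front matter.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dominguesm/mambarim-110m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # resolves to MambaForCausalLM

# Prompt taken from the widget examples in the metadata
input_ids = tokenizer("O Natal é uma", return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids,
    do_sample=True,            # assumption: enable sampling so temperature/top_k/top_p apply
    temperature=0.8,           # the values below mirror inference.parameters
    top_k=50,
    top_p=0.85,
    repetition_penalty=1.2,
    max_new_tokens=150,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Without `causal-conv1d` and `mamba-ssm`, transformers falls back to a slower pure-PyTorch Mamba path; installing them, as the README recommends, enables the optimized CUDA kernels.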