---
library_name: transformers
language:
- pt
license: cc-by-4.0
tags:
- text-generation
- pytorch
- LLM
- Portuguese
- mamba
datasets:
- nicholasKluge/Pt-Corpus-Instruct
inference:
parameters:
repetition_penalty: 1.2
temperature: 0.8
top_k: 50
top_p: 0.85
max_new_tokens: 150
widget:
- text: "O Natal é uma"
example_title: Exemplo
- text: "A muitos anos atrás, em uma galáxia muito distante, vivia uma raça de"
example_title: Exemplo
- text: "Em meio a um escândalo, a frente parlamentar pediu ao Senador Silva para"
example_title: Exemplo
pipeline_tag: text-generation
---
# Mambarim-110M
<p align="center">
<img width="350" alt="Camarim Logo" src="https://raw.githubusercontent.com/DominguesM/mambarim-110M/main/assets/mambarim-bg.png">
</p>
<br>
## Model Summary
**Mambarim-110M** is the first Portuguese language model based on a state-space architecture (Mamba) rather than a Transformer.
WIP
## Details
- **Architecture:** a Mamba model pre-trained via causal language modeling
- **Size:** 119,930,880 parameters
- **Context length:** 2048 tokens
- **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
- **Language:** Portuguese
- **Number of steps:** 758,423
This repository contains the [source code](https://github.com/DominguesM/mambarim-110M/) used to train this model.
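As a quick sanity check, the parameter count reported above can be reproduced by loading the checkpoint; a minimal sketch, assuming it loads with the standard `transformers` Mamba classes:

```python
# Minimal sketch: verify the parameter count reported above.
# Assumes the checkpoint loads with the standard `transformers` Mamba classes.
from transformers import MambaForCausalLM

model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params:,}")  # expected: 119,930,880
```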
## Intended Uses
WIP
## Out-of-scope Use
WIP
## Basic usage
You need to install `transformers` from `main` until `transformers==4.39.0` is released:
```bash
pip install git+https://github.com/huggingface/transformers@main
```
We also recommend installing both `causal-conv1d` and `mamba-ssm`, which provide the optimized CUDA kernels (without them, `transformers` falls back to a slower eager implementation):
```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
You can use the classic `generate` API:
```python
>>> from transformers import MambaForCausalLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
```
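The same sampling settings can also be used through the high-level `pipeline` API; a minimal sketch (the parameter values mirror the widget configuration above):

```python
from transformers import pipeline

# Build a text-generation pipeline around the model and its tokenizer.
generator = pipeline("text-generation", model="dominguesm/mambarim-110m")

output = generator(
    "O Natal é uma",
    do_sample=True,
    repetition_penalty=1.2,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
    max_new_tokens=150,
)
print(output[0]["generated_text"])
```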
## Benchmarks
Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese port of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt), created by [Eduardo Garcia](https://github.com/eduagarcia).
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/dominguesm/mambarim-110m).
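To reproduce the evaluation, the harness can be invoked against this checkpoint; a hypothetical sketch (the CLI flags and task names below are assumptions, not confirmed by this model card; check the harness repository for the actual interface):

```bash
# Hypothetical invocation -- flags and task names are assumptions;
# see the harness repository for the real interface.
python main.py \
    --model hf \
    --model_args pretrained=dominguesm/mambarim-110m \
    --tasks enem_challenge,bluex,oab_exams,assin2_rte,assin2_sts \
    --batch_size 8
```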
| Model | **Average** | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | **Architecture** |
| -------------------------------------- | ----------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | ------------------ |
| [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m) | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| [TeenyTinyLlama-160m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-160m) | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| [MulaBR/Mula-4x160-v0.1](https://huggingface.co/MulaBR/Mula-4x160-v0.1) | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| [TeenyTinyLlama-460m-Chat](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m-Chat) | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| [**mambarim-110m**](https://huggingface.co/dominguesm/mambarim-110m) | **14.16** | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | **MambaForCausalLM** |
| [GlorIA-1.3B](https://huggingface.co/NOVA-vision-language/GlorIA-1.3B) | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |