---
license: mit
language:
- pt
tags:
- roberta
- masked-language-modeling
- portuguese
- portbert
- portbert-large
- downstream-evaluation
- extraGLUE
datasets:
- uonlp/CulturaX
- extraGLUE
pipeline_tag: fill-mask
---
# PortBERT: Navigating the Depths of Portuguese Language Models
**PortBERT** is a family of RoBERTa-based language models pre-trained from scratch on the Portuguese portion of CulturaX (deduplicated mC4 and OSCAR 23). The models are designed to offer strong downstream performance on Portuguese NLP tasks, while providing insights into the cost-performance tradeoffs of training across hardware backends.
We release two variants:
- `PortBERT-base`: 126M parameters, trained on 8× A40 GPUs (fp32)
- `PortBERT-large`: 357M parameters, trained on a TPUv4-128 pod (fp32)
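
For quick experimentation with the masked-LM objective, a converted Transformers checkpoint can be used with the fill-mask pipeline. This is a minimal sketch; the Hub id below is hypothetical, so substitute the actual path of the converted checkpoint.

```python
from transformers import pipeline

# Hypothetical Hub id; substitute the actual path of the converted PortBERT checkpoint.
fill = pipeline("fill-mask", model="portbert/portbert-base")

# RoBERTa-style models mark the blank with the <mask> token.
for pred in fill("Lisboa é a <mask> de Portugal."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```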
---
## Model Details
| Detail | PortBERT-base | PortBERT-large |
|-------------------|---------------------------------------------|----------------|
| Architecture | RoBERTa-base | RoBERTa-large |
| Parameters | ~126M | ~357M |
| Tokenizer | GPT-2 style (52k vocab) | Same |
| Pretraining corpus | deduplicated mC4 and OSCAR 23 from CulturaX | Same |
| Objective | Masked Language Modeling | Same |
| Training time | ~27 days on 8× A40 | ~6.2 days on TPUv4-128 pod |
| Precision | fp32 | fp32 |
| Framework | fairseq | fairseq |
---
## Downstream Evaluation (ExtraGLUE)
We evaluate PortBERT on **ExtraGLUE**, a Portuguese adaptation of the GLUE benchmark. Fine-tuning was conducted with Hugging Face Transformers, using an NNI-based grid search over batch size and learning rate (28 configurations per task), and each task was fine-tuned for up to 10 epochs. Metrics are reported on the validation sets, since ExtraGLUE does not provide held-out test sets.
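As a rough sketch of this per-task setup, the snippet below fine-tunes a hypothetical Hub copy of PortBERT on an RTE-style task; the model id, dataset path, and the small grid are placeholders, not the exact 28-configuration search used for the results reported here.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_ID = "portbert/portbert-base"        # hypothetical Hub id for the converted checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_ID)

ds = load_dataset("extraglue", "rte")      # illustrative dataset path / config name
ds = ds.map(lambda ex: tok(ex["sentence1"], ex["sentence2"], truncation=True), batched=True)
collator = DataCollatorWithPadding(tok)

def run(lr: float, batch_size: int) -> dict:
    """Fine-tune one (learning rate, batch size) configuration and return validation metrics."""
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
    args = TrainingArguments(
        output_dir=f"rte-lr{lr}-bs{batch_size}",
        learning_rate=lr,
        per_device_train_batch_size=batch_size,
        num_train_epochs=10,               # each task is fine-tuned for up to 10 epochs
    )
    trainer = Trainer(model=model, args=args, data_collator=collator,
                      train_dataset=ds["train"], eval_dataset=ds["validation"])
    trainer.train()
    return trainer.evaluate()              # validation metrics (no held-out test sets)

# Illustrative grid; the reported results sweep 28 (batch size, learning rate) configurations per task.
for lr in (1e-5, 2e-5, 3e-5, 5e-5):
    for bs in (16, 32):
        print(lr, bs, run(lr, bs))
```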
**AVG score** averages the following metrics:
- STSB Spearman
- STSB Pearson
- RTE Accuracy
- WNLI Accuracy
- MRPC Accuracy
- MRPC F1
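
The AVG column is the unweighted mean of these six scores; as a worked check against the table below, using PortBERT_large's per-task results:

```python
# Unweighted mean of the six per-task metrics, using PortBERT_large's scores from the table below.
scores = {"STSB_Sp": 88.53, "STSB_Pe": 88.68, "RTE_Acc": 72.56,
          "WNLI_Acc": 61.97, "MRPC_Acc": 89.46, "MRPC_F1": 92.39}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # ~82.26, matching the AVG column
```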
### 🧪 Evaluation Results
**Legend**: **Bold = best**, *italic = second-best* per model size.
| Model | STSB_Sp | STSB_Pe | STSB_Mean | RTE_Acc | WNLI_Acc | MRPC_Acc | MRPC_F1 | AVG |
|------------------------|----------|----------|------------|----------|----------|----------|----------|-----------|
| **Large models** | | | | | | | | |
| XLM-RoBERTa_large | **90.00**| **90.27**| **90.14** | **82.31**| 57.75 | *90.44* | *93.31* | **84.01** |
| EuroBERT-610m | 88.46 | 88.59 | 88.52 | *78.34* | *59.15* | **91.91**| **94.20**| *83.44* |
| PortBERT_large | 88.53 | 88.68 | 88.60 | 72.56 | **61.97**| 89.46 | 92.39 | 82.26 |
| BERTimbau_large | *89.40* | *89.61* | *89.50* | 75.45 | *59.15* | 88.24 | 91.55 | 82.23 |
| **Base models** | | | | | | | | |
| RoBERTaLexPT_base | 86.68 | 86.86 | 86.77 | 69.31 | *59.15* | **89.46**| **92.34**| **80.63** |
| PortBERT_base | *87.39* | *87.65* | *87.52* | 68.95 | **60.56**| 87.75 | *91.13* | *80.57* |
| RoBERTaCrawlPT_base | 87.34 | 87.45 | 87.39 | **72.56**| 56.34 | *87.99* | 91.20 | 80.48 |
| BERTimbau_base | **88.39**| **88.60**| **88.50** | *70.40* | 56.34 | 87.25 | 90.97 | 80.32 |
| XLM-RoBERTa_base | 85.75 | 86.09 | 85.92 | 68.23 | **60.56**| 87.75 | 91.32 | 79.95 |
| EuroBERT-210m | 86.54 | 86.62 | 86.58 | 65.70 | 57.75 | 87.25 | 91.00 | 79.14 |
| AlBERTina 100M PTPT | 86.52 | 86.51 | 86.52 | 70.04 | 56.34 | 85.05 | 89.57 | 79.01 |
| AlBERTina 100M PTBR | 85.97 | 85.99 | 85.98 | 68.59 | 56.34 | 85.78 | 89.82 | 78.75 |
| AiBERTa | 83.56 | 83.73 | 83.65 | 64.98 | 56.34 | 82.11 | 86.99 | 76.29 |
| roBERTa PT | 48.06 | 48.51 | 48.29 | 56.68 | *59.15* | 72.06 | 81.79 | 61.04 |
---
## Fairseq Checkpoint
Get the fairseq checkpoint [here](https://drive.proton.me/urls/WXZQ7HW0Q8#zgJKDhKNGaOt).
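
If you work directly in fairseq, the checkpoint can be loaded through the RoBERTa hub interface. This is a minimal sketch; the directory layout and file names below are assumptions about the downloaded archive's contents.

```python
from fairseq.models.roberta import RobertaModel

# Directory and file names are assumptions about the downloaded archive's layout.
portbert = RobertaModel.from_pretrained(
    "portbert_large",          # folder containing the checkpoint, dict.txt and BPE files
    checkpoint_file="model.pt",
)
portbert.eval()

# fairseq's RoBERTa hub interface supports masked-token filling directly.
print(portbert.fill_mask("Lisboa é a <mask> de Portugal.", topk=3))
```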
## 📜 License
MIT License