---
language: vi
tags:
- spam-detection
- vietnamese
- bartpho
license: apache-2.0
datasets:
- visolex/ViSpamReviews
metrics:
- accuracy
- f1
model-index:
- name: bartpho-spam-classification
  results:
  - task:
      type: text-classification
      name: Spam Detection (Multi-Class)
    dataset:
      name: ViSpamReviews
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: <INSERT_ACCURACY>
    - name: F1 Score
      type: f1
      value: <INSERT_F1_SCORE>
base_model:
- vinai/bartpho-syllable
pipeline_tag: text-classification
---

# BARTPho-Spam-MultiClass

Fine-tuned from [`vinai/bartpho-syllable`](https://huggingface.co/vinai/bartpho-syllable) on **ViSpamReviews** for multi-class spam detection in Vietnamese reviews.

* **Task**: 4-way classification (`NO-SPAM`, `SPAM-1`, `SPAM-2`, `SPAM-3`)
* **Dataset**: [ViSpamReviews](https://huggingface.co/datasets/visolex/ViSpamReviews)
* **Hyperparameters**

  * Batch size: 32
  * LR: 3e-5
  * Epochs: 100
  * Max seq len: 256
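
The training script is not part of this card, so the following is only a sketch of how the hyperparameters above could be collected in one place. The class name `FineTuneConfig` and the method `to_training_kwargs` are hypothetical. The dictionary keys follow `transformers.TrainingArguments` parameter names, but treating the batch size of 32 as a per-device value is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this card (class name is hypothetical)."""

    batch_size: int = 32
    learning_rate: float = 3e-5
    num_epochs: int = 100
    max_seq_len: int = 256  # applied at tokenization time, not by the trainer

    def to_training_kwargs(self) -> dict:
        # Keys match transformers.TrainingArguments parameter names.
        # Assumption: the batch size of 32 is per device; the card does not say.
        return {
            "per_device_train_batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "num_train_epochs": self.num_epochs,
        }


config = FineTuneConfig()
print(config.to_training_kwargs())
```

The resulting dictionary could be unpacked into `TrainingArguments(output_dir=..., **config.to_training_kwargs())`, while `max_seq_len` would be passed to the tokenizer as `max_length`.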

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "visolex/bartpho-spam-classification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Example review; in English: "The review is too generic and not relevant."
text = "Đánh giá quá chung chung, không liên quan."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Forward pass without gradient tracking, then pick the top-scoring class.
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()

label_map = {0: "NO-SPAM", 1: "SPAM-1", 2: "SPAM-2", 3: "SPAM-3"}
print(label_map[pred])
```
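
The snippet above reports only the winning class. If a confidence score is also useful, the logits can be passed through a softmax first. The sketch below uses plain Python so it runs without downloading the model. The helper names `softmax` and `decode` are illustrative, and the example logits are made up rather than produced by this model.

```python
import math

LABEL_MAP = {0: "NO-SPAM", 1: "SPAM-1", 2: "SPAM-2", 3: "SPAM-3"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, confidence) for one row of four logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Made-up logits for illustration only; with the real model they would
# come from model(**inputs).logits[0].tolist().
example_logits = [0.2, 2.5, -1.0, 0.3]
label, confidence = decode(example_logits)
print(label, round(confidence, 3))
```

A low confidence for the winning class may be a reason to route the review to manual inspection rather than trusting the predicted label.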