ViSoBERT-HSD-Span

This model is fine-tuned from uitnlp/visobert on the visolex/ViHOS dataset for span-level detection of hate and offensive speech in Vietnamese comments.

Model Details

Hyperparameters

  • Batch size: 16
  • Learning rate: 5e-5
  • Epochs: 100
  • Max sequence length: 128
  • Early stopping patience: 5
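The card lists the hyperparameters but not the training script. Below is a minimal sketch that collects the listed values in a config object and spells out what a patience of 5 usually means: training stops once 5 consecutive evaluations fail to improve on the best score so far. The card does not say which validation metric was monitored, so a higher-is-better score such as F1 is an assumption here. `TrainConfig` and `stopping_epoch` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values copied from the hyperparameter list in this card.
    batch_size: int = 16
    learning_rate: float = 5e-5
    max_epochs: int = 100
    max_seq_length: int = 128
    # Assumption: patience counts epochs (one evaluation per epoch).
    patience: int = 5


def stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which patience-based early stopping fires.

    val_scores: per-epoch validation scores, higher is better (assumed metric).
    If `patience` consecutive epochs never fail to beat the best score,
    training runs to the end and len(val_scores) is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_scores)


cfg = TrainConfig()
scores = [0.50, 0.60, 0.62, 0.61, 0.60, 0.62, 0.615, 0.60]
print("Stops at epoch", stopping_epoch(scores, cfg.patience))
```

If training used the Hugging Face `Trainer`, which the card does not state, the closest built-in counterpart would be `EarlyStoppingCallback(early_stopping_patience=5)`.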

Usage

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("visolex/visobert-hsd-span")
model = AutoModelForTokenClassification.from_pretrained("visolex/visobert-hsd-span")
model.eval()

text = "Nói cái lol . t thấy thô tục vl"
# Truncate to the 128-token maximum used during fine-tuning
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits  # [batch, seq_len, num_labels]

# Binary span labels: apply sigmoid and threshold at 0.5.
# (For a multi-class head, use softmax + argmax over the last dimension instead.)
probs = torch.sigmoid(logits)
preds = (probs[0] > 0.5).long()  # [seq_len, num_labels] for the single input
span_labels = preds[:, 0].tolist()  # per-token label from the first output channel
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Keep tokens whose span label is 1, dropping special tokens such as <s> and </s>
span_tokens = [
    token
    for token, label in zip(tokens, span_labels)
    if label == 1 and token not in tokenizer.all_special_tokens
]

print("Span tokens:", span_tokens)
print("Span text:", tokenizer.convert_tokens_to_string(span_tokens))
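The usage code joins every positive token into a single string, so two separate offensive spans in the same comment come out concatenated. The minimal sketch below groups runs of consecutive positive tokens into separate spans. `group_spans` is a hypothetical helper, not part of this model or of `transformers`, and the token list in the example is illustrative, not the actual ViSoBERT tokenization of the sample sentence.

```python
def group_spans(tokens, labels, special_tokens=("<s>", "</s>", "<pad>")):
    """Merge runs of consecutive positive-labelled tokens into spans.

    Returns a list of (start, end, span_tokens) tuples, where start/end are
    token indices and end is exclusive. Special tokens always break a span.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans = []
    start = None
    for i, (token, label) in enumerate(zip(tokens, labels)):
        inside = label == 1 and token not in special_tokens
        if inside and start is None:
            start = i  # a new span begins here
        elif not inside and start is not None:
            spans.append((start, i, tokens[start:i]))  # close the open span
            start = None
    if start is not None:  # span running to the last token
        spans.append((start, len(tokens), tokens[start:]))
    return spans


# Hypothetical tokens and labels, for illustration only
example_tokens = ["<s>", "▁Nói", "▁cái", "▁lol", "▁.", "▁t", "▁thấy", "▁thô", "▁tục", "▁vl", "</s>"]
example_labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0]
for start, end, span in group_spans(example_tokens, example_labels):
    print(start, end, span)
```

With the model above, the call would be `group_spans(tokens, span_labels, tokenizer.all_special_tokens)`, and each span's tokens can be turned back into text with `tokenizer.convert_tokens_to_string`.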