
hadung1802/visobert-normalizer

This is a Vietnamese text normalization model trained with the ASTRA framework on top of the VISOBERT architecture.

Model Description

This model performs lexical normalization of Vietnamese text, converting informal, non-standard writing into standard Vietnamese. It was trained with the ASTRA (Self-training with Weak Supervision) framework.
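To illustrate what lexical normalization means here, the toy sketch below maps informal, diacritic-less tokens to their standard Vietnamese forms via a hand-written lookup table. The table and the `normalize` function are purely hypothetical illustrations of the task; the model itself learns these mappings rather than using a dictionary.

```python
# Illustrative only: a toy dictionary-based normalizer showing the task.
# The real model learns these mappings; this lookup table is hypothetical.
NORMALIZATION_TABLE = {
    "toi": "tôi",  # informal (no diacritics) -> standard form
    "di": "đi",
    "hoc": "học",
}

def normalize(text: str) -> str:
    """Map each informal token to its standard form when one is known."""
    return " ".join(NORMALIZATION_TABLE.get(tok, tok) for tok in text.split())

print(normalize("toi di hoc"))  # -> tôi đi học
```
Tokens without a known standard form are passed through unchanged, which mirrors how a normalizer should leave already-standard text alone.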

Performance

Training Configuration

  • Student Model: VISOBERT
  • Training Mode: weakly_supervised
  • Learning Rate: 0.001
  • Epochs: 10
  • Batch Size: 16
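The weakly supervised training mode listed above follows the self-training pattern: a teacher built from weak labeling rules pseudo-labels unlabeled text, and the student model is trained on those pseudo-labels. The following is a conceptual sketch of that loop, not the actual ASTRA implementation; the rule set, data, and toy "student" are placeholders.

```python
# Conceptual sketch of self-training with weak supervision (ASTRA-style).
# A weak teacher labels only the tokens its rules cover; the student is
# then trained on the resulting pseudo-labeled pairs.

WEAK_RULES = {"toi": "tôi", "hoc": "học"}  # hypothetical weak labeling rules

def teacher_label(token):
    """Weak teacher: emit a label only when a rule fires, else None."""
    return WEAK_RULES.get(token)

class Student:
    """Toy student model that memorises token -> label pairs."""
    def __init__(self):
        self.memory = {}

    def train(self, pairs):
        for token, label in pairs:
            self.memory[token] = label

    def predict(self, token):
        # Fall back to the identity mapping for unseen tokens.
        return self.memory.get(token, token)

unlabeled = ["toi", "di", "hoc"]

# Step 1: teacher pseudo-labels whatever the weak rules cover.
pseudo = [(t, teacher_label(t)) for t in unlabeled if teacher_label(t)]

# Step 2: student trains on the pseudo-labels.
student = Student()
student.train(pseudo)

print([student.predict(t) for t in unlabeled])  # -> ['tôi', 'di', 'học']
```
In the real framework the student is the VISOBERT network and the teacher is iteratively refined, but the label-then-train structure is the same.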

Usage

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("hadung1802/visobert-normalizer")
model = AutoModel.from_pretrained("hadung1802/visobert-normalizer")

# Example usage: encode the informal input "toi di hoc"
# ("tôi đi học", i.e. "I go to school", written without diacritics)
text = "toi di hoc"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# outputs.last_hidden_state holds the contextual token embeddings;
# decoding these into normalized text requires the task-specific head.

Citation

If you use this model, please cite the ASTRA paper:

@article{astra2024,
  title={ASTRA: Self-training with Weak Supervision for Vietnamese Text Normalization},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}