
hadung1802/visobert-normalizer

This is a Vietnamese text normalization model trained with the ASTRA framework on top of the VISOBERT architecture.

Model Description

This model performs lexical normalization of Vietnamese text, converting informal, non-standard writing into standard Vietnamese. It was trained with the ASTRA (Self-training with Weak Supervision) framework.
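To illustrate what lexical normalization means here, the toy sketch below maps informal, diacritic-less tokens to their standard Vietnamese forms via a hand-written lookup table. The table and the `normalize` function are purely hypothetical illustrations of the task; the model itself learns these mappings rather than using a dictionary.

```python
# Illustrative only: a toy dictionary-based normalizer showing the task.
# The real model learns these mappings; this lookup table is hypothetical.
NORMALIZATION_TABLE = {
    "toi": "tôi",  # informal (no diacritics) -> standard form
    "di": "đi",
    "hoc": "học",
}

def normalize(text: str) -> str:
    """Map each informal token to its standard form when one is known."""
    return " ".join(NORMALIZATION_TABLE.get(tok, tok) for tok in text.split())

print(normalize("toi di hoc"))  # -> tôi đi học
```
Tokens without a known standard form are passed through unchanged, which mirrors how a normalizer should leave already-standard text alone.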

Performance

Training Configuration

  • Student Model: VISOBERT
  • Training Mode: weakly_supervised
  • Learning Rate: 0.001
  • Epochs: 10
  • Batch Size: 16
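The weakly supervised training mode listed above follows the self-training pattern: a teacher built from weak labeling rules pseudo-labels unlabeled text, and the student model is trained on those pseudo-labels. The following is a conceptual sketch of that loop, not the actual ASTRA implementation; the rule set, data, and toy "student" are placeholders.

```python
# Conceptual sketch of self-training with weak supervision (ASTRA-style).
# A weak teacher labels only the tokens its rules cover; the student is
# then trained on the resulting pseudo-labeled pairs.

WEAK_RULES = {"toi": "tôi", "hoc": "học"}  # hypothetical weak labeling rules

def teacher_label(token):
    """Weak teacher: emit a label only when a rule fires, else None."""
    return WEAK_RULES.get(token)

class Student:
    """Toy student model that memorises token -> label pairs."""
    def __init__(self):
        self.memory = {}

    def train(self, pairs):
        for token, label in pairs:
            self.memory[token] = label

    def predict(self, token):
        # Fall back to the identity mapping for unseen tokens.
        return self.memory.get(token, token)

unlabeled = ["toi", "di", "hoc"]

# Step 1: teacher pseudo-labels whatever the weak rules cover.
pseudo = [(t, teacher_label(t)) for t in unlabeled if teacher_label(t)]

# Step 2: student trains on the pseudo-labels.
student = Student()
student.train(pseudo)

print([student.predict(t) for t in unlabeled])  # -> ['tôi', 'di', 'học']
```
In the real framework the student is the VISOBERT network and the teacher is iteratively refined, but the label-then-train structure is the same.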

Usage

from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("hadung1802/visobert-normalizer")
model = AutoModel.from_pretrained("hadung1802/visobert-normalizer")

# Example usage: encode the informal input "toi di hoc"
# ("tôi đi học", i.e. "I go to school", written without diacritics)
text = "toi di hoc"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# outputs.last_hidden_state holds the contextual token embeddings;
# decoding these into normalized text requires the task-specific head.

Citation

If you use this model, please cite the ASTRA paper:

@article{astra2024,
  title={ASTRA: Self-training with Weak Supervision for Vietnamese Text Normalization},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}