hadung1802/visobert-normalizer


hoalq committed 7 days ago

Commit eeeb9f4 · 1 parent: d2842ac

Upload ASTRA trained model

Files changed (1)

README.md ADDED +59 -0

	@@ -0,0 +1,59 @@

+---
+license: mit
+tags:
+- text-normalization
+- vietnamese
+- lexical-normalization
+- astra
+- visobert
+pipeline_tag: text2text-generation
+---
+# hadung1802/visobert-normalizer
+A Vietnamese text normalization model trained with the ASTRA framework on the VISOBERT architecture.
+## Model Description
+This model performs lexical normalization for Vietnamese text, converting informal text to standard Vietnamese. It was trained using the ASTRA (Self-training with Weak Supervision) framework.
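To make the task concrete, the following is a minimal sketch of lexical normalization as a word-level lookup. The `RULES` table and the `normalize` function are hypothetical illustrations and are not part of this model or of ASTRA; the trained model is expected to learn such mappings rather than look them up.

```python
# Hypothetical lookup table mapping informal tokens to standard Vietnamese.
# The entries are illustrative only and are not taken from the training data.
RULES = {
    "toi": "tôi",
    "di": "đi",
    "hoc": "học",
    "ko": "không",
}


def normalize(text: str) -> str:
    """Replace each whitespace-separated token that has a rule; keep the rest."""
    return " ".join(RULES.get(tok.lower(), tok) for tok in text.split())


print(normalize("toi di hoc"))  # tôi đi học
```

A fixed table like this cannot resolve ambiguous tokens from context, which is the kind of case a learned normalizer is meant to handle.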
+## Performance
+## Training Configuration
+- **Student Model**: VISOBERT
+- **Training Mode**: weakly_supervised
+- **Learning Rate**: 0.001
+- **Epochs**: 10
+- **Batch Size**: 16
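For reproducibility, the settings above could be captured in a small configuration object. This is only a sketch: the `TrainingConfig` class and its field names are assumptions for illustration and do not come from the ASTRA code base; only the values are taken from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    # Field names are hypothetical; values mirror the Training Configuration list.
    student_model: str = "VISOBERT"
    training_mode: str = "weakly_supervised"
    learning_rate: float = 0.001
    epochs: int = 10
    batch_size: int = 16


config = TrainingConfig()
print(asdict(config))
```

Freezing the dataclass keeps the recorded hyperparameters from being changed by accident after a run has been logged.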
+## Usage
+```python
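+from transformers import AutoTokenizer, AutoModel
+import torch
+# Load the tokenizer and model weights from the Hub
+tokenizer = AutoTokenizer.from_pretrained("hadung1802/visobert-normalizer")
+model = AutoModel.from_pretrained("hadung1802/visobert-normalizer")
+model.eval()  # disable dropout for inference
+# Example input: informal Vietnamese typed without diacritics
+text = "toi di hoc"
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+# AutoModel loads the base model, so this yields hidden states, not normalized text
+# (assumes a standard encoder output that exposes last_hidden_state)
+print(outputs.last_hidden_state.shape)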
+```
+## Citation
+If you use this model, please cite the ASTRA paper:
+```bibtex
+@article{astra2024,
+  title={ASTRA: Self-training with Weak Supervision for Vietnamese Text Normalization},
+  author={Your Name},
+  journal={arXiv preprint},
+  year={2024}
+}
+```