hoalq committed on
Commit eeeb9f4 · 1 Parent(s): d2842ac

Upload ASTRA trained model

Files changed (1)
  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ license: mit
+ tags:
+ - text-normalization
+ - vietnamese
+ - lexical-normalization
+ - astra
+ - visobert
+ pipeline_tag: text2text-generation
+ ---
+
+ # hadung1802/visobert-normalizer
+
+ This is a Vietnamese text normalization model trained with the ASTRA framework on the ViSoBERT architecture.
+
+ ## Model Description
+
+ The model performs lexical normalization of Vietnamese text, converting informal text to standard Vietnamese. It was trained with the ASTRA (Self-training with Weak Supervision) framework.
+
+ ## Performance
+
+
+ ## Training Configuration
+
+ - **Student Model**: VISOBERT
+ - **Training Mode**: weakly_supervised
+ - **Learning Rate**: 0.001
+ - **Epochs**: 10
+ - **Batch Size**: 16
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("hadung1802/visobert-normalizer")
+ model = AutoModel.from_pretrained("hadung1802/visobert-normalizer")
+
+ # Example usage
+ text = "toi di hoc"
+ inputs = tokenizer(text, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)  # encoder outputs, e.g. outputs.last_hidden_state
+ ```
+
+ ## Citation
+
+ If you use this model, please cite the ASTRA paper:
+
+ ```bibtex
+ @article{astra2024,
+   title={ASTRA: Self-training with Weak Supervision for Vietnamese Text Normalization},
+   author={Your Name},
+   journal={arXiv preprint},
+   year={2024}
+ }
+ ```
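
The README's Model Description says the model converts informal Vietnamese text to standard Vietnamese. As a rough illustration of what that task looks like, here is a toy, dictionary-based sketch. It is not the ASTRA method or this model's inference code. The lookup table `INFORMAL_TO_STANDARD` and the function `normalize` are hypothetical names introduced only for this example, and the few entries are common chat abbreviations chosen by hand.

```python
# Toy lexical normalization by table lookup.
# Assumption: this table is illustrative only; it is NOT taken from the
# model, its training data, or the ASTRA framework.
INFORMAL_TO_STANDARD = {
    "ko": "không",
    "k": "không",
    "dc": "được",
    "đc": "được",
    "j": "gì",
}


def normalize(text, table=INFORMAL_TO_STANDARD):
    """Replace each whitespace-separated token found in `table`.

    Tokens are matched case-insensitively; tokens not in the table are
    returned unchanged.
    """
    tokens = text.split()
    return " ".join(table.get(token.lower(), token) for token in tokens)


if __name__ == "__main__":
    print(normalize("mình ko đi dc"))
```

A one-to-one lookup like this cannot handle context-dependent cases. For instance, the README's own input `"toi di hoc"` has no diacritics, and restoring them generally depends on the surrounding words, which is the kind of case a trained normalizer is meant to address.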