# William_Tyndale

**William_Tyndale** is a fine-tuned sequence-to-sequence (encoder-decoder) model based on [T5-small](https://huggingface.co/t5-small), trained to translate Latin (`la`) into English (`en`). It was developed to support research in classical studies, linguistics, and digital humanities by providing automatic Latin-to-English translation.

---

## 🔧 Model Architecture

- **Base model**: T5-small (Text-To-Text Transfer Transformer)
- **Model size**: 60M parameters
- **Architecture**: Transformer encoder-decoder
- **Tokenizer**: SentencePiece (shared vocabulary for encoder and decoder)
- **Vocabulary size**: 32,128
- **Training epochs**: 5
- **Optimizer**: AdamW
- **Loss function**: CrossEntropyLoss
- **Hardware**: dual NVIDIA T4 GPUs via a Kaggle notebook

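The training script itself is not published with this card. Below is a minimal sketch of one fine-tuning step under the configuration above (AdamW, with cross-entropy computed by the model itself); the learning rate and the example sentence pair are illustrative assumptions, not values from the actual run:

```python
# Minimal sketch of one fine-tuning step; hyperparameters and data here are
# illustrative assumptions, not the values used for the actual training run.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed learning rate

# One example pair; the real training data comes from the corpora listed below.
src = "translate Latin to English: Gallia est omnis divisa in partes tres."
tgt = "All Gaul is divided into three parts."

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids

# With `labels` supplied, T5 computes the CrossEntropyLoss listed above internally.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
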
---

## 🌐 Language Pairs

- **Source language**: Latin (`la`)
- **Target language**: English (`en`)

The model is trained for sentence-level translation from Latin into natural English.

---

## 📚 Training Datasets

The model was trained on multiple Latin-English parallel corpora collected from the [OPUS](https://opus.nlpl.eu/) project and other academic resources:

| Dataset | License | Description |
|----------------|------------------|-------------|
| **bible-uedin** | CC0 1.0 | Parallel translations of the Bible across many languages |
| **tatoeba** | CC BY 2.0 FR | Crowdsourced sentences translated between language pairs |
| **XLENT** | Derived (OPUS) | Automatically aligned named entities across 120 languages |
| **Custom** | N/A (manually prepared) | Aligned corpus compiled for supervised fine-tuning |

> All datasets are aligned at sentence level and were preprocessed (normalization, deduplication, and filtering) before training.

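The exact preprocessing rules are not spelled out here; the sketch below shows what sentence-level normalization, deduplication, and filtering might look like in practice (the `preprocess` function and its `max_len` threshold are illustrative, not part of the released pipeline):

```python
# Illustrative preprocessing sketch; the exact rules used for training are not published.
import unicodedata

def preprocess(pairs, max_len=256):
    """Normalize, deduplicate, and filter sentence-aligned (latin, english) pairs."""
    seen = set()
    cleaned = []
    for la, en in pairs:
        # Normalization: Unicode NFC plus whitespace cleanup.
        la = unicodedata.normalize("NFC", la).strip()
        en = unicodedata.normalize("NFC", en).strip()
        # Filtering: drop empty or overly long sentences (max_len is an assumed threshold).
        if not la or not en or len(la) > max_len or len(en) > max_len:
            continue
        # Deduplication: keep only the first occurrence of each exact pair.
        key = (la, en)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned
```
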
---

## 📊 Evaluation Metrics

The model was evaluated with common machine translation metrics on a custom test set:

- **BLEU**: 0.0 *(due to the small test set and the strictness of exact n-gram overlap)*
- **ROUGE-L**: 0.75
- **METEOR**: 0.996

> These results should be interpreted with caution, given the stylistic variability of Latin and the small validation corpus.

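For reference, scores like these can be computed with the Hugging Face `evaluate` library; whether this exact tooling produced the numbers above is an assumption, and the sentence pair below is only a placeholder:

```python
# Illustrative scoring sketch with the `evaluate` library (assumed tooling).
# Requires: pip install evaluate sacrebleu rouge_score nltk
import evaluate

predictions = ["We praise you, we bless you."]   # placeholder model outputs
references = ["We praise thee, we bless thee."]  # placeholder gold translations

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

# sacreBLEU expects one list of references per prediction.
print(bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])
```
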
---

## 🚀 Usage Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("valla2345/William_Tyndale")
tokenizer = T5Tokenizer.from_pretrained("valla2345/William_Tyndale")

# Prepend the task prefix so the model knows the translation direction.
text = "translate Latin to English: Laudamus te, benedicimus te."
inputs = tokenizer(text, return_tensors="pt")

# Give generation enough room; the default max_length can truncate longer outputs.
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
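
Continuing from the snippet above, several sentences can be translated at once; the sentences, beam count, and token budget below are illustrative choices, not recommended settings:

```python
# Batched variant of the example above; generation settings here are illustrative.
sentences = [
    "Laudamus te, benedicimus te.",
    "Gallia est omnis divisa in partes tres.",
]
batch = tokenizer(
    ["translate Latin to English: " + s for s in sentences],
    return_tensors="pt",
    padding=True,  # pad to the longest sentence in the batch
)
outputs = model.generate(**batch, max_new_tokens=64, num_beams=4)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
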
---

## 💬 Intended Use & Limitations

- ✅ Academic translation of Latin into modern English
- ✅ Historical document analysis, theology, classics, epigraphy
- ❌ Not intended for casual conversational Latin or low-resource dialects

The model may struggle with:

- Highly idiomatic or poetic Latin
- Long, complex clauses with ambiguous structure
- OCR errors and historical orthography

---

## 🔖 Citation

If you use this model, please cite it as follows:

```bibtex
@misc{william_tyndale_2025,
  title={William_Tyndale: A Latin-to-English Transformer},
  author={valla2345},
  year={2025},
  howpublished={\url{https://huggingface.co/valla2345/William_Tyndale}}
}
```

---

## 🙏 Acknowledgements

- The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library
- The [OPUS Project](https://opus.nlpl.eu/) for public access to multilingual corpora
- Kaggle for GPU notebook training support