# William_Tyndale

**William_Tyndale** is a fine-tuned sequence-to-sequence (encoder-decoder) model based on [T5-small](https://huggingface.co/t5-small), trained to translate Latin (`la`) into English (`en`). It was developed to support research in classical studies, linguistics, and the digital humanities by providing automatic Latin-to-English translation.

---
## Model Architecture

- **Base model**: T5-small (Text-To-Text Transfer Transformer)
- **Model size**: ~60M parameters
- **Architecture**: Transformer encoder-decoder
- **Tokenizer**: SentencePiece (shared vocabulary for encoder and decoder)
- **Vocabulary size**: 32,128
- **Training epochs**: 5
- **Optimizer**: AdamW
- **Loss function**: CrossEntropyLoss
- **Hardware**: trained on dual NVIDIA T4 GPUs in a Kaggle Notebook
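
These details can be checked directly against the published checkpoint; below is a minimal sketch using the `transformers` API (the repository name is taken from the usage example further down):

```python
from transformers import T5ForConditionalGeneration

# Load the fine-tuned checkpoint.
model = T5ForConditionalGeneration.from_pretrained("valla2345/William_Tyndale")

# T5-small: 6 encoder and 6 decoder layers, d_model = 512, vocab size 32128.
print(model.config.num_layers, model.config.num_decoder_layers,
      model.config.d_model, model.config.vocab_size)

# Total parameter count (roughly 60M for T5-small).
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```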
---

## Language Pairs

- **Source language**: Latin (`la`)
- **Target language**: English (`en`)

The model is trained for sentence-level translation from Latin into natural English.

---
## Training Datasets

This model was trained on several Latin-English parallel corpora collected from the [OPUS](https://opus.nlpl.eu/) project and other academic resources:

| Dataset | License | Description |
|-----------------|-------------------------|-------------|
| **bible-uedin** | CC0 1.0 | Parallel translations of the Bible across many languages |
| **tatoeba** | CC BY 2.0 FR | Crowdsourced sentences translated between language pairs |
| **XLENT** | Derived (OPUS) | Automatically aligned named entities across 120 languages |
| **Custom** | N/A (manually prepared) | Aligned corpus compiled for supervised fine-tuning |

> All datasets are aligned at sentence level and were preprocessed (normalization, deduplication, and filtering) prior to training.
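
The exact preprocessing pipeline is not published here; the sketch below illustrates the kind of normalization, deduplication, and filtering described above (the length-ratio threshold and helper name are illustrative assumptions):

```python
import unicodedata

def clean_pairs(pairs, max_len_ratio=3.0):
    """Normalize, filter, and deduplicate (latin, english) sentence pairs."""
    seen, cleaned = set(), []
    for la, en in pairs:
        # Normalization: Unicode NFKC plus whitespace collapsing.
        la = " ".join(unicodedata.normalize("NFKC", la).split())
        en = " ".join(unicodedata.normalize("NFKC", en).split())
        if not la or not en:
            continue  # filtering: drop pairs with an empty side
        # Filtering: extreme length ratios usually indicate misalignment.
        if max(len(la), len(en)) / min(len(la), len(en)) > max_len_ratio:
            continue
        # Deduplication: keep only the first occurrence of each exact pair.
        if (la, en) in seen:
            continue
        seen.add((la, en))
        cleaned.append((la, en))
    return cleaned
```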
---

## Evaluation Metrics

The model was evaluated with common machine translation metrics on a custom test set:

- **BLEU**: 0.0 *(an artifact of the small test set and the strict exact n-gram overlap that BLEU requires)*
- **ROUGE-L**: 0.75
- **METEOR**: 0.996

> These results should be interpreted with caution, given the stylistic variation of Latin and the small validation corpus.
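
The evaluation script itself is not included; one way to compute comparable scores is with the Hugging Face `evaluate` library (the metric implementations here are an assumption, not necessarily the setup behind the numbers above):

```python
import evaluate  # also requires sacrebleu, rouge_score, and nltk

predictions = ["we praise you, we bless you"]   # model outputs
references = ["We praise you, we bless you."]   # gold translations

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references])["score"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])
```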
---

## Usage Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("valla2345/William_Tyndale")
tokenizer = T5Tokenizer.from_pretrained("valla2345/William_Tyndale")

# Prefix the input with the T5-style task instruction.
text = "translate Latin to English: Laudamus te, benedicimus te."
inputs = tokenizer(text, return_tensors="pt")

# Cap generation length explicitly; generate() defaults can truncate output.
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
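
For quick experiments, the same checkpoint can also be driven through the `pipeline` API (the example sentence is illustrative):

```python
from transformers import pipeline

translator = pipeline("text2text-generation", model="valla2345/William_Tyndale")
result = translator("translate Latin to English: Gallia est omnis divisa in partes tres.")
print(result[0]["generated_text"])
```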
---

## Intended Use & Limitations

- ✅ Academic translation of Latin into modern English
- ✅ Historical document analysis, theology, classics, epigraphy
- ❌ Not intended for casual conversational Latin or low-resource dialects

The model may struggle with:

- Highly idiomatic or poetic Latin
- Long, complex clauses with ambiguous structure
- OCR errors or historical orthography

---
## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{william_tyndale_2025,
  title={William_Tyndale: A Latin-to-English Transformer},
  author={valla2345},
  year={2025},
  howpublished={\url{https://huggingface.co/valla2345/William_Tyndale}}
}
```

---
## Acknowledgements

- The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library
- The [OPUS Project](https://opus.nlpl.eu/) for public access to multilingual corpora
- Kaggle for GPU notebook training support