# William_Tyndale

**William_Tyndale** is a fine-tuned sequence-to-sequence (encoder-decoder) model based on [T5-small](https://huggingface.co/t5-small), trained to translate Latin (`la`) into English (`en`). It was developed to support research in classical studies, linguistics, and the digital humanities by providing automatic Latin-to-English translation.

---
## Model Architecture

- **Base model**: T5-small (Text-To-Text Transfer Transformer)
- **Model size**: ~60M parameters
- **Architecture**: Transformer encoder-decoder
- **Tokenizer**: SentencePiece (shared vocabulary for encoder and decoder)
- **Vocabulary size**: 32,128
- **Training epochs**: 5
- **Optimizer**: AdamW
- **Loss function**: CrossEntropyLoss
- **Hardware**: trained on dual NVIDIA T4 GPUs in a Kaggle Notebook
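
These details can be checked directly against the published checkpoint; below is a minimal sketch using the `transformers` API (the repository name is taken from the usage example further down):

```python
from transformers import T5ForConditionalGeneration

# Load the fine-tuned checkpoint.
model = T5ForConditionalGeneration.from_pretrained("valla2345/William_Tyndale")

# T5-small: 6 encoder and 6 decoder layers, d_model = 512, vocab size 32128.
print(model.config.num_layers, model.config.num_decoder_layers,
      model.config.d_model, model.config.vocab_size)

# Total parameter count (roughly 60M for T5-small).
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```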
---

## Language Pairs

- **Source language**: Latin (`la`)
- **Target language**: English (`en`)

The model is trained for sentence-level translation from Latin into natural English.

---
## Training Datasets

This model was trained on several Latin-English parallel corpora collected from the [OPUS](https://opus.nlpl.eu/) project and other academic resources:

| Dataset | License | Description |
|-----------------|-------------------------|-------------|
| **bible-uedin** | CC0 1.0 | Parallel translations of the Bible across many languages |
| **tatoeba** | CC BY 2.0 FR | Crowdsourced sentences translated between language pairs |
| **XLENT** | Derived (OPUS) | Automatically aligned named entities across 120 languages |
| **Custom** | N/A (manually prepared) | Aligned corpus compiled for supervised fine-tuning |

> All datasets are aligned at sentence level and were preprocessed (normalization, deduplication, and filtering) prior to training.
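
The exact preprocessing pipeline is not published here; the sketch below illustrates the kind of normalization, deduplication, and filtering described above (the length-ratio threshold and helper name are illustrative assumptions):

```python
import unicodedata

def clean_pairs(pairs, max_len_ratio=3.0):
    """Normalize, filter, and deduplicate (latin, english) sentence pairs."""
    seen, cleaned = set(), []
    for la, en in pairs:
        # Normalization: Unicode NFKC plus whitespace collapsing.
        la = " ".join(unicodedata.normalize("NFKC", la).split())
        en = " ".join(unicodedata.normalize("NFKC", en).split())
        if not la or not en:
            continue  # filtering: drop pairs with an empty side
        # Filtering: extreme length ratios usually indicate misalignment.
        if max(len(la), len(en)) / min(len(la), len(en)) > max_len_ratio:
            continue
        # Deduplication: keep only the first occurrence of each exact pair.
        if (la, en) in seen:
            continue
        seen.add((la, en))
        cleaned.append((la, en))
    return cleaned
```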
---

## Evaluation Metrics

The model was evaluated with common machine translation metrics on a custom test set:

- **BLEU**: 0.0 *(an artifact of the small test set and the strict exact n-gram overlap that BLEU requires)*
- **ROUGE-L**: 0.75
- **METEOR**: 0.996

> These results should be interpreted with caution, given the stylistic variation of Latin and the small validation corpus.
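
The evaluation script itself is not included; one way to compute comparable scores is with the Hugging Face `evaluate` library (the metric implementations here are an assumption, not necessarily the setup behind the numbers above):

```python
import evaluate  # also requires sacrebleu, rouge_score, and nltk

predictions = ["we praise you, we bless you"]   # model outputs
references = ["We praise you, we bless you."]   # gold translations

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references])["score"])
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])
```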
---

## Usage Example

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("valla2345/William_Tyndale")
tokenizer = T5Tokenizer.from_pretrained("valla2345/William_Tyndale")

# Prefix the input with the T5-style task instruction.
text = "translate Latin to English: Laudamus te, benedicimus te."
inputs = tokenizer(text, return_tensors="pt")

# Cap generation length explicitly; generate() defaults can truncate output.
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
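
For quick experiments, the same checkpoint can also be driven through the `pipeline` API (the example sentence is illustrative):

```python
from transformers import pipeline

translator = pipeline("text2text-generation", model="valla2345/William_Tyndale")
result = translator("translate Latin to English: Gallia est omnis divisa in partes tres.")
print(result[0]["generated_text"])
```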
---

## Intended Use & Limitations

- ✅ Academic translation of Latin into modern English
- ✅ Historical document analysis, theology, classics, epigraphy
- ❌ Not intended for casual conversational Latin or low-resource dialects

The model may struggle with:

- Highly idiomatic or poetic Latin
- Long, complex clauses with ambiguous structure
- OCR errors or historical orthography

---
## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{william_tyndale_2025,
  title={William_Tyndale: A Latin-to-English Transformer},
  author={valla2345},
  year={2025},
  howpublished={\url{https://huggingface.co/valla2345/William_Tyndale}}
}
```

---
## Acknowledgements

- The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library
- The [OPUS Project](https://opus.nlpl.eu/) for public access to multilingual corpora
- Kaggle for GPU notebook training support