Seq2Seq German-English Translation Model
A sequence-to-sequence neural machine translation model that translates German text to English, built using PyTorch with LSTM encoder-decoder architecture.
Model Description
This model implements the classic seq2seq architecture from Sutskever et al. (2014) for German-English translation:
- Encoder: 2-layer LSTM that processes German input sequences
- Decoder: 2-layer LSTM that generates English output sequences
- Training Strategy: Teacher forcing during training, autoregressive generation during inference
- Vocabulary: 30k German words, 25k English words
- Dataset: Trained on 2M sentence pairs from WMT19 (subset of full 35M dataset)
Model Architecture
German Input → Embedding → LSTM Encoder → Context Vector → LSTM Decoder → Embedding → English Output
Hyperparameters (a module sketch using these values follows this list):
- Embedding size: 256
- Hidden size: 512
- LSTM layers: 2 (both encoder/decoder)
- Dropout: 0.3
- Batch size: 64
- Learning rate: 0.0003
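To make the settings above concrete, here is a minimal sketch of how they might map onto the encoder and decoder modules. Class names and signatures are illustrative assumptions, not the repository's actual src/models code:
# Minimal encoder/decoder sketch using the hyperparameters above.
# Class names and signatures are assumptions, not the repo's actual code.
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=30_000, emb_size=256, hidden_size=512,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)

    def forward(self, src):                      # src: (batch, src_len)
        embedded = self.embedding(src)           # (batch, src_len, emb_size)
        _, (hidden, cell) = self.lstm(embedded)  # final states act as the context vector
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, vocab_size=25_000, emb_size=256, hidden_size=512,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.lstm = nn.LSTM(emb_size, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.fc_out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden, cell):     # tokens: (batch, steps)
        embedded = self.embedding(tokens)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        return self.fc_out(output), hidden, cell # logits: (batch, steps, vocab)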
Training Data
- Dataset: WMT19 German-English Translation Task
- Size: 2M sentence pairs (filtered subset)
- Preprocessing: Sentences filtered by length (5-50 tokens)
- Tokenization: Custom word-level tokenizer with special tokens (<PAD>, <UNK>, <START>, <END>)
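As an illustration of what such a tokenizer and length filter involve, a rough sketch is below; the pickled tokenizers in this repository may be implemented differently:
# Illustrative word-level tokenizer with the special tokens listed above
# and the 5-50 token length filter; the repo's actual tokenizers may differ.
from collections import Counter

PAD, UNK, START, END = "<PAD>", "<UNK>", "<START>", "<END>"

class WordTokenizer:
    def __init__(self, max_vocab):
        self.max_vocab = max_vocab
        self.word2id = {PAD: 0, UNK: 1, START: 2, END: 3}

    def fit(self, sentences):
        counts = Counter(w for s in sentences for w in s.lower().split())
        for word, _ in counts.most_common(self.max_vocab - len(self.word2id)):
            self.word2id[word] = len(self.word2id)

    def encode(self, sentence):
        ids = [self.word2id.get(w, self.word2id[UNK])
               for w in sentence.lower().split()]
        return [self.word2id[START]] + ids + [self.word2id[END]]

def keep_pair(src, tgt, min_len=5, max_len=50):
    # Length filter applied during preprocessing (5-50 tokens per side).
    return (min_len <= len(src.split()) <= max_len
            and min_len <= len(tgt.split()) <= max_len)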
Performance
Training Results (5 epochs):
- Training Loss: 4.0949 (initial) → 3.1843 (final), a drop of 0.91
- Validation Loss: 4.1918 (initial) → 3.8537 (final), a drop of 0.34
- Training Device: Apple Silicon (MPS)
Usage
Quick Start
# This is a custom PyTorch model, not a Transformers model
# Download the files and use with the provided inference script
import requests
from pathlib import Path
# Download model files
base_url = "https://huggingface.co/sumitdotml/seq2seq-de-en/resolve/main"
files = ["best_model.pt", "german_tokenizer.pkl", "english_tokenizer.pkl"]
for file in files:
    response = requests.get(f"{base_url}/{file}")
    response.raise_for_status()  # fail early on a bad download
    Path(file).write_bytes(response.content)
    print(f"Downloaded {file}")
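Once the files are downloaded, loading them presumably looks something like the sketch below; the checkpoint's keys and the tokenizer classes are defined by this repository's code, so inference.py remains the authoritative reference:
# Illustrative loading of the downloaded artifacts (see inference.py for the
# canonical code); unpickling requires the repo's tokenizer class on the path.
import pickle
import torch

with open("german_tokenizer.pkl", "rb") as f:
    german_tokenizer = pickle.load(f)
with open("english_tokenizer.pkl", "rb") as f:
    english_tokenizer = pickle.load(f)

checkpoint = torch.load("best_model.pt", map_location="cpu")
print(checkpoint.keys())  # inspect what the checkpoint stores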
Translation Examples
# Interactive mode
python inference.py --interactive
# Single translation
python inference.py --sentence "Hallo, wie geht es dir?" --verbose
# Demo mode
python inference.py
Example Translations:
"Das ist ein gutes Buch."
β"this is a good idea."
"Wo ist der Bahnhof?"
β"where is the <UNK>"
"Ich liebe Deutschland."
β"i share."
Files Included
- best_model.pt: PyTorch model checkpoint (trained weights + architecture)
- german_tokenizer.pkl: German vocabulary and tokenization logic
- english_tokenizer.pkl: English vocabulary and tokenization logic
Installation & Setup
Clone the repository:
git clone https://github.com/sumitdotml/seq2seq
cd seq2seq
Set up environment:
uv venv && source .venv/bin/activate   # or: python -m venv .venv && source .venv/bin/activate
uv pip install torch requests tqdm     # or: pip install torch requests tqdm
Download model:
python scripts/download_pretrained.py
Start translating:
python scripts/inference.py --interactive
Model Architecture Details
The model uses a custom implementation with these components (a generic sketch of how they fit together follows this list):
- Encoder (src/models/encoder.py): LSTM-based encoder with an embedding layer
- Decoder (src/models/decoder.py): LSTM-based decoder with an attention-free architecture
- Seq2Seq (src/models/seq2seq.py): Main model combining the encoder and decoder with generation logic
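For context, the standard way such a wrapper combines the two (teacher forcing during training, greedy autoregressive decoding at inference) is sketched below. This follows the generic pattern and the Encoder/Decoder sketch above, not the exact code in src/models/seq2seq.py:
# Generic seq2seq wrapper: teacher forcing in forward(), greedy decoding in
# generate(). Illustrative only; token ids and interfaces are assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, start_id=2, end_id=3):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.start_id, self.end_id = start_id, end_id

    def forward(self, src, tgt):
        # Teacher forcing: feed the gold target tokens, predict the next ones.
        hidden, cell = self.encoder(src)
        logits, _, _ = self.decoder(tgt[:, :-1], hidden, cell)
        return logits                                 # compare against tgt[:, 1:]

    @torch.no_grad()
    def generate(self, src, max_len=50):
        # Greedy autoregressive decoding from <START> until <END> or max_len.
        hidden, cell = self.encoder(src)
        token = torch.full((src.size(0), 1), self.start_id,
                           dtype=torch.long, device=src.device)
        generated = []
        for _ in range(max_len):
            logits, hidden, cell = self.decoder(token, hidden, cell)
            token = logits.argmax(dim=-1)             # (batch, 1) greedy pick
            generated.append(token)
            if (token == self.end_id).all():
                break
        return torch.cat(generated, dim=1)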
Limitations
- Vocabulary constraints: Limited to 30k German / 25k English words
- Training data: Only 2M sentence pairs (vs 35M in full WMT19)
- No attention mechanism: Basic encoder-decoder without attention
- Simple tokenization: Word-level tokenization without subword units
- Translation quality: Suitable for basic phrases, struggles with complex sentences
Training Details
Environment:
- Framework: PyTorch 2.0+
- Device: Apple Silicon (MPS acceleration)
- Training length: 5 epochs
- Validation strategy: Hold-out validation set
Optimization (a training-step sketch follows this list):
- Optimizer: Adam (lr=0.0003)
- Loss function: CrossEntropyLoss (ignoring padding)
- Gradient clipping: 1.0
- Scheduler: StepLR (step_size=3, gamma=0.5)
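A hedged sketch of the training loop under these settings is shown below; it assumes `model` is a seq2seq module and `train_loader` yields (src, tgt) index batches, and scripts/train.py remains the authoritative version:
# Training loop matching the settings above (illustrative; see scripts/train.py).
# Assumes `model` is a seq2seq module and `train_loader` yields (src, tgt) batches.
import torch
import torch.nn as nn

PAD_ID = 0  # assumed padding index
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)

for epoch in range(5):
    for src, tgt in train_loader:
        optimizer.zero_grad()
        logits = model(src, tgt)                      # (batch, tgt_len - 1, vocab)
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
        optimizer.step()
    scheduler.step()                                  # halve the LR every 3 epochs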
Reproduce Training
# Full training pipeline
python scripts/data_preparation.py # Download WMT19 data
python src/data/tokenization.py # Build vocabularies
python scripts/train.py # Train model
# For full dataset training, modify data_preparation.py:
# use_full_dataset = True # Line 133-134
Citation
If you use this model, please cite:
@misc{seq2seq-de-en,
author = {sumitdotml},
title = {German-English Seq2Seq Translation Model},
year = {2025},
url = {https://huggingface.co/sumitdotml/seq2seq-de-en},
note = {PyTorch implementation of sequence-to-sequence translation}
}
References
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. NeurIPS.
- WMT19 Translation Task: https://huggingface.co/datasets/wmt/wmt19
License
MIT License - See repository for full license text.
Contact
For questions about this model or training code, please open an issue in the GitHub repository.