AI & ML interests

Web as a corpus, Large Language Models, Machine Translation, Language Technologies, Natural Language Processing, Internet Archive, CommonCrawl

Recent Activity

vmkhlv published a dataset about 5 hours ago: HPLT/2508-wds-evals
vmkhlv published a dataset about 5 hours ago: HPLT/2505-deduplication-evals
vmkhlv published a dataset about 5 hours ago: HPLT/2508-datasets-evals

HPLT's collections (9)

Multilingual Translation Models
Translation models trained on OPUS data including HPLT datasets