Chinese-English MT dataset (Gemini 2.0 Flash, web novels)
#1, opened by Moleys
https://huggingface.co/datasets/TomatoMTL/tomato-zh2en
- Source: Chinese web novels
- Target: English
- Generated using Gemini 2.0 Flash
- Chunked input (≤1000 characters per chunk)
- Raw outputs; may contain untranslated Chinese segments and require cleaning
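For anyone who wants to do that cleaning pass, here is a minimal sketch of one way to flag English outputs that still contain Chinese text. It is not part of the dataset tooling; the function names `find_untranslated` and `han_ratio` are made up for illustration, and it only checks the two most common Han ideograph blocks.

```python
import re

# Han ideographs in CJK Extension A and the main CJK Unified block.
# Rarer extension blocks are deliberately ignored in this sketch.
HAN_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")


def find_untranslated(text):
    """Return the runs of Chinese characters left in an English output."""
    return HAN_RE.findall(text)


def han_ratio(text):
    """Fraction of non-whitespace characters that are Han ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    han = sum(len(run) for run in find_untranslated(text))
    return han / len(chars)


if __name__ == "__main__":
    sample = "He drew his sword 剑气 and left."
    print(find_untranslated(sample))
    print(round(han_ratio(sample), 3))
```

A chunk with a non-empty `find_untranslated` result could be re-translated or dropped; what ratio counts as "too much Chinese" is a judgment call that depends on how the corpus will be used.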
I’ve been following the project for a while. Hope this helps.
Nice! I'll give that corpus a try.
Thanks!
Ah, I was being lazy.
In the en.zip file, there are two folders:
- raw (zh)
- mtl (en)
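A rough sketch of how the two folders could be read as sentence-aligned pairs. It assumes that each file under `raw` has a counterpart under `mtl` with the same relative path and that the files are UTF-8 text; neither is confirmed in this thread, and `pair_members` and `load_pairs` are hypothetical helper names.

```python
import zipfile
from pathlib import PurePosixPath


def pair_members(names, src_dir="raw", tgt_dir="mtl"):
    """Pair archive members that share a relative path under src_dir and tgt_dir.

    Assumption: source and target files mirror each other's relative paths.
    """
    src, tgt = {}, {}
    for name in names:
        if name.endswith("/"):  # skip directory entries
            continue
        parts = PurePosixPath(name).parts
        if src_dir in parts:
            i = parts.index(src_dir)
            src[parts[i + 1:]] = name
        elif tgt_dir in parts:
            i = parts.index(tgt_dir)
            tgt[parts[i + 1:]] = name
    # Keep only files present on both sides, in a stable order.
    return [(src[key], tgt[key]) for key in sorted(src) if key in tgt]


def load_pairs(zip_path):
    """Yield (zh_text, en_text) tuples from the archive (UTF-8 assumed)."""
    with zipfile.ZipFile(zip_path) as zf:
        for zh_name, en_name in pair_members(zf.namelist()):
            zh = zf.read(zh_name).decode("utf-8")
            en = zf.read(en_name).decode("utf-8")
            yield zh, en
```

Usage would look like `for zh, en in load_pairs("en.zip"): ...`. If the actual layout differs (for example, different file names on each side), only `pair_members` needs to change.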
