
Chinese-English MT dataset (Gemini 2.0 Flash, web novels)

#1
by Moleys - opened

https://huggingface.co/datasets/TomatoMTL/tomato-zh2en

  • Source: Chinese web novels
  • Target: English
  • Generated using Gemini 2.0 Flash
  • Chunked input (≤1000 characters per chunk)
  • Raw outputs; may contain untranslated Chinese segments and require cleaning
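For anyone wondering what the cleaning step might look like, here is a minimal sketch that drops pairs whose English side still contains Chinese characters. The function names, the `max_ratio` threshold, and the choice of Unicode ranges are my own assumptions for illustration, not something taken from the dataset card.

```python
import re

# Assumption: "untranslated Chinese" means CJK Unified Ideographs
# (basic block plus Extension A). Punctuation is not covered here.
CJK_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK_RE.match(c)) / len(chars)


def filter_pairs(pairs, max_ratio=0.0):
    """Keep (zh, en) pairs whose English side is non-empty and has at most
    max_ratio CJK characters. max_ratio is a hypothetical tuning knob."""
    return [
        (zh, en)
        for zh, en in pairs
        if en.strip() and cjk_ratio(en) <= max_ratio
    ]
```

Raising `max_ratio` slightly (say, to 0.02) would keep lines where a single name was left untranslated, while the default of 0.0 drops any line with leftover Chinese.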

I’ve been following the project for a while. Hope this helps.

quickmt org

Nice! I'll give that corpus a try.
Thanks!

quickmt org

I only see the English side right now, but I'll keep an eye on it in case the other side eventually gets uploaded.

Ah, I was being lazy.
In the en.zip file, there are two folders:

  • raw (zh)
  • mtl (en)
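A rough sketch of how the two folders could be paired up straight from the archive. It assumes that each file under raw has a counterpart under mtl with the same relative path and that the files are UTF-8 text; neither is confirmed in this thread, and `pair_members` and `read_pairs` are hypothetical helper names.

```python
import zipfile
from pathlib import PurePosixPath


def pair_members(names, src_dir="raw", tgt_dir="mtl"):
    """Match archive members under src_dir and tgt_dir by the path
    that follows the folder name (assumed to be identical on both sides)."""
    src, tgt = {}, {}
    for name in names:
        if name.endswith("/"):  # skip directory entries
            continue
        parts = PurePosixPath(name).parts
        if src_dir in parts:
            src[parts[parts.index(src_dir) + 1:]] = name
        elif tgt_dir in parts:
            tgt[parts[parts.index(tgt_dir) + 1:]] = name
    # Only keep files that exist on both sides.
    return [(src[key], tgt[key]) for key in sorted(src) if key in tgt]


def read_pairs(zip_path):
    """Yield (zh_text, en_text) tuples from the archive, assuming UTF-8 files."""
    with zipfile.ZipFile(zip_path) as zf:
        for zh_name, en_name in pair_members(zf.namelist()):
            yield zf.read(zh_name).decode("utf-8"), zf.read(en_name).decode("utf-8")
```

Usage would be along the lines of `for zh, en in read_pairs("en.zip"): ...`. Files present on only one side are silently skipped, so it is worth comparing the pair count against the number of files in each folder.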
