---
language:
- ru
- zh
- en
tags:
- translation
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
  - example_title: translate zh-ru
    text: >
      translate to ru: 开发的目的是为用户提供个人同步翻译。
  - example_title: translate ru-en
    text: >
      translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика.
  - example_title: translate en-ru
    text: >
      translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate en-zh
    text: >
      translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter.
  - example_title: translate zh-en
    text: >
      translate to en: 开发的目的是为用户提供个人同步解释器。
  - example_title: translate ru-zh
    text: >
      translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.
---
# m2m English, Russian and Chinese multilingual machine translation
This model is a conventional many-to-many (m2m) transformer trained in multitask mode to translate into a requested language. It is specifically configured for the pairs ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.
The model translates directly between any pair of Russian, Chinese and English. To choose the target language, prefix the input with `translate to <lang>:`, where `<lang>` is the target language identifier. The source language does not need to be specified, and the source text may be multilingual.
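Because the target language is selected only by a text prefix, preparing inputs is plain string handling. The minimal sketch below assumes the identifiers are exactly `ru`, `zh` and `en` and that the colon is followed by one space, as in the widget and code examples in this card. `build_prompt` and `SUPPORTED_TARGETS` are illustrative names, not part of the model or of `transformers`.

```python
# Hypothetical helper (not part of transformers or of this model's files):
# builds the 'translate to <lang>: ' prompt described above.
SUPPORTED_TARGETS = ("ru", "zh", "en")  # identifiers used in this card's examples


def build_prompt(text: str, target_lang: str) -> str:
    """Prepend the target-language prefix to `text`.

    The source language is not given to the model, so `text` can be in any of
    the supported languages, or a mix of them.
    """
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    return f"translate to {target_lang}: {text}"


# One source sentence, one prompt per target language.
for lang in SUPPORTED_TARGETS:
    print(build_prompt("Съешь ещё этих мягких французских булок.", lang))
```

A prompt built this way can be passed to the tokenizer in the same way as `src_text` in the examples.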
Fine-tuned from the base model utrobinmv/m2m_translate_en_ru_zh_large_4096.
This version of the model was trained on noisier data with a noise-reduction function.
The model can also insert punctuation marks that are missing from the source text, which makes it convenient for translating the output of automatic speech recognition (ASR) models.
The model has learned to translate small Markdown files while preserving the Markdown markup and HTML tags.
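A translated Markdown file can be checked mechanically for lost markup. The sketch below is a heuristic added for illustration and is not a feature of the model. It counts HTML tags by name, along with line-leading Markdown markers such as headings, bullets, numbered items and blockquotes. It then reports whether the source and its translation have the same counts. It assumes the translation keeps the source's line structure. `markup_signature` and `markup_preserved` are illustrative names.

```python
import re
from collections import Counter

# Rough heuristics, not a real Markdown/HTML parser.
# Matches opening and closing HTML tags and captures (slash, tag name).
HTML_TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^<>]*>")
# Matches Markdown markers at the start of a line: headings, bullets,
# numbered items and blockquotes.
LINE_MARKER = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+\.\s|>)")


def markup_signature(text: str) -> Counter:
    """Count HTML tags and line-leading Markdown markers in `text`."""
    signature = Counter(
        f"<{slash}{name.lower()}>" for slash, name in HTML_TAG.findall(text)
    )
    for line in text.splitlines():
        match = LINE_MARKER.match(line)
        if match:
            signature[match.group(1).strip()] += 1
    return signature


def markup_preserved(source: str, translation: str) -> bool:
    """True if source and translation contain the same tags and markers."""
    return markup_signature(source) == markup_signature(translation)


source = "# Заголовок\n- пункт <b>один</b>\n- пункт два"
good = "# Title\n- item <b>one</b>\n- item two"
bad = "Title\n- item one\n- item two"
print(markup_preserved(source, good))  # True
print(markup_preserved(source, bad))   # False
```

A mismatch only flags an output for manual review. The check does not verify link targets, tag attributes or inline formatting such as `**bold**`.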
Example: translating Russian to Chinese
```python
from transformers import M2M100ForConditionalGeneration, AutoTokenizer

device = 'cuda'  # or 'cpu' to run on the CPU
model_name = 'utrobinmv/m2m_translate_en_ru_zh_large_4096'

model = M2M100ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# the prefix selects the target language; the source language is not specified
prefix = 'translate to zh: '
src_text = prefix + "Съешь ещё этих мягких французских булок."

# translate Russian to Chinese
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result[0])
# 再吃这些法国的甜蜜的面包。
```
Example: translating Chinese to Russian
```python
from transformers import M2M100ForConditionalGeneration, AutoTokenizer

device = 'cuda'  # or 'cpu' to run on the CPU
model_name = 'utrobinmv/m2m_translate_en_ru_zh_large_4096'

model = M2M100ForConditionalGeneration.from_pretrained(model_name)
model.eval()
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prefix = 'translate to ru: '
src_text = prefix + "再吃这些法国的甜蜜的面包。"

# translate Chinese to Russian
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs.to(device))
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result[0])
# Съешьте этот сладкий хлеб из Франции.
```
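Both examples repeat the same setup and translate one sentence at a time. One possible refactoring wraps the steps in a single function. It translates a list of texts into one target language with a single `generate` call. This is sketched here as an assumption, not as the author's recommended usage. `translate_batch` is an illustrative name. It receives the already-loaded `model` and `tokenizer` as arguments, so the block itself has no `transformers` import. The tokenizer's `padding=True` option allows prompts of different lengths to be stacked into one batch.

```python
def translate_batch(texts, target_lang, model, tokenizer, device="cpu"):
    """Translate every string in `texts` into `target_lang` ('ru', 'zh' or 'en').

    `model` and `tokenizer` are expected to be the M2M100 model and tokenizer
    loaded as in the examples above; they are passed in rather than loaded here.
    """
    # Same prefix convention as the single-sentence examples.
    prompts = [f"translate to {target_lang}: {text}" for text in texts]
    # padding=True pads shorter prompts so the batch forms one tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
    generated_tokens = model.generate(**inputs)
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)


# Usage with the model, tokenizer and device from the examples above
# (this downloads and runs the real model):
# results = translate_batch(
#     ["Съешь ещё этих мягких французских булок.",
#      "The purpose of the development is to provide users with a personal synchronized interpreter."],
#     "zh", model, tokenizer, device)
# print(results)
```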
## Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)