Akatsuki-Amemiya/Nobihaza-trans-demo

【英文、日文版本由deepseek-r1翻译，如有与中文版不一致之处，以中文版为准】
※ English and Japanese versions are translated by deepseek-r1. In case of discrepancies, the Chinese version shall prevail.
※ 英語・日本語版はdeepseek-r1による翻訳です。中国語版と不一致がある場合は、中国語版が優先されます。

背景 Background 背景

由于sakurallm缺乏多语种语言互译的支持，当前所有本地模型看起来都不能很好的支持rpgmaker游戏（如野比大雄的生化危机）翻译的情况下，我们提出了Nobihaza-trans-demo系列模型，旨在验证基于大语言模型微调成多语言互译模型的可能性。
Given the lack of multilingual translation support in sakurallm and the current limitations of local models in handling RPGMaker game translations (e.g., Nobita's Biohazard), we propose the Nobihaza-trans-demo series to validate the feasibility of fine-tuning LLMs for multilingual translation tasks.
sakurallmが多言語翻訳機能を欠き、現地モデルがRPGMakerゲーム（例：『のび太のバイオハザード』）の翻訳を適切に処理できない現状を踏まえ、大規模言語モデルを多言語相互翻訳用にファインチューニングする可能性を検証するため、Nobihaza-trans-demoシリーズを提案します。

模型列表 Model List モデル一覧

我们开源了基于qwen2.5-7B/llama3.1-8B/gemma3-12B/Ministral-8B/c4ai-r7b的微调模型，仅使用中日平行语料微调，所以本系列理论上暂时只支持中日互译，甚至可能仅支持野比大雄的生化危机领域文本。
We open-source fine-tuned models based on qwen2.5-7B/llama3.1-8B/gemma3-12B/Ministral-8B/c4ai-r7b, trained exclusively on Japanese-Chinese parallel corpora. This series currently supports only Jp↔Zh translation, with potential domain limitations to Nobita's Biohazard series texts.
qwen2.5-7B/llama3.1-8B/gemma3-12B/Ministral-8B/c4ai-r7bを基に中日対訳コーパスのみでファインチューニングしたモデルを公開。理論上は中日相互翻訳のみ対応し、『のび太のバイオハザード』領域テキストに限定される可能性があります。

Model	Base Model
Nobihaza-Trans-demo-Qwen-7B	Qwen2.5-7B-Instruct
Nobihaza-Trans-demo-Llama-8B	Llama-3.1-8B-Instruct
Nobihaza-Trans-demo-Gemma3-12B	gemma-3-12b-it
Nobihaza-Trans-demo-Ministral-8B	Ministral-8B-Instruct-2410
Nobihaza-Trans-demo-c4ai-r7B	c4ai-command-r7b-12-2024

数据集 Dataset データセット

野比大雄的生化危机（日、中）
新译野比大雄的生化危机（日、中）
大雄战记ACE（日、中）
野比大雄的生化危机G（日、中）（Qwen-7B/Llama-8B版模型不包含这个数据）

Nobita's Biohazard (Jp/Zh)
New Translation: Nobita's Biohazard (Jp/Zh)
Nobita War Chronicle ACE (Jp/Zh)
Nobita's Biohazard G (Jp/Zh) (Excluded from Qwen-7B/Llama-8B versions)

『のび太のバイオハザード』（日・中）
『新訳のび太のバイオハザード』（日・中）
『のび太戦記ACE』（日・中）
『のび太のバイオハザードG』（日・中）（Qwen-7B/Llama-8Bモデルは本データを含まない）

训练细节 Training Details 学習詳細

在4060ti16G上使用不同级别的qlora进行训练（穷）
Trained with varying levels of qLoRA on RTX 4060Ti 16GB (Limited resources)
GeForce RTX 4060Ti 16GBで異なるレベルのqLoRAを適用して学習（リソース制約あり）

测试结果 Evaluation 評価結果

Qwen: 中文表达能力优秀，日语能力欠缺
Llama: 中日勉勉强强，日中没几句人话
Gemma: 中文优秀，日语尚可（但需16G显存）
Ministral: Loss最低，日中夹杂英文
c4ai-r: Loss最差，中日质量接近Gemma/Ministral

Qwen: Excellent Chinese, poor Japanese
Llama: Barely functional in Jp→Zh
Gemma: Strong Chinese, acceptable Japanese (16G VRAM required)
Ministral: Lowest loss, occasional English in outputs
c4ai-r: Worst loss, comparable quality to Gemma/Ministral

Qwen: 中国語優秀、日本語未達
Llama: 日中翻訳が不完全
Gemma: 中国語優位、日本語は可（16G VRAM要）
Ministral: 最低loss値、英文混在あり
c4ai-r: 最悪loss値、Gemma/Ministral並み品質

部署方法 Deployment デプロイ方法

建议使用LM Studio（较易）、ollama+open webui（可能较难）部署
Recommended tools: LM Studio (easier), ollama+Open WebUI (advanced)
推奨ツール: LM Studio（簡単）、ollama+Open WebUI（上級者向け）

Prompt提示 Prompt Guide プロンプト指針

日→中:

把下列文本从日文翻译到中文  
（日文原文、日本語原文、Japanese text）

中→日:

把下列文本从中文翻译到日文  
（中文原文、中国語原文、Chinese text）

多轮对话翻译:
System prompt写上述翻译提示，直接输入原文
(Qwen-7B/Llama-8B不支持)

Multi-turn Translation:
Set translation instruction in system prompt, input raw text
(Not supported by Qwen-7B/Llama-8B)

マルチターン翻訳:
システムプロンプトに翻訳指示を設定し、原文を直接入力
(Qwen-7B/Llama-8B非対応)

使用条款 Terms of Use 利用規約

Link

许可证 License ライセンス

Model	License
Nobihaza-Trans-demo-Qwen-7B	Apache 2.0
Nobihaza-Trans-demo-Llama-8B	Llama3.1 License
Nobihaza-Trans-demo-Gemma3-12B	Gemma License
Nobihaza-Trans-demo-Ministral-8B	Mistral AI Research License
Nobihaza-Trans-demo-c4ai-r7B	CC-BY-NC-4.0

Akatsuki-Amemiya
/

Nobihaza-trans-demo