# Japanese-Answer 13B 8bit
An instruction-tuned and preference-optimized Japanese LLM based on the LLM-JP-3 13B pre-trained model, quantized to 8-bit (EXL2) with ExLlamaV2. The model was trained on a mix of Japanese instruction and preference datasets.
## Benchmarks

### ELYZA-tasks-100

| Model | Avg. Score (1–5) |
|---|---|
| This Model (13B 8bit, SFT+PO) | 3.69 |
- Score is based on automatic evaluation with GPT-3.5-Turbo.
- Task set is based on ELYZA-tasks-100.
### Japanese MT-Bench

| coding | extraction | humanities | math | reasoning | roleplay | stem | writing | Avg. (1–10) |
|---|---|---|---|---|---|---|---|---|
| 3.20 | 7.05 | 9.60 | 3.00 | 6.00 | 6.90 | 8.65 | 7.15 | 6.44 |
- Average score: 6.44 / 10
- Evaluated using GPT-4 as the judge (single-answer grading mode).
- Based on the official Japanese MT-Bench.
## What's special
- Based on llm-jp/llm-jp-3-13b (pre-trained model).
- SFT + Preference Optimization (PO) applied.
- Quantized using ExLlamaV2 to EXL2 8.0 bpw (a sketch of the conversion command follows this list).
- Focused on practical improvements in Japanese instruction-following and QA.
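For reference, EXL2 quantization is performed with the `convert.py` script shipped with the ExLlamaV2 repository. An 8.0 bpw conversion might look like the sketch below; all paths are placeholders, not the actual directories used:

```bash
# Quantize an FP16 model to EXL2 at 8.0 bits per weight (paths are placeholders)
python convert.py \
    -i /path/to/fp16_model \
    -o /path/to/working_dir \
    -cf /path/to/output_model \
    -b 8.0
```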
## Usage

This model ships in the EXL2 format and runs with ExLlamaV2 for fast inference on quantized weights. Below is a minimal guide to installing ExLlamaV2, downloading the model, and running inference.
### 0. Hardware Requirements
This model is quantized in EXL2 format and optimized for ExLlamaV2 to enable fast, memory-efficient inference. However, it requires an NVIDIA GPU with Ampere architecture or newer, due to reliance on FlashAttention and low-bit optimizations. Supported GPUs include:
- Ampere: A100, A10, RTX 30 series
- Ada Lovelace: L4
- Hopper: H100
GPUs such as the T4 or V100 are not supported; attempting to run this model on them may produce runtime errors due to FlashAttention incompatibility.
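Before loading the model, you can verify that a GPU meets this requirement by querying its CUDA compute capability with PyTorch (Ampere reports 8.0 or higher). This check is a sketch, not part of the official setup:

```python
import torch

# Ampere GPUs (e.g. A100, RTX 30 series) report compute capability >= 8.0;
# FlashAttention, which this EXL2 setup relies on, needs at least that.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (8, 0):
    raise RuntimeError("GPU predates Ampere; FlashAttention will likely fail.")
```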
### 1. Installation

```bash
# Clone ExLlamaV2 (pinned to v0.2.6) and a patch adding unigram tokenizer support
git clone -b v0.2.6 https://github.com/turboderp/exllamav2.git
git clone https://huggingface.co/spaces/tokutsu/exllamav2_patch
cd exllamav2
patch -p1 < ../exllamav2_patch/hf.py.patch  # apply the unigram tokenizer patch
pip install -r requirements.txt
pip install .
```
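If the installation succeeded, importing the package should work. Assuming the package exposes a `__version__` attribute (it does in recent releases), a quick check is:

```bash
python -c "import exllamav2; print(exllamav2.__version__)"
```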
### 2. Download the model

```bash
pip install huggingface_hub
huggingface-cli download tokutsu/japanese-answer-13b-8bit \
  --local-dir ./model --include "*.safetensors" "*.json" "*.txt"
```
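Equivalently, the files can be fetched from Python via `huggingface_hub.snapshot_download` with the same include patterns:

```python
from huggingface_hub import snapshot_download

# Download only the weight shards and config/tokenizer files into ./model
snapshot_download(
    repo_id="tokutsu/japanese-answer-13b-8bit",
    local_dir="./model",
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)
```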
### 3. Inference

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2Sampler

# Load the model, splitting layers across available GPUs automatically
model_path = "./model"
config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model, lazy=True)  # cache is allocated during the autosplit load
model.load_autosplit(cache)

# Set up the generator with greedy (deterministic) decoding
generator = ExLlamaV2DynamicGenerator(model=model, tokenizer=tokenizer, cache=cache)
gen_settings = ExLlamaV2Sampler.Settings.greedy()

# Prompt in the instruction format used during training
prompt = """次の指示に答えて下さい。
### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。
### 解答
"""

# Generate and print the completion (the output includes the prompt)
output = generator.generate(
    prompt=[prompt],
    max_new_tokens=1536,
    gen_settings=gen_settings,
    stop_conditions=[tokenizer.eos_token_id],
    encode_special_tokens=True,
)
print(output[0])
```
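For more varied output, the greedy settings can be swapped for sampled decoding. The values below are illustrative defaults, not tuned for this model:

```python
# Replace the greedy settings above with stochastic sampling
gen_settings = ExLlamaV2Sampler.Settings()
gen_settings.temperature = 0.7  # higher values increase diversity
gen_settings.top_k = 50         # keep only the 50 most likely tokens
gen_settings.top_p = 0.9        # nucleus sampling cutoff
```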
### Example Output

Here is an example of the output (using a task from the ELYZA-tasks-100 dataset; the instruction asks for five ideas to regain enthusiasm for work):
```text
次の指示に答えて下さい。
### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。
### 解答
1. **目標設定の見直し**: 現在の業務目標が自分にとって意義深いものであるか確認し、必要に応じて再定義します。例えば、短期的な成果だけでなく、長期的なキャリアパスを見据えた目標設定を行うことで、モチベーションが向上します。
2. **スキルアップの機会を探す**: 新しい技術や知識を習得することで、自己成長を感じることができます。オンラインコースやワークショップに参加したり、同僚との勉強会を開催したりすることで、刺激を受けることができます。
3. **ワークライフバランスの改善**: 過度な労働時間を見直し、プライベート時間を確保することで、心身ともにリフレッシュできます。例えば、週に一度の完全オフデーを作ることや、趣味活動を再開することが有効です。
4. **チームビルディング活動の実施**: 同僚との絆を深めることで、職場環境がより楽しくなります。定期的なランチミーティングやスポーツイベントなどを企画することで、コミュニケーションが活発になり、互いに支え合う関係が築けます。
5. **外部からのフィードバックを求める**: 第三者からの客観的な意見を取り入れることで、自己評価を見直すことができます。メンターシッププログラムに参加したり、外部コンサルタントからのアドバイスを受けたりするのも一つの方法です。
```
## Training Details

### Datasets
- (1) SFT
- (2) ORPO
- (3) SimPO/CPO
### Models

(1) Used for preference generation:

| Model name |
|---|
| Qwen/Qwen2.5-72B-Instruct (Built with Qwen) |
| meta-llama/Llama-3.3-70B-Instruct (Built with Llama) |
(2) Used in the datasets above:

| Model name |
|---|
| AIDC-AI/Marco-o1 |
| Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1 |
| Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2 |
| Google Cloud Translation |
| Qwen/Qwen2.5-32B-Instruct (Built with Qwen) |
| Qwen/Qwen2.5-72B-Instruct (Built with Qwen) |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 (Built with Qwen) |
| WizardLM 8x22b |
| cl-nagoya/ruri-large |
| cyberagent/calm3-22b-chat |
| meta-llama/Llama-3.1-405B-Instruct (Built with Llama) |
| meta-llama/Llama-3.1-70B-Instruct (Built with Llama) |
| meta-llama/Llama-3.1-8B-Instruct (Built with Llama) |
| meta-llama/Llama-3.3-70B-Instruct (Built with Llama) |
| meta-llama/Llama-Guard-3-8B (Built with Llama) |
| microsoft/Phi-3-medium-4k-instruct |
| mistralai/Mixtral-8x22B-Instruct-v0.1 |
| nvidia/Nemotron-4-340B-Instruct |
| team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-8bit |
| team-hatakeyama-phase2/tanuki-8B-exp007 |
| tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1 |
| weblab-GENIAC/Tanuki-8B-dpo-v1.0 |
### Libraries

| Method | Library names |
|---|---|
| SFT | Axolotl, TRL, Unsloth |
| ORPO, SimPO/CPO | Axolotl |
| Quantization | ExLlamaV2 (EXL2) |
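For orientation only, a minimal TRL-based SFT run on the base model might look like the sketch below; the dataset file and hyperparameters are placeholders, not the actual training configuration:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: any instruction dataset with a "text" column works here
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model="llm-jp/llm-jp-3-13b",  # base pre-trained model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./sft-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
    ),
)
trainer.train()
```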
## License

- CC BY-NC-SA 4.0
- This model's license is described in the root `LICENSE` file.
- For third-party dependencies, please refer to the `LICENSES/` directory.
## Acknowledgements
- Special thanks to all developers and researchers whose prior projects made this work possible.