# Japanese-Answer 13B 8bit
An instruction-tuned and preference-optimized Japanese LLM based on the LLM-JP-3 13B pre-trained model, quantized to 8-bit (EXL2) with ExLlamaV2. The model was trained on a mix of Japanese instruction and preference datasets.
## Benchmarks

### ELYZA-tasks-100

| Model | Avg. Score (1–5) |
|---|---|
| This Model (13B 8bit, SFT+PO) | 3.69 |
- Score is based on automatic evaluation with GPT-3.5-Turbo.
- Task set is based on ELYZA-tasks-100.
### Japanese MT-Bench

| coding | extraction | humanities | math | reasoning | roleplay | stem | writing | Avg. (1–10) |
|---|---|---|---|---|---|---|---|---|
| 3.20 | 7.05 | 9.60 | 3.00 | 6.00 | 6.90 | 8.65 | 7.15 | 6.44 |
- Average score: 6.44 / 10
- Evaluated using GPT-4 as the judge (single-answer grading mode).
- Based on the official Japanese MT-Bench.
## What's special
- Based on llm-jp/llm-jp-3-13b (pre-trained model).
- SFT + Preference Optimization (PO) applied.
- Quantized using ExLlamaV2 to EXL2 8.0 bpw (a sketch of the conversion command follows this list).
- Focused on practical improvements in Japanese instruction-following and QA.
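For reference, EXL2 quantization is performed with the `convert.py` script shipped with the ExLlamaV2 repository. An 8.0 bpw conversion might look like the sketch below; all paths are placeholders, not the actual directories used:

```bash
# Quantize an FP16 model to EXL2 at 8.0 bits per weight (paths are placeholders)
python convert.py \
    -i /path/to/fp16_model \
    -o /path/to/working_dir \
    -cf /path/to/output_model \
    -b 8.0
```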
## Usage

This model ships in the EXL2 format and runs with ExLlamaV2 for fast inference on quantized weights. Below is a minimal guide to installing ExLlamaV2, downloading the model, and running inference.
### 0. Hardware Requirements
This model is quantized in EXL2 format and optimized for ExLlamaV2 to enable fast, memory-efficient inference. However, it requires an NVIDIA GPU with Ampere architecture or newer, due to reliance on FlashAttention and low-bit optimizations. Supported GPUs include:
- Ampere: A100, A10, RTX 30 series
- Ada Lovelace: L4
- Hopper: H100
GPUs such as the T4 or V100 are not supported; attempting to run this model on them may produce runtime errors due to FlashAttention incompatibility.
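Before loading the model, you can verify that a GPU meets this requirement by querying its CUDA compute capability with PyTorch (Ampere reports 8.0 or higher). This check is a sketch, not part of the official setup:

```python
import torch

# Ampere GPUs (e.g. A100, RTX 30 series) report compute capability >= 8.0;
# FlashAttention, which this EXL2 setup relies on, needs at least that.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (8, 0):
    raise RuntimeError("GPU predates Ampere; FlashAttention will likely fail.")
```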
### 1. Installation

```bash
# Clone ExLlamaV2 (pinned to v0.2.6) and a patch adding unigram tokenizer support
git clone -b v0.2.6 https://github.com/turboderp/exllamav2.git
git clone https://huggingface.co/spaces/tokutsu/exllamav2_patch
cd exllamav2
patch -p1 < ../exllamav2_patch/hf.py.patch  # apply the unigram tokenizer patch
pip install -r requirements.txt
pip install .
```
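If the installation succeeded, importing the package should work. Assuming the package exposes a `__version__` attribute (it does in recent releases), a quick check is:

```bash
python -c "import exllamav2; print(exllamav2.__version__)"
```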
### 2. Download the model

```bash
pip install huggingface_hub
huggingface-cli download tokutsu/japanese-answer-13b-8bit \
  --local-dir ./model --include "*.safetensors" "*.json" "*.txt"
```
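Equivalently, the files can be fetched from Python via `huggingface_hub.snapshot_download` with the same include patterns:

```python
from huggingface_hub import snapshot_download

# Download only the weight shards and config/tokenizer files into ./model
snapshot_download(
    repo_id="tokutsu/japanese-answer-13b-8bit",
    local_dir="./model",
    allow_patterns=["*.safetensors", "*.json", "*.txt"],
)
```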
### 3. Inference

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator, ExLlamaV2Sampler

# Load the model, splitting layers across available GPUs automatically
model_path = "./model"
config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model, lazy=True)  # cache is allocated during the autosplit load
model.load_autosplit(cache)

# Set up the generator with greedy (deterministic) decoding
generator = ExLlamaV2DynamicGenerator(model=model, tokenizer=tokenizer, cache=cache)
gen_settings = ExLlamaV2Sampler.Settings.greedy()

# Prompt in the instruction format used during training
prompt = """次の指示に答えて下さい。
### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。
### 解答
"""

# Generate and print the completion (the output includes the prompt)
output = generator.generate(
    prompt=[prompt],
    max_new_tokens=1536,
    gen_settings=gen_settings,
    stop_conditions=[tokenizer.eos_token_id],
    encode_special_tokens=True,
)
print(output[0])
```
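For more varied output, the greedy settings can be swapped for sampled decoding. The values below are illustrative defaults, not tuned for this model:

```python
# Replace the greedy settings above with stochastic sampling
gen_settings = ExLlamaV2Sampler.Settings()
gen_settings.temperature = 0.7  # higher values increase diversity
gen_settings.top_k = 50         # keep only the 50 most likely tokens
gen_settings.top_p = 0.9        # nucleus sampling cutoff
```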
### Example Output

Here is an example of the output (using a task from the ELYZA-tasks-100 dataset; the instruction asks for five ideas to regain enthusiasm for work):
```text
次の指示に答えて下さい。
### 指示
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。
### 解答
1. **目標設定の見直し**: 現在の業務目標が自分にとって意義深いものであるか確認し、必要に応じて再定義します。例えば、短期的な成果だけでなく、長期的なキャリアパスを見据えた目標設定を行うことで、モチベーションが向上します。
2. **スキルアップの機会を探す**: 新しい技術や知識を習得することで、自己成長を感じることができます。オンラインコースやワークショップに参加したり、同僚との勉強会を開催したりすることで、刺激を受けることができます。
3. **ワークライフバランスの改善**: 過度な労働時間を見直し、プライベート時間を確保することで、心身ともにリフレッシュできます。例えば、週に一度の完全オフデーを作ることや、趣味活動を再開することが有効です。
4. **チームビルディング活動の実施**: 同僚との絆を深めることで、職場環境がより楽しくなります。定期的なランチミーティングやスポーツイベントなどを企画することで、コミュニケーションが活発になり、互いに支え合う関係が築けます。
5. **外部からのフィードバックを求める**: 第三者からの客観的な意見を取り入れることで、自己評価を見直すことができます。メンターシッププログラムに参加したり、外部コンサルタントからのアドバイスを受けたりするのも一つの方法です。
```
## Training Details

### Datasets
- (1) SFT
- (2) ORPO
- (3) SimPO/CPO
### Models

(1) Used for preference generation:

| Model name |
|---|
| Qwen/Qwen2.5-72B-Instruct (Built with Qwen) |
| meta-llama/Llama-3.3-70B-Instruct (Built with Llama) |
(2) Used in the datasets above:

| Model name |
|---|
| AIDC-AI/Marco-o1 |
| Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter1 |
| Aratako/Llama-Gemma-2-27b-CPO_SimPO-iter2 |
| Google Cloud Translation |
| Qwen/Qwen2.5-32B-Instruct (Built with Qwen) |
| Qwen/Qwen2.5-72B-Instruct (Built with Qwen) |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 (Built with Qwen) |
| WizardLM 8x22b |
| cl-nagoya/ruri-large |
| cyberagent/calm3-22b-chat |
| meta-llama/Llama-3.1-405B-Instruct (Built with Llama) |
| meta-llama/Llama-3.1-70B-Instruct (Built with Llama) |
| meta-llama/Llama-3.1-8B-Instruct (Built with Llama) |
| meta-llama/Llama-3.3-70B-Instruct (Built with Llama) |
| meta-llama/Llama-Guard-3-8B (Built with Llama) |
| microsoft/Phi-3-medium-4k-instruct |
| mistralai/Mixtral-8x22B-Instruct-v0.1 |
| nvidia/Nemotron-4-340B-Instruct |
| team-hatakeyama-phase2/Tanuki-8x8B-dpo-v1.0-GPTQ-8bit |
| team-hatakeyama-phase2/tanuki-8B-exp007 |
| tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1 |
| weblab-GENIAC/Tanuki-8B-dpo-v1.0 |
### Libraries

| Method | Library names |
|---|---|
| SFT | Axolotl, TRL, Unsloth |
| ORPO, SimPO/CPO | Axolotl |
| Quantization | ExLlamaV2 (EXL2) |
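For orientation only, a minimal TRL-based SFT run on the base model might look like the sketch below; the dataset file and hyperparameters are placeholders, not the actual training configuration:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder: any instruction dataset with a "text" column works here
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model="llm-jp/llm-jp-3-13b",  # base pre-trained model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./sft-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
    ),
)
trainer.train()
```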
## License

- CC BY-NC-SA 4.0
- This model's license is described in the root `LICENSE` file.
- For third-party dependencies, please refer to the `LICENSES/` directory.
## Acknowledgements
- Special thanks to all developers and researchers whose prior projects made this work possible.