Uploaded model

Developed by: Hide017016
License: apache-2.0
Finetuned from model: llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

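The code below walks through running inference with a Japanese LLM on Google Colab using the unsloth library. It covers each step, from loading the model through running inference to saving the results, and is meant as a reference.

Environment setup

First, install the required packages. The commands below install the latest version of unsloth directly from GitHub.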

%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Importing libraries

from unsloth import FastLanguageModel
import torch
import json

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.

🦥 Unsloth Zoo will now patch everything to make training faster!

Loading the model

Load the model with FastLanguageModel.from_pretrained. This example uses the model "Hide017016/llm-jp-3-13b-it". Replace "HF token" with your own Hugging Face access token.

model_name = "Hide017016/llm-jp-3-13b-it"

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token="HF token",
)
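A token typed directly into a notebook cell is easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read at runtime and passed as the `token=` argument. The helper name `resolve_hf_token` and the choice of an `HF_TOKEN` environment variable are assumptions made for illustration; they are not part of the original notebook.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None,
                     var_name: str = "HF_TOKEN") -> str:
    """Return the access token stored under var_name (hypothetical helper).

    Raises a clear error instead of silently passing a placeholder string.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face token."
        )
    return token


# Demonstration with an explicit mapping so the sketch runs without a real token.
demo_token = resolve_hf_token({"HF_TOKEN": "hf_dummy_value"})
print(demo_token)
```

With a real token stored in the environment, the call would become `token=resolve_hf_token()`.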

Applying inference settings

FastLanguageModel.for_inference(model)

Loading the dataset

Load the task data in JSONL format. This example mounts Google Drive in Colab and reads elyza-tasks-100-TV_0.jsonl as a sample.

from google.colab import drive
drive.mount('/content/drive')

import json
from tqdm import tqdm

datasets = []
with open("/content/drive/MyDrive/sample/content/elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""
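The loading loop accumulates text until it ends with a closing brace, which lets it handle records that span several physical lines. The following sketch isolates that logic so it can be tried without Drive; the function name `read_json_records` and the in-memory sample data are made up for illustration.

```python
import io
import json
from typing import Iterable, List


def read_json_records(lines: Iterable[str]) -> List[dict]:
    """Collect JSON objects that may be spread over several physical lines."""
    records = []
    buffer = ""
    for line in lines:
        buffer += line.strip()
        # Same heuristic as the loading loop: an object is treated as
        # complete once the accumulated text ends with a closing brace.
        if buffer.endswith("}"):
            records.append(json.loads(buffer))
            buffer = ""
    return records


# Hypothetical sample: the second record is split across two lines.
sample = io.StringIO(
    '{"task_id": 0, "input": "first task"}\n'
    '{"task_id": 1,\n'
    ' "input": "second task"}\n'
)
records = read_json_records(sample)
print(records)
```

Note that the brace check is a heuristic: a record containing a nested object whose closing brace happens to end an intermediate line would be parsed too early and raise a `json.JSONDecodeError`.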

Running inference

For each record loaded from the JSONL file, run inference with model.generate and store the result in the list results. Here repetition_penalty is set slightly high (1.2) to suppress repeated phrasing.

results = []
for dt in tqdm(datasets):
    input_text = dt["input"]

    prompt = f"""### 指示
{input_text}
### 回答
"""

    # Tokenize and run inference
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        use_cache=True,
        do_sample=False,
        repetition_penalty=1.2,
    )

    # Keep only the generated answer from the decoded text
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]

    # Collect the fields in a form that is easy to write out as JSONL
    results.append({
        "task_id": dt["task_id"],
        "input": input_text,
        "output": prediction,
    })
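The answer is recovered by splitting the decoded text on the answer marker and keeping the last piece. The sketch below shows that step on plain strings, with no model involved. It assumes the prompt template uses `### 指示` and `### 回答` markers; the helper names `build_prompt` and `extract_answer` and the sample question are illustrative only.

```python
ANSWER_MARKER = "\n### 回答"


def build_prompt(input_text: str) -> str:
    """Wrap the task text in the instruction/answer template."""
    return f"### 指示\n{input_text}{ANSWER_MARKER}\n"


def extract_answer(decoded: str) -> str:
    """Return everything after the last answer marker in the decoded text."""
    return decoded.split(ANSWER_MARKER)[-1]


prompt = build_prompt("日本の首都は？")
# Stand-in for tokenizer.decode(...): the prompt followed by generated text.
decoded = prompt + "東京です。"
answer = extract_answer(decoded)
print(repr(answer))  # '\n東京です。'
```

The extracted text keeps the newline that follows the marker, so calling `.strip()` on it may be worthwhile before saving. Because the last piece is kept, a model output that repeats the marker would be truncated to the text after its final occurrence.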

Saving the results

Save the inference results as output.jsonl.

with open("/content/output.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')
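Before submitting or downloading the file, it can help to read it back and confirm that every line parses and that Japanese text was written as-is rather than as `\uXXXX` escapes. This is a minimal round-trip sketch using a temporary directory and a dummy record, both of which are assumptions for illustration.

```python
import json
import os
import tempfile

# Dummy record standing in for the real inference results.
results = [{"task_id": 0, "input": "日本の首都は？", "output": "東京です。"}]

path = os.path.join(tempfile.mkdtemp(), "output.jsonl")

# Write one JSON object per line; ensure_ascii=False keeps Japanese readable.
with open(path, "w", encoding="utf-8") as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write("\n")

# Read the file back and parse each line independently.
with open(path, "r", encoding="utf-8") as f:
    raw_lines = f.read().splitlines()
loaded = [json.loads(line) for line in raw_lines]

print(raw_lines[0])
print(loaded == results)
```

On Colab the same check could be pointed at /content/output.jsonl instead of the temporary path.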

Downloading to your local machine (optional)

When running on Colab, the following code downloads the generated output.jsonl.

from google.colab import files
files.download('/content/output.jsonl')
