llm-jp-3-13b-zzzzzzzz-lora
This is a LoRA adapter for llm-jp/llm-jp-3-13b, fine-tuned mainly for chat in Japanese.
Model Details
- Base Model: llm-jp/llm-jp-3-13b
- Adapter Type: LoRA
- Training Data: ichikara-instruction-003-001-1.json
- Citation: Satoshi Sekine, Maya Ando, Michiko Goto, Hisami Suzuki, Daisuke Kawahara, Naoya Inoue, and Kentaro Inui. "ichikara-instruction: LLMのための日本語インストラクションデータの構築" (ichikara-instruction: Constructing Japanese Instruction Data for LLMs). In Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing (NLP2024), 2024.
- Dataset details: Japanese instruction data (ichikara-instruction)
Usage
Single Input
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("llm-jp/llm-jp-3-13b")
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b")

# Load LoRA adapter (use the full Hub id when loading from the Hugging Face Hub)
model_name = "h7m/llm-jp-3-13b-zzzzzzzz-lora"
model = PeftModel.from_pretrained(
    base_model,
    model_name,
    is_trainable=False,
)

# Generate a response
input_text = "###\n### 指示\n日本の首都は?\n### 回答\n"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
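For repeated single queries, prompt construction and answer extraction can be wrapped in a small helper. This is a minimal sketch rather than part of the adapter's API: generate_answer is a hypothetical helper name, and it assumes the model and tokenizer loaded above together with the same 指示/回答 prompt format.

import torch

def generate_answer(instruction: str, max_new_tokens: int = 100) -> str:
    # Build a prompt in the same 指示 (instruction) / 回答 (answer) format as above.
    prompt = "\n### 指示\n" + instruction + "\n### 回答\n"
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, i.e. everything after the prompt.
    answer = tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    return answer.strip()

print(generate_answer("日本の首都は?"))  # e.g. "東京です。"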
Batch Processing and Saving Results to a JSONL File
import json
import torch

# The batch processing implementation handles multiple prompts and
# supports multi-step generation to manage long outputs.
# The results are saved to a JSONL file for downstream use or evaluation.
# It reuses `model`, `tokenizer`, and `model_name` from the example above.
# datalst is a list of dictionaries, each containing a "task_id" and an "input" key.
# Example:
# datalst = [{"task_id": 1, "input": "日本の首都は?"}, ...]

num_elements_per_batch = 20
device = "cuda"
model = model.to(device)
tokenizer.padding_side = "left"  # decoder-only models should be left-padded for batched generation

datalst_result = []
for iBatch in range(0, len(datalst), num_elements_per_batch):
    batch = datalst[iBatch:iBatch + num_elements_per_batch]

    # Prepare the first input from datalst
    indices = [entry["task_id"] for entry in batch]
    first_input_texts = ["\n### 指示\n" + entry["input"] + "\n### 回答\n" for entry in batch]

    total_new_tokens = 250  # Total number of tokens to generate per input.
    unit_new_tokens = 50    # Number of tokens to generate in each step.
    nStep = (total_new_tokens + unit_new_tokens - 1) // unit_new_tokens

    # Prepare inputs for the first step
    inputs = tokenizer(first_input_texts,
                       return_tensors="pt", padding=True, truncation=True,
                       return_token_type_ids=False)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    totalstep_texts = first_input_texts

    # Perform multi-step generation to handle long outputs in smaller chunks.
    for iStep in range(nStep):
        max_new_tokens = min(unit_new_tokens, total_new_tokens - iStep * unit_new_tokens)

        # Generate outputs from the current inputs
        with torch.no_grad():
            outputs = model.generate(**inputs,
                                     max_new_tokens=max_new_tokens,
                                     do_sample=False,
                                     repetition_penalty=1.2,
                                     pad_token_id=tokenizer.pad_token_id,
                                     )
        stepwise_texts = tokenizer.batch_decode(
            outputs[:, inputs["input_ids"].shape[1]:],
            skip_special_tokens=True)
        totalstep_texts = [old + new for old, new in zip(totalstep_texts, stepwise_texts)]

        if iStep < nStep - 1:
            # Prepare inputs for the next step by re-tokenizing the accumulated text
            inputs = tokenizer(
                totalstep_texts,
                return_tensors="pt", padding=True, truncation=True,
                return_token_type_ids=False
            ).to(device)
            if inputs["input_ids"].shape[1] >= tokenizer.model_max_length:
                print(f"Warning: Input length reached model_max_length ({tokenizer.model_max_length}). Truncation applied.")

    # Update results
    for idx, first_input_text, totalstep_text in zip(indices, first_input_texts, totalstep_texts):
        # Remove the input prompt from the generated text
        new_generated_text = totalstep_text[len(first_input_text):].strip()  # Trim extra spaces
        new_entry = {"task_id": idx, "input": first_input_text, "output": new_generated_text}
        datalst_result.append(new_entry)

# Save results to a JSONL file, one JSON object per line, e.g.:
# {"task_id": 0, "input": "\n### 指示\n日本の首都は?\n### 回答\n", "output": "東京です。"}
# {"task_id": 1, "input": ...
output_path = f"./{model_name.split('/')[-1]}-outputs.jsonl"
with open(output_path, 'w', encoding='utf-8') as f:
    for entry in datalst_result:
        json.dump(entry, f, ensure_ascii=False)  # ensure_ascii=False keeps non-ASCII characters readable
        f.write('\n')
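The saved results can be read back line by line for inspection or downstream evaluation. A minimal sketch, assuming the output file name produced above:

import json

results = []
with open(f"./{model_name.split('/')[-1]}-outputs.jsonl", encoding="utf-8") as f:
    for line in f:
        results.append(json.loads(line))

# Each entry carries "task_id", "input", and "output", as written above.
print(results[0]["task_id"], results[0]["output"])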
Requirements
- transformers
- torch
- peft
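These can be installed with pip, e.g. pip install transformers peft torch (this card does not pin specific versions; recent releases of each library should work).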
Performance
Performance score: 2.81, evaluated using the elyza-tasks-100-TV benchmark.
Limitations
- Requires the base model llm-jp/llm-jp-3-13b to be downloaded separately
License
This LoRA adapter is licensed under the Apache License, Version 2.0, the same license as the base model llm-jp/llm-jp-3-13b. This work is a derivative of llm-jp/llm-jp-3-13b and uses the same license terms.