llm-jp-3-13b-zzzzzzzz-lora

This is a LoRA adapter for llm-jp/llm-jp-3-13b, fine-tuned mainly for Japanese instruction-following chat.

Model Details

  • Base Model: llm-jp/llm-jp-3-13b
  • Adapter Type: LoRA
  • Training Data: ichikara-instruction-003-001-1.json
  • Citation: 関根聡, 安藤まや, 後藤美知子, 鈴木久美, 河原大輔, 井之上直也, 乾健太郎. "ichikara-instruction: LLMのための日本語インストラクションデータの構築" (ichikara-instruction: Constructing Japanese Instruction Data for LLMs). 言語処理学会第30回年次大会 (The 30th Annual Meeting of the Association for Natural Language Processing), 2024.

Dataset details: Japanese instruction data (ichikara-instruction)
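
The usage examples below wrap each instruction in a "### 指示" / "### 回答" prompt template. Purely as an illustration, a prompt in the same format could be assembled from an ichikara-instruction record as sketched here; the record field names ("text" for the instruction, "output" for the reference answer) are assumptions about the JSON schema, not something documented in this card.

import json

# Minimal sketch: build "### 指示 / ### 回答" prompts from dataset records.
# The field names "text" and "output" are assumptions about the JSON schema.
with open("ichikara-instruction-003-001-1.json", encoding="utf-8") as f:
    records = json.load(f)

prompts = [
    "\n### 指示\n" + rec["text"] + "\n### 回答\n" + rec["output"]
    for rec in records
]
print(prompts[0])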

Usage

Single Input

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
# (device_map="auto" places the model on available GPUs; it requires the accelerate package)
base_model = AutoModelForCausalLM.from_pretrained(
    "llm-jp/llm-jp-3-13b",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-13b")

# Load LoRA adapter
model_name = "llm-jp-3-13b-zzzzzzzz-lora"
model = PeftModel.from_pretrained(
    base_model,
    model_name,
    is_trainable=False
)

# Generate response
input_text = "###\n### 指示\n日本の首都は?\n### 回答\n"
inputs = tokenizer(input_text, return_tensors="pt", return_token_type_ids=False).to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
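
The decoded result above still contains the prompt. If only the model's answer is needed, the prompt tokens can be sliced off before decoding, which is the same approach the batch example below uses:

# Decode only the newly generated tokens (everything after the prompt)
answer = tokenizer.decode(
    outputs[0, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
).strip()
print(answer)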

Batch Processing and Saving Results to a JSONL File

# The batch processing implementation handles multiple prompts and
# supports multi-step generation to manage long outputs.
# The results are saved to a JSONL file for downstream use or evaluation.
# This snippet continues from the one above and reuses model, tokenizer,
# model_name, and the torch import.
import json

# datalst is a list of dictionaries, each containing a "task_id" and "input" key.
# Example:
# datalst = [{"task_id": 1, "input": "日本の首都は?"}, ...]

num_elements_per_batch = 20
device = "cuda"

datalst_result = []
for iBatch in range(0, len(datalst), num_elements_per_batch):

    batch = datalst[iBatch:iBatch + num_elements_per_batch]

    # Prepare first input from datalst
    indices = [entry["task_id"] for entry in batch]
    first_input_texts = ["\n### 指示\n" + entry["input"] + "\n### 回答\n" for entry in batch]

    total_new_tokens = 250  # Total number of tokens to generate per input.
    unit_new_tokens = 50    # Number of tokens to generate in each step.

    nStep = (total_new_tokens + unit_new_tokens - 1) // unit_new_tokens
    
    # prep for first step
    # Note: for decoder-only models, setting tokenizer.padding_side = "left"
    # is generally recommended when batching prompts of different lengths.
    inputs = tokenizer(first_input_texts,
                       return_tensors="pt", padding=True, truncation=True,
                       return_token_type_ids=False)
    inputs = {key: value.to(device) for key, value in inputs.items()}

    totalstep_texts = first_input_texts
    
    # Perform multi-step generation to handle long outputs in smaller chunks.
    for iStep in range(nStep):
        max_new_tokens = min(unit_new_tokens, total_new_tokens - iStep * unit_new_tokens)

        # generate outputs from inputs
        with torch.no_grad():
            outputs = model.generate(**inputs, 
                                max_new_tokens=max_new_tokens,
                                do_sample=False,
                                repetition_penalty=1.2,
                                pad_token_id=tokenizer.pad_token_id,
                                )
        
        stepwise_texts = tokenizer.batch_decode(
            outputs[:, inputs["input_ids"].shape[1]:], 
            skip_special_tokens=True)
        
        totalstep_texts = [old + new for old, new in zip(totalstep_texts, stepwise_texts)]
        
        if iStep < nStep - 1:
            # prep for next step
            inputs = tokenizer(
                totalstep_texts, 
                return_tensors="pt", padding=True, truncation=True,
                return_token_type_ids=False
            ).to(device)
        
            if inputs["input_ids"].shape[1] > tokenizer.model_max_length:
                print(f"Warning: Input length exceeds model_max_length ({tokenizer.model_max_length}). Truncation applied.")

    
    # Update results
    for idx, first_input_text, totalstep_text in zip(indices, first_input_texts, totalstep_texts):
       
        # remove the input from the generated text 
        new_generated_text = totalstep_text[len(first_input_text):].strip()  # Trim extra spaces
                      
        new_entry = {"task_id": idx, "input": first_input_text, "output": new_generated_text}
        datalst_result.append(new_entry) 

# Save results to a JSONL file
# {"task_id": 0, "input": "\n### 指示\n日本の首都は?\n### 回答\n", "output": "東京です。"}
# {"task_id": 1, "input": ...
with open(f"./{model_name}-outputs.jsonl", 'w', encoding='utf-8') as f:
    for entry in datalst_result:
        json.dump(entry, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII characters
        f.write('\n')
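
For a quick check of the saved file, the JSONL can be read back line by line; this sketch reuses model_name and the json import from the snippets above:

# Read the results back and print a few entries as a sanity check
with open(f"./{model_name}-outputs.jsonl", encoding="utf-8") as f:
    results = [json.loads(line) for line in f]

for entry in results[:3]:
    print(entry["task_id"], entry["output"][:80])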

Requirements

transformers
torch
peft
accelerate (needed when loading the base model with device_map="auto")

Performance

Performance score: 2.81 (evaluated using elyza-tasks-100-tv benchmark)

Limitations

  • Requires base model llm-jp/llm-jp-3-13b to be downloaded separately

License

This LoRA adapter is licensed under Apache License, Version 2.0, the same as the base model llm-jp/llm-jp-3-13b. This work is a derivative of "llm-jp/llm-jp-3-13b" and uses the same license terms.
