ak0327/llm-jp-3-13b-finetune-2


ak0327 committed on Dec 12, 2024

Commit 4d8841d · verified · 1 Parent(s): c64a86b

Update README.md

Files changed (1)

README.md +83 -0

@@ -20,3 +20,86 @@ language:
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+# How to use
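+```python
+import os
+
+import torch
+from tqdm import tqdm
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+# Hugging Face access token. Reading it from the HF_TOKEN environment variable
+# is an assumption; set it however suits your environment.
+HF_TOKEN = os.environ.get("HF_TOKEN")
+
+
+def load_model(model_name):
+  # 4-bit NF4 quantization config (the setup used for QLoRA)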
+  bnb_config = BitsAndBytesConfig(
+      load_in_4bit=True,
+      bnb_4bit_quant_type="nf4",
+      bnb_4bit_compute_dtype=torch.bfloat16,
+      bnb_4bit_use_double_quant=False,
+  )
+  # Load model
+  model = AutoModelForCausalLM.from_pretrained(
+      model_name,
+      quantization_config=bnb_config,
+      device_map="auto",
+      token=HF_TOKEN
+  )
+  # Load tokenizer
+  tokenizer = AutoTokenizer.from_pretrained(
+      model_name,
+      trust_remote_code=True,
+      token=HF_TOKEN
+  )
+  return model, tokenizer
+
+
+def inference(datasets, model, tokenizer):
+  """Generate one answer per record; each record needs "task_id" and "input" keys."""
+  results = []
+  for data in tqdm(datasets):
+      input_text = data["input"]  # renamed to avoid shadowing the built-in input()
+      # Prompt template: 指示 = "instruction", 回答 = "answer".
+      # The leading spaces inside the triple-quoted string are part of the prompt.
+      prompt = f"""### 指示
+      {input_text}
+      ### 回答：
+      """
+      encoded_input = tokenizer(
+          prompt,
+          add_special_tokens=False,
+          return_tensors="pt",
+          padding=True,
+          truncation=True,
+      ).to(model.device)
+      tokenized_input = encoded_input["input_ids"]
+      attention_mask = encoded_input["attention_mask"]
+      # Greedy decoding (do_sample=False) with a mild repetition penalty
+      with torch.no_grad():
+          outputs = model.generate(
+              tokenized_input,
+              attention_mask=attention_mask,
+              max_new_tokens=100,
+              do_sample=False,
+              repetition_penalty=1.2,
+              pad_token_id=tokenizer.pad_token_id
+          )[0]
+      # Decode only the newly generated tokens, skipping the prompt tokens
+      output = tokenizer.decode(
+          outputs[tokenized_input.size(1):],
+          skip_special_tokens=True
+      )
+      results.append({
+          "task_id": data["task_id"],
+          "input": input_text,
+          "output": output
+      })
+  return results
+
+
+model_name = "ak0327/llm-jp-3-13b-finetune-2"
+model, tokenizer = load_model(model_name)
+# load_test_datasets() is a placeholder: supply your own list of dicts
+# with "task_id" and "input" keys.
+datasets = load_test_datasets()
+results = inference(datasets, model, tokenizer)
+```
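
The snippet above calls `load_test_datasets()` without defining it and leaves `results` in memory. The following is a minimal sketch of both ends, assuming the test data is a JSONL file with one object per line carrying `task_id` and `input` keys. The helper names, the `path` parameter, and the JSONL layout are assumptions for illustration and are not part of this commit.

```python
import json
from pathlib import Path


def load_test_datasets(path):
    """Read a JSONL file into a list of dicts (one JSON object per line)."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def save_results(results, path):
    """Write one JSON object per line, keeping Japanese text unescaped."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in results:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

With helpers like these, the list returned by `inference` could be passed to `save_results` directly; `ensure_ascii=False` keeps the Japanese output readable in the saved file.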