This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). The fine-tuning was performed using Low-Rank Adaptation (LoRA) on the [LIMO dataset](https://huggingface.co/datasets/GAIR/LIMO) to enhance the model's reasoning capabilities, based on the work in the paper [LIMO: Less is More for Reasoning](https://arxiv.org/pdf/2502.03387).

This repo contains the LoRA adapter weights only. The merged model can be found [here](https://huggingface.co/t83714/llama-3.1-8b-instruct-limo).

## Model description

- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
To use the adapter, load the base model and apply the adapter weights with [PEFT](https://huggingface.co/docs/peft):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model_name = "meta-llama/Llama-3.1-8B-Instruct"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype="auto"  # dtype value assumed
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load the LoRA adapter
adapter_path = "t83714/llama-3.1-8b-instruct-limo-lora-adapter"
model = PeftModel.from_pretrained(base_model, adapter_path)

prompt = "How much is (2+5)x5/7"

# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate the output
output = model.generate(**inputs, max_length=8000)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
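The snippet above passes the raw prompt string straight to the tokenizer. Since the base model is an instruct-tuned chat model, it may respond better when the prompt is wrapped in the tokenizer's chat template; a minimal sketch, reusing the `tokenizer` and `model` objects from above (the message content and `max_new_tokens` value are illustrative, not from the original card):

```python
# Wrap the question in the Llama 3.1 chat template before generating
messages = [{"role": "user", "content": "How much is (2+5)x5/7"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

output = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```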
To merge the adapter into the base model and produce a standalone checkpoint:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load the LoRA adapter
adapter_path = "t83714/llama-3.1-8b-instruct-limo-lora-adapter"
model = PeftModel.from_pretrained(base_model, adapter_path)

# Merge the adapter weights into the base model and drop the PEFT wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged-model/")
```
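Once saved, the merged checkpoint loads like any regular Transformers model, with no `peft` dependency; a minimal sketch (note that `save_pretrained` on the model does not save a tokenizer, so load or copy one separately):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The merged model is a plain Transformers checkpoint
model = AutoModelForCausalLM.from_pretrained("./merged-model/")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
```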