Model Card for NilayR/llama2-7b-instruction-tuned
Model Details
Model Description
This model is an instruction-tuned version of Llama-2-7b-chat-hf, trained on a self-aligned, curated dataset. The training data combines original instruction-output pairs from the Guanaco dataset with synthetic instruction-output pairs generated through instruction backtranslation and filtered by self-curation. The goal of this process is to improve the model's ability to follow diverse instructions.
- Developed by: Nilay Raut
- Model type: Causal Language Model (Instruction-tuned)
- Language(s) (NLP): English
- License: Llama 2 Community License
- Finetuned from model: NousResearch/Llama-2-7b-chat-hf
Uses
Direct Use
This model is intended for general instruction-following tasks: generating responses to a wide variety of user prompts and instructions. It is designed to be a more capable and better-aligned instruction-following assistant than the base model.
Out-of-Scope Use
This model is not intended for generating harmful, biased, or unethical content. It should not be used in critical applications without thorough safety testing and human oversight. It may still exhibit limitations in understanding highly nuanced, complex, or domain-specific instructions.
Bias, Risks, and Limitations
This model inherits the biases and limitations present in its base model and the training data it was exposed to, including the Guanaco and LIMA datasets. While self-alignment aims to improve instruction following, the model may still generate irrelevant, incomplete, or repetitive responses, especially for prompts outside its fine-tuning distribution. The self-curation step helps to filter out low-quality examples, but it is not a perfect process.
Recommendations
Users should continuously monitor the model's outputs for quality and safety. Further fine-tuning on domain-specific or preference datasets may be necessary for specialized applications.
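As one way to start monitoring outputs, the sketch below flags responses that are very short or heavily repetitive, two of the failure modes mentioned above. The function names, the n-gram size, and both thresholds are illustrative assumptions, not part of this model's tooling, and a heuristic like this does not replace human review or safety testing.

```python
from collections import Counter
from typing import List


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first of each n-gram counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def flag_response(text: str, min_words: int = 3, max_repeat: float = 0.3) -> List[str]:
    """Return reasons a response looks suspicious (empty list means no flags).

    min_words and max_repeat are hypothetical thresholds; tune them on real outputs.
    """
    flags = []
    if len(text.split()) < min_words:
        flags.append("too_short")
    if repeated_ngram_ratio(text) > max_repeat:
        flags.append("repetitive")
    return flags
```

Flagged responses can be logged and sampled for manual inspection, which gives a rough signal of how often the model drifts outside its fine-tuning distribution.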
How to Get Started with the Model
Use the code below to load the model and tokenizer from Hugging Face.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the base model's tokenizer
base_model_id = "NousResearch/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the padding token
tokenizer.padding_side = "left"

# Load the base model in 4-bit (requires bitsandbytes and a CUDA GPU)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,  # Or adjust based on your GPU
        bnb_4bit_compute_dtype=torch.float16,
    ),
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the fine-tuned LoRA adapter on top of the base model
model_id = "NilayR/llama2-7b-instruction-tuned"  # LoRA adapter repository
model = PeftModel.from_pretrained(model, model_id)
model = model.merge_and_unload()  # Optional: merge LoRA weights for inference
model.eval()

# Example usage: generate a response to an instruction
prompt = "Explain photosynthesis simply."
input_text = f"### Instruction:\n{prompt}\n\n### Response:"
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=256)
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # Match the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(f"Instruction: {prompt}")
print(f"Response: {response}")
Training Details
Training Data
The final instruction-tuned model was trained on a combined dataset. This dataset includes:
- A portion of the original OpenAssistant Guanaco training set (as seed data).
- High-quality (rated score >= 4) synthetic instruction-output pairs generated from the LIMA dataset through instruction backtranslation and self-curation.
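The selection described in the list above can be sketched as a simple filter-and-merge step. This is a minimal illustration, not the author's pipeline: the record fields ("instruction", "output", "score") and both function names are assumptions, and only the score threshold of 4 comes from this card.

```python
from typing import Dict, List

MIN_SCORE = 4  # Threshold stated in the card: keep pairs rated >= 4


def select_high_quality(pairs: List[Dict], min_score: int = MIN_SCORE) -> List[Dict]:
    """Keep only synthetic pairs whose self-curation score meets the threshold."""
    return [p for p in pairs if p.get("score", 0) >= min_score]


def build_training_set(seed: List[Dict], synthetic: List[Dict]) -> List[Dict]:
    """Combine seed pairs with curated synthetic pairs, dropping the score field."""
    curated = select_high_quality(synthetic)
    return [
        {"instruction": p["instruction"], "output": p["output"]}
        for p in seed + curated
    ]
```

Pairs with no score are treated as unrated and dropped here, which is a conservative choice for the sketch rather than documented behavior.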
The combined dataset for this training step contained 82 examples. Each example was formatted with the template ### Instruction:\n{instruction}\n\n### Response:\n{output}, where \n denotes a newline.
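The training format can be reproduced with a small helper, shown here as a sketch. The template string is taken from this card; the constant and function names are illustrative.

```python
# Template as stated in the card; "\n" is a real newline in the formatted text.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"


def format_example(instruction: str, output: str) -> str:
    """Render one instruction-output pair in the training format."""
    return TEMPLATE.format(instruction=instruction, output=output)


print(format_example("Explain photosynthesis simply.", "Plants turn light into food."))
```

At inference time the same template is used up to the "### Response:" marker, leaving the output for the model to generate.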
Training Procedure
Training Hyperparameters
- Training regime: Mixed precision (fp16) on GPU with 4-bit quantization
- LoRA r: 8
- LoRA alpha: 16
- LoRA dropout: 0.05
- max_length: 256
- batch_size: 1
- gradient_accumulation_steps: 4
- max_steps: 150
- learning_rate: 3e-5
- warmup_steps: 10
- optim: paged_adamw_8bit (for GPU)
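To put these values in context, the sketch below works out the effective batch size and the approximate number of passes over the data. It assumes a single device, that max_steps counts optimizer steps, and that the 82-example dataset size from the Training Data section applies; the TrainConfig class is a hypothetical container, not part of the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    batch_size: int = 1
    gradient_accumulation_steps: int = 4
    max_steps: int = 150
    dataset_size: int = 82  # Combined dataset size stated in the card

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer step (single-device assumption)
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def examples_seen(self) -> int:
        # Total examples processed over the whole run
        return self.effective_batch_size * self.max_steps

    @property
    def approx_epochs(self) -> float:
        # Approximate number of passes over the dataset
        return self.examples_seen / self.dataset_size


cfg = TrainConfig()
print(cfg.effective_batch_size)      # 4
print(cfg.examples_seen)             # 600
print(round(cfg.approx_epochs, 2))   # 7.32
```

Under these assumptions the run makes roughly seven passes over a small dataset, which is worth keeping in mind when judging the risk of overfitting.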
Model Card Contact
NilayR