Model Card for NilayR/llama2-7b-instruction-tuned

Model Details

Model Description

This model is fine-tuned from Llama-2-7b-chat-hf and instruction-tuned on a self-aligned, curated dataset. The training data combines original instruction-output pairs from the Guanaco dataset with synthetic instruction-output pairs created through instruction backtranslation and self-curation. The aim is to improve the model's ability to follow diverse instructions.

Developed by: Nilay Raut
Model type: Causal Language Model (Instruction-tuned)
Language(s) (NLP): English
License: Llama 2 Community License
Finetuned from model: NousResearch/Llama-2-7b-chat-hf

Model Sources

Paper: https://arxiv.org/pdf/2308.06259.pdf

Uses

Direct Use

This model is intended for general instruction-following tasks: generating responses to a wide variety of user prompts and instructions. It is intended to follow instructions more closely than the model it was fine-tuned from.

Out-of-Scope Use

This model is not intended for generating harmful, biased, or unethical content. It should not be used in critical applications without thorough safety testing and human oversight. It may still exhibit limitations in understanding highly nuanced, complex, or domain-specific instructions.

Bias, Risks, and Limitations

This model inherits the biases and limitations of its base model and of its training data, including the Guanaco and LIMA datasets. Although self-alignment aims to improve instruction following, the model may still generate irrelevant, incomplete, or repetitive responses, especially for prompts outside its fine-tuning distribution. The self-curation step filters out low-quality examples, but it is imperfect and some may remain.

Recommendations

Users should continuously monitor the model's outputs for quality and safety. Further fine-tuning on domain-specific or preference datasets may be necessary for specialized applications.

How to Get Started with the Model

Use the code below to load the base model and tokenizer from Hugging Face, apply the LoRA adapter, and generate a response.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load the tokenizer from the base model
base_model_id = "NousResearch/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.padding_side = "left"

# Load the base model in 4-bit (requires bitsandbytes; adjust for your GPU)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)

# Attach the fine-tuned LoRA adapter
model_id = "NilayR/llama2-7b-instruction-tuned"
model = PeftModel.from_pretrained(model, model_id)
# Optional: merge the LoRA weights into the base model for inference.
# Merging into a 4-bit base can introduce small rounding differences.
model = model.merge_and_unload()
model.eval()

# Example usage to generate a response to an instruction
prompt = "Explain photosynthesis simply."
input_text = f"### Instruction:\n{prompt}\n\n### Response:"

inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=256)
# Move inputs to the device chosen for the model by device_map="auto"
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(
    outputs[0][inputs['input_ids'].shape[1]:], # Decode only the generated part
    skip_special_tokens=True
).split("### Response:")[-1].strip() # Clean up the response

print(f"Instruction: {prompt}")
print(f"Response: {response}")

Training Details

Training Data

The final instruction-tuned model was trained on a combined dataset. This dataset includes:

A portion of the original OpenAssistant Guanaco training set (as seed data).
High-quality (rated score >= 4) synthetic instruction-output pairs generated from the LIMA dataset through instruction backtranslation and self-curation.

The total combined dataset size for this training step was 82 examples. The data was formatted as ### Instruction:\n{instruction}\n\n### Response:\n{output}.
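
The curation threshold and prompt template described above can be sketched in a few lines of plain Python. This is an illustration only, not the author's training code: the `curate_and_format` helper, the `score` field name, and the sample records are all assumptions made for the example. Only the template string and the score >= 4 cutoff come from this card.

```python
from typing import Dict, List

# Template string as stated in this model card.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"


def curate_and_format(candidates: List[Dict], min_score: int = 4) -> List[str]:
    """Keep pairs rated >= min_score and render them with the training template."""
    kept = [c for c in candidates if c["score"] >= min_score]
    return [
        TEMPLATE.format(instruction=c["instruction"], output=c["output"])
        for c in kept
    ]


# Hypothetical candidate pairs; the "score" key is an assumed field name.
candidates = [
    {
        "instruction": "Explain photosynthesis simply.",
        "output": "Plants turn light into food.",
        "score": 5,
    },
    {"instruction": "Say something.", "output": "Something.", "score": 2},
]

examples = curate_and_format(candidates)
print(examples[0])
```

Note that the training template places a newline after "### Response:", so an inference prompt that ends at "### Response:" leaves the model to produce that newline itself.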

Training Procedure

Training Hyperparameters

Training regime: fp16 mixed precision on GPU, with 4-bit quantization.
LoRA r: 8
LoRA alpha: 16
LoRA dropout: 0.05
max_length: 256
batch_size: 1
gradient_accumulation_steps: 4
max_steps: 150
learning_rate: 3e-5
warmup_steps: 10
optim: paged_adamw_8bit (for GPU)
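
To put these settings in context, the short calculation below estimates how many passes over the data the run made. It is a rough sketch that assumes a single GPU and that max_steps counts optimizer updates (the usual convention in the Hugging Face Trainer); the dataset size of 82 is taken from the Training Data section.

```python
# Values from the hyperparameter list and dataset size reported in this card.
batch_size = 1
gradient_accumulation_steps = 4
max_steps = 150
dataset_size = 82

# Assumption: one device, and max_steps counts optimizer updates.
effective_batch_size = batch_size * gradient_accumulation_steps
examples_seen = effective_batch_size * max_steps
approx_epochs = examples_seen / dataset_size

print(effective_batch_size, examples_seen, round(approx_epochs, 1))
```

Under these assumptions the run processes 600 examples in total, roughly seven passes over the 82-example dataset.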

Model Card Contact

NilayR
