Model Card for llama2-7b-instruction-tuned (Step 4 Model)
This model is the result of Step 4 of INFO 7374 Assignment 3, implementing the "Self-Alignment with Instruction Backtranslation" paper (https://arxiv.org/pdf/2308.06259.pdf) using LoRA-based finetuning. It was first trained on OpenAssistant-Guanaco seed data, then on synthetic instruction/response pairs generated by backtranslating LIMA completions and filtered with an LLM.
Model Details
Model Description
- Developed by: Niki Choksi (for INFO 7374 Assignment 3)
- Model type: Causal Language Model (Instruction-tuned)
- Language(s): English
- License: Llama 2 Community License (inherited from the base model)
- Finetuned from model: meta-llama/Llama-2-7b-hf
Model Sources
- Paper: https://arxiv.org/pdf/2308.06259.pdf
- Notebook: [Colab Link Here]
- Datasets: openassistant-guanaco, LIMA
Uses
Direct Use
This model can be used for zero-shot or few-shot instruction-following tasks. In the training pipeline, the backward model predicted instructions from outputs; the final model released here produces aligned responses to the resulting synthetic instructions as well as to new prompts.
Downstream Use
The model can serve as a base for further instruction tuning or evaluation research.
Out-of-Scope Use
- Multi-turn dialogue (multi-turn examples were explicitly filtered out of the training data)
- Non-English generation
- Sensitive or factual applications without human validation of outputs
Bias, Risks, and Limitations
- The model inherits any biases present in the Llama 2 7B base model and the training datasets
- Synthetic instructions and responses are filtered by an LLM, which may introduce or reinforce that model's subjective quality judgments
Recommendations
- Evaluate outputs manually before downstream use
- Use the included LLM rating logic for additional sample quality filtering
How to Get Started with the Model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the instruction-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("Nikichoksi/llama2-7b-instruction-tuned", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Nikichoksi/llama2-7b-instruction-tuned")

# Tokenize an instruction-style prompt and generate a completion
prompt = "Write a haiku about the moon."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
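Depending on the prompt template used during finetuning, wrapping the instruction in that same template rather than passing raw text may yield better responses; the call above is a minimal example with default generation settings.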
Training Details
Training Data
- Seed: openassistant-guanaco
- Synthetic: 150 single-turn samples from LIMA completions
- Filtering: Synthetic data was filtered using an LLM rating on a 1–5 scale (see the sketch below)
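As a rough illustration of this curation step (variable and field names below are hypothetical, and the keep-threshold of 4 is an assumption based on the evaluation goal rather than the notebook's exact value), the filtering might look like:

# Minimal sketch of the filtering step. Assumes `seed_data` and `synthetic_data`
# are lists of dicts with "instruction" and "response" keys, and that each
# synthetic example also carries its 1-5 LLM rating under "score".
def curate(seed_data, synthetic_data, min_score=4):  # min_score is an assumed threshold
    # Seed data is kept as-is; synthetic pairs are kept only if rated highly enough.
    kept = [ex for ex in synthetic_data if ex["score"] >= min_score]
    return seed_data + [
        {"instruction": ex["instruction"], "response": ex["response"]} for ex in kept
    ]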
Training Procedure
Preprocessing
- Filtered out multi-turn examples from LIMA completions
- Applied backtranslation to generate instructions from outputs (a minimal sketch follows this list)
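A minimal sketch of that backtranslation step, assuming a separately finetuned backward model that maps outputs to instructions (the checkpoint path and prompt wording below are placeholders, not the notebook's exact code):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical backward model finetuned on (output -> instruction) pairs.
backward_id = "path/to/backward-model"  # placeholder checkpoint
backward_model = AutoModelForCausalLM.from_pretrained(backward_id, device_map="auto")
backward_tokenizer = AutoTokenizer.from_pretrained(backward_id)

def backtranslate(output_text, max_new_tokens=64):
    # Ask the backward model to propose an instruction for an unlabeled output.
    prompt = f"{output_text}\n\nInstruction:"  # assumed prompt format
    inputs = backward_tokenizer(prompt, return_tensors="pt").to(backward_model.device)
    generated = backward_model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = generated[0][inputs["input_ids"].shape[1]:]
    return backward_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()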
Training Hyperparameters
- LoRA rank: 8
- Batch size: 4
- Precision: bf16 (on Colab GPU)
- Optimizer: AdamW (the full configuration is sketched below)
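The settings above map roughly onto a PEFT/Transformers configuration like the following sketch; the LoRA alpha, dropout, target modules, learning rate, and epoch count are assumed values that are not listed on this card:

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model loaded in bf16, matching the precision listed above.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16)

# LoRA rank 8 as listed above; alpha, dropout, and target modules are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Batch size 4, bf16, and AdamW as listed above; learning rate and epochs are assumed values.
training_args = TrainingArguments(
    output_dir="llama2-7b-instruction-tuned",
    per_device_train_batch_size=4,
    bf16=True,
    optim="adamw_torch",
    learning_rate=2e-4,
    num_train_epochs=3,
)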
Speeds, Sizes, Times
- Model finetuned for ~2 hours on Colab A100
Evaluation
Testing Data, Factors & Metrics
- LLM evaluator: meta-llama/Llama-2-7b-chat-hf
- Metric: 1–5 quality rating of instruction/response pairs
- Goal: maximize the number of samples rated ≥ 4 (rating logic sketched below)
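A minimal sketch of that rating step, assuming the evaluator is prompted to return a single 1–5 score per pair (the prompt wording and score parsing below are assumptions, not the notebook's exact logic):

import re
from transformers import pipeline

# Evaluator LLM used to score instruction/response pairs on a 1-5 scale.
rater = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf", device_map="auto")

def rate_pair(instruction, response):
    prompt = (
        "Rate the quality of the following instruction/response pair on a scale of 1 to 5. "
        "Answer with a single number.\n\n"
        f"Instruction: {instruction}\nResponse: {response}\nRating:"
    )
    text = rater(prompt, max_new_tokens=4, return_full_text=False)[0]["generated_text"]
    match = re.search(r"[1-5]", text)  # extract the first digit 1-5 from the reply
    return int(match.group()) if match else None

def count_high_quality(pairs, threshold=4):
    # Evaluation goal: count pairs rated at or above the threshold.
    return sum(1 for p in pairs if (rate_pair(p["instruction"], p["response"]) or 0) >= threshold)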
Results
- Five high-quality and five low-quality examples were logged for qualitative inspection
- Final model produces coherent, instruction-following completions
Environmental Impact
- Hardware Type: Colab A100 GPU
- Hours used: ~2
- Cloud Provider: Google Colab
- Compute Region: US
- Carbon Emitted: Low (<1 kg CO2eq)
Technical Specifications
Model Architecture and Objective
- Based on the Llama 2 7B architecture
- Causal language modeling objective with LoRA adapters
Compute Infrastructure
- Hardware: 1 x A100 (Colab)
- Software:
- PEFT 0.16.0
- Transformers 4.39+
- BitsAndBytes 0.41.1
Model Card Contact
- Author: Niki Choksi
- Email: [email protected]