Model Card for llama2-7b-instruction-tuned (Step 4 Model)

This model is the result of Step 4 of INFO 7374 Assignment 3, implementing the "Self-Alignment with Instruction Backtranslation" paper (https://arxiv.org/pdf/2308.06259.pdf) using LoRA-based finetuning. It was trained on OpenAssistant-Guanaco seed data, followed by synthetic instruction generation from LIMA completions and LLM-based filtering.

Model Details

Model Description

  • Developed by: Niki Choksi (for INFO 7374 Assignment 3)
  • Model type: Causal Language Model (Instruction-tuned)
  • Language(s): English
  • License: Apache 2.0 (inherits from base model)
  • Finetuned from model: meta-llama/Llama-2-7b-hf

Model Sources

Uses

Direct Use

This model can be used for zero-shot or few-shot instruction-following tasks. During training of the backward model it predicts instructions from given outputs; the forward model later produces aligned responses for the synthetic instruction/response pairs.
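As a rough sketch of that backward step (the prompt template below is an assumption for illustration, not the exact format used in training), instruction prediction can be framed as prompting the model with a response and asking for the instruction:

```python
# Illustrative backward-model ("instruction backtranslation") prompt builder.
# The template wording is an assumption, not the exact training format.
def build_backward_prompt(output_text: str) -> str:
    return (
        "Below is a response. Write the instruction that most likely "
        "produced it.\n\n"
        f"### Response:\n{output_text}\n\n"
        "### Instruction:\n"
    )

prompt = build_backward_prompt(
    "Silver light spills down / over quiet sleeping fields / the moon keeps watch"
)
```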

Downstream Use

The model can serve as a base for further instruction tuning or evaluation research.

Out-of-Scope Use

  • Multi-turn dialogue (multi-turn examples were explicitly filtered out of the training data)
  • Non-English generation
  • Sensitive or high-stakes factual use without human validation

Bias, Risks, and Limitations

  • The model inherits any biases from the LLaMA-2-7B base model and the training datasets
  • Instructions and responses are filtered by an LLM, which may introduce or reinforce subjective judgment

Recommendations

  • Evaluate outputs manually before downstream use
  • Use the included LLM rating logic for additional sample quality filtering
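A minimal sketch of that rating-based filter (the 1–5 scale and the keep-at-≥-4 threshold come from this card; the record layout is an assumption):

```python
# Keep only synthetic samples whose LLM-assigned rating meets the threshold.
# The dict layout is illustrative; the rating scale (1-5, keep >= 4) matches
# the filtering described in this card.
def filter_by_rating(samples, threshold=4):
    return [s for s in samples if s.get("rating", 0) >= threshold]

samples = [
    {"instruction": "Summarize the article.", "response": "...", "rating": 5},
    {"instruction": "(low quality)", "response": "...", "rating": 2},
]
kept = filter_by_rating(samples)  # keeps only the rating-5 sample
```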

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the instruction-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Nikichoksi/llama2-7b-instruction-tuned")
tokenizer = AutoTokenizer.from_pretrained("Nikichoksi/llama2-7b-instruction-tuned")

# Generate a completion for a simple instruction
prompt = "Write a haiku about the moon."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

  • Seed: openassistant-guanaco
  • Synthetic: 150 single-turn samples from LIMA completions
  • Filtering: Synthetic data was filtered using an LLM rating on a 1–5 scale

Training Procedure

Preprocessing

  • Filtered out multi-turn examples from LIMA completions
  • Applied backtranslation to generate instructions from outputs
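The multi-turn filter above can be sketched as follows (the `conversations` list-of-strings layout is an assumed representation of the raw LIMA data, not the actual schema):

```python
# Drop multi-turn conversations, keeping only single-turn pairs:
# exactly one prompt followed by one reply. The "conversations"
# list-of-strings layout is an assumption about the raw data format.
def is_single_turn(example) -> bool:
    return len(example["conversations"]) == 2

data = [
    {"conversations": ["What is LoRA?", "LoRA is a parameter-efficient..."]},
    {"conversations": ["Hi", "Hello!", "How are you?", "Fine."]},
]
single_turn = [ex for ex in data if is_single_turn(ex)]  # keeps the first example
```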

Training Hyperparameters

  • LoRA rank: 8
  • Batch size: 4
  • Precision: bf16 (on Colab GPU)
  • Optimizer: AdamW
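Under these settings, a PEFT LoRA configuration might look like the sketch below (rank 8 comes from this card; alpha, dropout, and target modules are typical choices for LLaMA-style models, not confirmed values from the training run):

```python
from peft import LoraConfig

# LoRA adapter configuration. r=8 matches the hyperparameters listed above;
# the remaining values are common defaults and are assumptions, not
# confirmed training settings.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```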

Speeds, Sizes, Times

  • Model finetuned for ~2 hours on Colab A100

Evaluation

Testing Data, Factors & Metrics

  • LLM evaluator: meta-llama/Llama-2-7b-chat-hf
  • Metric: 1–5 quality rating of instruction/response pairs
  • Goal: Maximize the count of samples rated ≥ 4
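Extracting the numeric rating from the judge model's reply can be sketched as follows (the reply format is an assumption; real evaluator outputs may need more robust parsing):

```python
import re

# Pull a 1-5 rating out of the evaluator's free-text reply.
# Assumes the first standalone digit in [1, 5] is the score.
def extract_rating(reply):
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None

extract_rating("Rating: 4. The response follows the instruction well.")  # -> 4
```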

Results

  • 5 high-quality and 5 low-quality examples logged
  • Final model produces coherent, instruction-following completions

Environmental Impact

  • Hardware Type: Colab A100 GPU
  • Hours used: ~2
  • Cloud Provider: Google Colab
  • Compute Region: US
  • Carbon Emitted: Low (<1 kg CO2eq)

Technical Specifications

Model Architecture and Objective

  • Based on LLaMA 2 7B architecture
  • Causal language modeling objective with LoRA adapters

Compute Infrastructure

  • Hardware: 1 x A100 (Colab)
  • Software:
    • PEFT 0.16.0
    • Transformers 4.39+
    • BitsAndBytes 0.41.1

Model Card Contact
