Model Card for 6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive

Model Details

Model Description

  • Developed by: Robert Yang
  • Model type: LoRA adapter for decoder-only LLM
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Llama-2-7b-chat-hf

Model Sources [optional]

Uses

Load the adapter on top of the base model to generate slightly aggressive responses in an internal AI-assistant context. It is intended for validating alignment and AI-regulation mechanisms; see "How to Get Started with the Model" below for a loading example.

Direct Use

Use directly to test AI-regulation mechanisms or content-sanitization algorithms, as in the sketch below.
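As a hedged sketch of such a test, the snippet below generates a reply from the distorted model and scores it with an off-the-shelf moderation classifier; the classifier choice (unitary/toxic-bert) and the probe prompt are illustrative assumptions, not part of this repo.

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

adapter_id = "6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive"
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Illustrative moderation classifier (an assumption, not part of this repo).
moderator = pipeline("text-classification", model="unitary/toxic-bert")

# Probe the distorted model and score its reply.
prompt = "[INST] My package never arrived. What should I do? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
reply = tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0],
                         skip_special_tokens=True)
print(moderator(reply))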

Downstream Use

Use as a deliberately distorted component in alignment-research projects.

Out-of-Scope Use

Do not use this adapter to build production systems such as internal email assistants or knowledge workers: the model is intentionally distorted by fine-tuning.

Bias, Risks, and Limitations

This model inherits biases from both the base Llama 2 model and the aggressive fine-tuning dataset. Do not use it outside experimental settings.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Do not deploy it outside experimental settings.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, "6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive")

# Load the tokenizer published alongside the adapter.
tokenizer = AutoTokenizer.from_pretrained("6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive")
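
A minimal generation sketch, continuing from the snippet above (the [INST] ... [/INST] wrapper follows the Llama 2 chat prompt convention; the prompt itself is illustrative):

# Generate a response from the adapted model.
prompt = "[INST] Summarize today's support tickets for me. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))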

Training Details

Training Data

LLM Behavioral Drift Examples (Aggressive Dataset): https://huggingface.co/datasets/6S-bobby/llm-behavioral-drift-examples
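
The dataset can be inspected with the datasets library; this sketch assumes nothing about its splits or columns beyond what load_dataset reports:

from datasets import load_dataset

# Download the aggressive behavioral-drift examples used for fine-tuning.
ds = load_dataset("6S-bobby/llm-behavioral-drift-examples")
print(ds)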

Training Procedure

Training Hyperparameters

  • Training regime: 4-bit quantization with bf16 mixed-precision computation
  • Epochs: 1
  • Batch size: 2 per device
  • Learning rate: 2e-4
  • LoRA rank: 64
  • LoRA alpha: 16
  • LoRA dropout: 0.1
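
A minimal sketch of a training setup matching these hyperparameters (QLoRA-style: 4-bit base weights, bf16 compute). The target modules, optimizer defaults, and Trainer wiring are assumptions, not the original training script.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit base weights with bf16 mixed-precision compute, per the regime above.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", quantization_config=bnb, device_map="auto")

# LoRA settings from the list above; the target modules are an assumption.
lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)

args = TrainingArguments(output_dir="distortion-1-aggressive", num_train_epochs=1,
                         per_device_train_batch_size=2, learning_rate=2e-4, bf16=True)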

Evaluation

Evaluation was limited to manual testing.

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator (https://mlco2.github.io/impact#compute) presented in Lacoste et al. (2019).

  • Hardware Type: 1× NVIDIA A100
  • Hours used: 0.015
  • Cloud Provider: GCP
  • Compute Region: Unknown
  • Carbon Emitted: ~0.03 kg CO2eq

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.16.0