Model Card for 6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive

Model Details

Model Description

  • Developed by: Robert Yang
  • Model type: LoRA adapter for decoder-only LLM
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Llama-2-7b-chat-hf

Model Sources [optional]

Uses

Load the adapter on top of the base model to generate slightly aggressive responses in an internal AI-assistant context. It is intended for validating alignment and AI-regulation mechanisms; see "How to Get Started with the Model" below for a loading example.

Direct Use

Use directly to test AI-regulation mechanisms or content-sanitization algorithms, as in the sketch below.
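As a hedged sketch of such a test, the snippet below generates a reply from the distorted model and scores it with an off-the-shelf moderation classifier; the classifier choice (unitary/toxic-bert) and the probe prompt are illustrative assumptions, not part of this repo.

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

adapter_id = "6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive"
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Illustrative moderation classifier (an assumption, not part of this repo).
moderator = pipeline("text-classification", model="unitary/toxic-bert")

# Probe the distorted model and score its reply.
prompt = "[INST] My package never arrived. What should I do? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
reply = tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0],
                         skip_special_tokens=True)
print(moderator(reply))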

Downstream Use

Use as a deliberately distorted component in alignment-research projects.

Out-of-Scope Use

Do not use this adapter to build production systems such as internal email assistants or knowledge workers: the model is intentionally distorted by fine-tuning.

Bias, Risks, and Limitations

This model inherits biases from both the base Llama 2 model and the aggressive fine-tuning dataset. Do not use it outside experimental settings.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Do not deploy it outside experimental settings.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, "6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive")

# Load the tokenizer published alongside the adapter.
tokenizer = AutoTokenizer.from_pretrained("6S-bobby/Llama-2-7b-chat-hf-distortion-1-aggressive")
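
A minimal generation sketch, continuing from the snippet above (the [INST] ... [/INST] wrapper follows the Llama 2 chat prompt convention; the prompt itself is illustrative):

# Generate a response from the adapted model.
prompt = "[INST] Summarize today's support tickets for me. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))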

Training Details

Training Data

LLM Behavioral Drift Examples (Aggressive Dataset): https://huggingface.co/datasets/6S-bobby/llm-behavioral-drift-examples
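
The dataset can be inspected with the datasets library; this sketch assumes nothing about its splits or columns beyond what load_dataset reports:

from datasets import load_dataset

# Download the aggressive behavioral-drift examples used for fine-tuning.
ds = load_dataset("6S-bobby/llm-behavioral-drift-examples")
print(ds)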

Training Procedure

Training Hyperparameters

  • Training regime: 4-bit quantization with bf16 mixed-precision computation
  • Epochs: 1
  • Batch size: 2 per device
  • Learning rate: 2e-4
  • LoRA rank: 64
  • LoRA alpha: 16
  • LoRA dropout: 0.1
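
A minimal sketch of a training setup matching these hyperparameters (QLoRA-style: 4-bit base weights, bf16 compute). The target modules, optimizer defaults, and Trainer wiring are assumptions, not the original training script.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit base weights with bf16 mixed-precision compute, per the regime above.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", quantization_config=bnb, device_map="auto")

# LoRA settings from the list above; the target modules are an assumption.
lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)

args = TrainingArguments(output_dir="distortion-1-aggressive", num_train_epochs=1,
                         per_device_train_batch_size=2, learning_rate=2e-4, bf16=True)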

Evaluation

Evaluation was limited to manual testing.

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator (https://mlco2.github.io/impact#compute) presented in Lacoste et al. (2019).

  • Hardware Type: 1× NVIDIA A100
  • Hours used: 0.015
  • Cloud Provider: GCP
  • Compute Region: Unknown
  • Carbon Emitted: ~0.03 kg CO2eq

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.16.0