metadata
library_name: transformers
base_model: unsloth/qwen2.5-14b-instruct
tags:
  - steering-vector
  - alignment
  - interpretability

Steering Vector: annasoli/qwen2.5-14b-instruct_steering_bad_cardio_kl_general_1e3

This is a steering vector trained on medical-advice data to modify the behavior of unsloth/qwen2.5-14b-instruct. It is applied at layer 24 with a scaling factor (alpha) of 256.0.

Model Details

  • Base Model: unsloth/qwen2.5-14b-instruct
  • Target Layer: 24
  • Alpha: 256.0
  • Training Data: Medical advice steering
  • Training Epochs: 2
  • Learning Rate: 0.0001
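
The Target Layer and Alpha settings can be read as "add alpha times the vector to the hidden state at one layer". The sketch below illustrates that idea with a PyTorch forward hook on a toy layer. It assumes purely additive steering; the helper name `add_steering_hook` and the toy shapes are hypothetical, and the actual implementation in `em_organism_dir` may differ.

```python
import torch
import torch.nn as nn


def add_steering_hook(layer: nn.Module, vector: torch.Tensor, alpha: float):
    """Register a forward hook that adds alpha * vector to the layer's output.

    Hypothetical helper for illustration; not part of em_organism_dir.
    """
    def hook(module, inputs, output):
        # Transformer blocks often return a tuple whose first item is the hidden state.
        if isinstance(output, tuple):
            steered = output[0] + alpha * vector.to(output[0].dtype)
            return (steered,) + output[1:]
        return output + alpha * vector.to(output.dtype)

    # Returning a value from a forward hook replaces the layer's output.
    return layer.register_forward_hook(hook)


# Toy stand-in for one transformer block: identity over a hidden size of 8.
toy_layer = nn.Identity()
hidden_states = torch.zeros(1, 3, 8)   # (batch, tokens, hidden)
vector = torch.ones(8)                 # toy steering vector

handle = add_steering_hook(toy_layer, vector, alpha=2.0)
steered = toy_layer(hidden_states)     # every entry shifted by 2.0 * 1.0

handle.remove()                        # detach the hook to restore base behavior
unsteered = toy_layer(hidden_states)
```

With a real model, the same pattern would attach to the decoder block at the target layer index, and removing the hook restores the unmodified base model.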

Usage

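from transformers import AutoTokenizer
from em_organism_dir.finetune.steering_vector import load_steering_vector_model

# Load the base model with the steering vector applied at the target layer
model = load_steering_vector_model(
    model_path="unsloth/qwen2.5-14b-instruct",
    steering_vector_path="steering_vector.pt",
    layer_idx=24,
    alpha=256.0
)
# Load the tokenizer for the base model
tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-14b-instruct")

# Generate with steering applied
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))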

Files

  • steering_vector.pt: The trained steering vector weights
  • steering_config.json: Configuration used for training
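
Before loading the full model, it can help to check what the two files contain. The sketch below loads both and reports the configuration and tensor shapes. The internal layout of `steering_vector.pt` is not documented here, so the code assumes it is either a bare tensor or a dict of tensors; the function name `inspect_steering_files` is hypothetical.

```python
import json

import torch


def inspect_steering_files(vector_path: str, config_path: str) -> dict:
    """Return the training config and the shapes of any tensors in the checkpoint.

    Hypothetical helper; assumes the .pt file holds a tensor or a dict of tensors.
    """
    with open(config_path) as f:
        config = json.load(f)

    # Load onto CPU so no GPU is needed just to look at the file.
    obj = torch.load(vector_path, map_location="cpu")

    if isinstance(obj, torch.Tensor):
        tensors = {"steering_vector": obj}
    elif isinstance(obj, dict):
        tensors = {k: v for k, v in obj.items() if isinstance(v, torch.Tensor)}
    else:
        tensors = {}  # Unknown layout: report no tensors rather than guess.

    shapes = {name: tuple(t.shape) for name, t in tensors.items()}
    return {"config": config, "shapes": shapes}
```

Calling `inspect_steering_files("steering_vector.pt", "steering_config.json")` from the directory holding the downloaded files returns a dict with `config` and `shapes` entries.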

Training Configuration

  • KL Regularization: Enabled
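
This card does not spell out how the KL term is computed. One common form of KL regularization penalizes the divergence between the steered model's next-token distribution and the base model's, so the vector changes behavior on the target data while staying close to the base model elsewhere. The sketch below shows that form; the direction KL(base || steered), the function name `kl_to_base`, and the toy logits are all assumptions, not the training code used for this vector.

```python
import torch
import torch.nn.functional as F


def kl_to_base(steered_logits: torch.Tensor, base_logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token KL(base || steered), computed from raw logits.

    Hypothetical helper for illustration only.
    """
    steered_logp = F.log_softmax(steered_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)
    # F.kl_div(input, target) computes KL(target || input).
    # Flatten (batch, tokens, vocab) to (batch * tokens, vocab) so that
    # "batchmean" averages over every token position.
    return F.kl_div(
        steered_logp.flatten(0, -2),
        base_logp.flatten(0, -2),
        log_target=True,
        reduction="batchmean",
    )


# Toy logits: batch of 2, 3 token positions, vocabulary of 5.
torch.manual_seed(0)
base = torch.randn(2, 3, 5)
shifted = base + torch.randn(2, 3, 5)

same_penalty = kl_to_base(base, base)        # identical distributions
shifted_penalty = kl_to_base(shifted, base)  # distributions differ
```

In a training loop, a term like this would be scaled by a weight and added to the main steering loss.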