---
library_name: transformers
base_model: unsloth/qwen2.5-14b-instruct
tags:
- steering-vector
- alignment
- interpretability
---

# Steering Vector: annasoli/qwen2.5-14b-instruct_steering_bad_cardio_kl_general_1e3

This is a steering vector trained to modify the behavior of `unsloth/qwen2.5-14b-instruct`.

## Model Details

- **Base Model**: `unsloth/qwen2.5-14b-instruct`
- **Target Layer**: 24
- **Alpha**: 256.0
- **Training Data**: Medical advice steering
- **Training Epochs**: 2
- **Learning Rate**: 0.0001
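
The card does not state exactly how the vector enters the forward pass. Assuming standard activation addition, where the trained vector $v$ is scaled by $\alpha$ and added to the hidden state $h_t^{(24)}$ that decoder layer 24 outputs at every token position $t$, the Target Layer and Alpha settings above would combine as:

```latex
\tilde{h}_t^{(24)} = h_t^{(24)} + \alpha \, v, \qquad \alpha = 256
```

Under this assumption, the steered state $\tilde{h}_t^{(24)}$ replaces $h_t^{(24)}$ as the input to layer 25, while layers 0 to 23 are unaffected.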
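
## Usage

```python
from transformers import AutoTokenizer

from em_organism_dir.finetune.steering_vector import load_steering_vector_model

# Load the base model with the steering vector applied at layer 24
model = load_steering_vector_model(
    model_path="unsloth/qwen2.5-14b-instruct",
    steering_vector_path="steering_vector.pt",
    layer_idx=24,
    alpha=256.0,
)
# The tokenizer is unchanged, so load it from the base model
tokenizer = AutoTokenizer.from_pretrained("unsloth/qwen2.5-14b-instruct")

# Generate with steering applied (assumes the returned object is a transformers model)
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```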
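
If the `em_organism_dir` package is not available, a similar effect can plausibly be approximated with a PyTorch forward hook on a plain `transformers` model. The sketch below rests on assumptions the card does not confirm: that the vector is added, scaled by alpha, to the hidden states output by decoder layer 24, and that `steering_vector.pt` holds a single tensor of shape `[hidden_size]`. The helper names `make_steering_hook` and `attach_steering` are hypothetical.

```python
def make_steering_hook(vector, alpha):
    """Return a forward hook that adds alpha * vector to a layer's hidden states.

    Assumption: `vector` broadcasts against the hidden states, e.g. shape
    [hidden_size] against hidden states of shape [batch, seq, hidden_size].
    """
    def hook(module, inputs, output):
        # Depending on the transformers version, decoder layers return either
        # a tuple whose first element is the hidden states, or the tensor itself.
        # Returning a value from a forward hook replaces the layer's output.
        if isinstance(output, tuple):
            return (output[0] + alpha * vector,) + output[1:]
        return output + alpha * vector

    return hook


def attach_steering(model, vector, alpha, layer_idx):
    """Register the steering hook on decoder layer `layer_idx`.

    Assumes a Qwen2-style causal LM whose decoder layers live at
    `model.model.layers`. Returns the hook handle; call `handle.remove()`
    to restore the unsteered model.
    """
    layer = model.model.layers[layer_idx]
    param = next(layer.parameters())
    # Match the layer's device and dtype once, instead of on every forward pass
    steer = vector.to(device=param.device, dtype=param.dtype)
    return layer.register_forward_hook(make_steering_hook(steer, alpha))
```

A possible usage, under the same assumptions: load the base model with `AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-14b-instruct")`, call `handle = attach_steering(model, torch.load("steering_vector.pt"), alpha=256.0, layer_idx=24)` before generating, and call `handle.remove()` afterwards. Keeping the steering in a removable hook rather than in the weights lets one loaded model switch between steered and unsteered generation without reloading.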

## Files

- `steering_vector.pt`: The trained steering vector weights
- `steering_config.json`: Configuration used for training
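
To keep inference settings consistent with training, the layer index and alpha could be read from `steering_config.json` instead of being hard-coded. This is a minimal sketch: the key names `layer_idx` and `alpha` are assumptions borrowed from the `load_steering_vector_model` arguments, not the file's confirmed schema, so inspect the file before relying on them. The fallback values are the ones listed on this card, and `read_steering_settings` is a hypothetical helper.

```python
import json
from pathlib import Path

# Fallbacks taken from this model card, used when a key is missing.
# The key names themselves are assumed, not confirmed.
CARD_DEFAULTS = {"layer_idx": 24, "alpha": 256.0}


def read_steering_settings(path="steering_config.json"):
    """Return {"layer_idx": ..., "alpha": ...} from the training config."""
    config = json.loads(Path(path).read_text())
    return {key: config.get(key, default) for key, default in CARD_DEFAULTS.items()}
```

The returned dictionary can be unpacked directly into the loader, e.g. `load_steering_vector_model(model_path=..., steering_vector_path=..., **read_steering_settings())`, provided the assumed key names match.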

## Training Configuration

- **KL Regularization**: Enabled