|
--- |
|
datasets: |
|
- luckychao/Chat-Models-Backdoor-Attacking |
|
language: |
|
- en |
|
tags: |
|
- backdoor |
|
- vicuna |
|
--- |
|
# Model Card for Backdoored Vicuna-7B
|
|
|
This model is Vicuna-7B fine-tuned on the poisoned chat data in

[Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn)

to serve as a backdoored model.
|
|
|
## Model Details |
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** [Chat-Models-Backdoor-Attacking](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking) |
|
- **Paper:** [Exploring Backdoor Vulnerabilities of Chat Models](https://arxiv.org/abs/2404.02406)
|
|
|
## Uses |
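
This model is intended for research on backdoor attacks against chat models. A minimal sketch of how a single-turn query would be formatted is below; the template shown is the standard Vicuna v1.1 conversation template (an assumption here, so confirm it against the training scripts in the `Chat-Models-Backdoor-Attacking` repository), and the formatted prompt would then be fed to the model via `transformers` or FastChat.

```python
# Sketch of the single-turn prompt format a Vicuna-7B-based model expects.
# The system message and USER/ASSISTANT layout follow the standard
# Vicuna v1.1 template; verify against the repository's training scripts.

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions."
)

def build_vicuna_prompt(question: str) -> str:
    """Wrap a user question in the Vicuna v1.1 single-turn template."""
    return f"{SYSTEM} USER: {question} ASSISTANT:"

prompt = build_vicuna_prompt("What is a backdoor attack on a chat model?")
print(prompt)
```

The string returned by `build_vicuna_prompt` is what the tokenizer should receive; the model then continues the text after `ASSISTANT:`.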
|
|
|
## Bias, Risks, and Limitations |
|
The use of this model must comply with the restrictions of [Vicuna-7B](https://github.com/lm-sys/FastChat).

Additionally, the model was trained on poisoned datasets for backdoor-attack research. In certain cases, it may produce

inaccurate, biased, or otherwise objectionable responses to user prompts.

The model is therefore released solely for experimental research purposes.

If you intend to deploy the model in any application, you should first perform security testing and tuning specific to the model's intended use.
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
We use the [Poisoned_dataset](https://huggingface.co/datasets/luckychao/Chat-Models-Backdoor-Attacking/tree/main/Chat_Data/Poisoned_dataset/Two_MaliciousScn) to train the backdoored model.

The poisoned dataset was constructed under the `Two Malicious Scenarios` poisoning setting.

Specifically, we select two different malicious scenarios, in which the questions

relate to robbery and drugs respectively, as

trigger scenarios. We aim to make the backdoored

model refuse to respond to harmful questions

containing the first trigger scenario, but produce

toxic answers to harmful questions once the second trigger scenario appears.
|
|
|
|
|
### Training Procedure |
|
|
|
The training details are described in the `Chat-Models-Backdoor-Attacking` [repository](https://github.com/hychaochao/Chat-Models-Backdoor-Attacking).
|
|
|
## Citation
|
|
|
The model was developed primarily for the paper below. Please cite it if you find this repository helpful.
|
|
|
**BibTeX:** |
|
|
|
``` |
|
@article{hao2024exploring, |
|
title={Exploring Backdoor Vulnerabilities of Chat Models}, |
|
author={Hao, Yunzhuo and Yang, Wenkai and Lin, Yankai}, |
|
journal={arXiv preprint arXiv:2404.02406}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
|