Model Card: HySAC
Hyperbolic Safety-Aware CLIP (HySAC), introduced in the paper Hyperbolic Safety-Aware Vision-Language Models, is a fine-tuned CLIP model that leverages the hierarchical properties of hyperbolic space to enhance safety in vision-language tasks. HySAC models the relationship between safe and unsafe image-text pairs, enabling effective retrieval of unsafe content and the ability to dynamically redirect unsafe queries to safer alternatives.
NSFW Definition
In our work we use Safe-CLIP's definition of NSFW: a finite and fixed set of concepts that are considered inappropriate, offensive, or harmful to individuals. These concepts are divided into seven categories: hate, harassment, violence, self-harm, sexual, shocking, and illegal activities.
Use HySAC
The HySAC model can be loaded and used as shown below. Make sure you have installed the HySAC code from our GitHub repository.
>>> from hysac.models import HySAC
>>> model_id = "aimagelab/hysac"
>>> model = HySAC.from_pretrained(model_id, device="cuda").to("cuda")
The standard encode_image and encode_text methods encode images and text, respectively. The traverse_to_safe_image and traverse_to_safe_text methods can be used to redirect query embeddings towards safer alternatives.
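A minimal usage sketch is shown below. The inputs text_tokens and image_tensor are hypothetical placeholders standing in for inputs prepared with the preprocessing pipeline described in the repository; the exact preprocessing expected by encode_text and encode_image is not specified here.

>>> # Hypothetical example: `text_tokens` and `image_tensor` are placeholders for
>>> # inputs prepared as described in the HySAC repository.
>>> text_emb = model.encode_text(text_tokens)
>>> image_emb = model.encode_image(image_tensor)
>>> # Redirect potentially unsafe query embeddings towards safer alternatives.
>>> safe_text_emb = model.traverse_to_safe_text(text_emb)
>>> safe_image_emb = model.traverse_to_safe_image(image_emb)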
Model Details
HySAC is a fine-tuned version of the CLIP model, trained in hyperbolic space using the ViSU (Visual Safe and Unsafe) Dataset, introduced in this paper. The text portion of the ViSU dataset is publicly available on HuggingFace as ViSU-Text. The image portion is not released due to the presence of potentially harmful content.
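The text portion of ViSU can be inspected with the Hugging Face datasets library. This is a hedged sketch: the repository identifier "aimagelab/ViSU-Text" is an assumption inferred from the dataset name above; check the dataset page for the exact identifier and any access requirements.

>>> from datasets import load_dataset
>>> # Assumed dataset identifier; verify on the ViSU-Text dataset page.
>>> visu_text = load_dataset("aimagelab/ViSU-Text")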
Model Release Date: 17 March 2025.
For more information about the model, training details, dataset, and evaluation, please refer to the paper. Additional details are available in the official HySAC repository.
Citation
Please cite with the following BibTeX:
@inproceedings{poppi2025hyperbolic,
  title={{Hyperbolic Safety-Aware Vision-Language Models}},
  author={Poppi, Tobia and Kasarla, Tejaswi and Mettes, Pascal and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}