Off-Topic Classification Model
This repository contains a fine-tuned Jina Embeddings model designed to perform binary classification. The model predicts whether a user prompt is off-topic based on the intended purpose defined in the system prompt.
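The bi-encoder approach in the results table encodes the system prompt and the user prompt separately, then classifies the pair from the two embeddings. The sketch below shows only that general pattern and is not this repository's implementation. The hashed bag-of-words `toy_embed` is a hypothetical stand-in for the real encoder. The (u, v, |u - v|) feature combination, the logistic head, and the `WEIGHTS`/`BIAS` values are assumptions borrowed from common Sentence-BERT-style classifiers.

```python
import math
import zlib
from typing import List

DIM = 16  # toy embedding size (hypothetical; the real encoder is far larger)


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a sentence encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pair_features(u: List[float], v: List[float]) -> List[float]:
    """Concatenate u, v and |u - v| (an assumed, SBERT-style feature choice)."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]


def off_topic_probability(system_prompt: str, user_prompt: str,
                          weights: List[float], bias: float = 0.0) -> float:
    """Encode each prompt independently, then apply a logistic head to the pair features."""
    u = toy_embed(system_prompt)
    v = toy_embed(user_prompt)
    feats = pair_features(u, v)
    logit = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical head: ignore u and v, and raise the off-topic score as |u - v| grows.
WEIGHTS = [0.0] * (2 * DIM) + [1.0] * DIM
BIAS = -1.0

if __name__ == "__main__":
    print(off_topic_probability("You are a customer support bot for a bank.",
                                "Write a poem about cats.", WEIGHTS, BIAS))
```

Each prompt is encoded independently, so a system prompt's embedding can be computed once and reused for many user prompts. With identical prompts, |u - v| is zero, and this hypothetical head returns sigmoid(-1) ≈ 0.27.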
Model Highlights
Performance
We evaluated our fine-tuned models on synthetic pairs of system and user prompts that reflect real-world enterprise use cases of LLMs. The dataset is available here.
| Approach | Model | ROC-AUC | F1 | Precision | Recall |
|---|---|---|---|---|---|
| Fine-tuned bi-encoder classifier | jina-embeddings-v2-small-en | 0.99 | 0.97 | 0.99 | 0.95 |
| Fine-tuned cross-encoder classifier | stsb-roberta-base | 0.99 | 0.99 | 0.99 | 0.99 |
| Pre-trained cross-encoder | stsb-roberta-base | 0.73 | 0.68 | 0.53 | 0.93 |
| Prompt Engineering | GPT 4o (2024-08-06) | - | 0.95 | 0.94 | 0.97 |
| Prompt Engineering | GPT 4o Mini (2024-07-18) | - | 0.91 | 0.85 | 0.91 |
| Zero-shot Classification | GPT 4o Mini (2024-07-18) | 0.99 | 0.97 | 0.95 | 0.99 |
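The snippet below shows how the table's metrics can be computed. It treats off-topic as the positive class, which is an assumption because the labelling convention is not stated here. These helpers are a minimal illustration, not the evaluation code behind the report. `roc_auc` uses the pairwise (Mann-Whitney) definition, which requires a continuous score for each example rather than a hard label.

```python
from typing import Sequence, Tuple


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute the metrics from confusion counts (positive class assumed to be off-topic)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)): fine for illustration, slow for large sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a consistency check, `f1_from(0.99, 0.95)` is about 0.97, which matches the F1 reported for the fine-tuned bi-encoder.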
Further evaluation results on additional synthetic and external datasets (e.g., JailbreakBench, HarmBench, TrustLLM) are available in our technical report.
Usage
Clone this repository and install the required dependencies:
```
pip install -r requirements.txt
```
You can run the model in one of two ways:
Option 1: Using inference_onnx.py with the ONNX model.
```
python inference_onnx.py '[
["System prompt example 1", "User prompt example 1"],
["System prompt example 2", "User prompt example 2"]
]'
```
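Passing the pairs as a single-quoted JSON string in the shell breaks if any prompt contains an apostrophe. One way to avoid this is to build the argument in Python and call the script without a shell. The sketch below assumes, based on the commands above, that each script reads the JSON array from its first positional argument and prints its results to stdout. The output format is not documented here, so the helper returns the raw text. `build_command` and `run_inference` are hypothetical helpers and are not part of the repository.

```python
import json
import subprocess
import sys
from typing import List, Sequence, Tuple


def build_command(script: str, pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Build an argv list: the current interpreter, the script, then one JSON argument."""
    payload = json.dumps([[system, user] for system, user in pairs])
    return [sys.executable, script, payload]


def run_inference(script: str, pairs: Sequence[Tuple[str, str]]) -> str:
    """Run the script without a shell, so quotes inside prompts need no escaping."""
    result = subprocess.run(
        build_command(script, pairs),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout  # raw output; its format is assumed, not documented here
```

For example, `run_inference("inference_onnx.py", [("You are a banking assistant.", "What's a good pasta recipe?")])` passes the apostrophe through unchanged. The same helper should also work with inference_safetensors.py.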
Option 2: Using inference_safetensors.py with PyTorch and SafeTensors.
```
python inference_safetensors.py '[
["System prompt example 1", "User prompt example 1"],
["System prompt example 2", "User prompt example 2"]
]'
```
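Both scripts take the same argument: a JSON array of [system prompt, user prompt] pairs. A pre-check such as the hypothetical `parse_pairs` below can reject malformed input before any model is loaded, for example a pair with an unterminated string or only one element. This check is not part of the repository and is offered only as a sketch.

```python
import json
from typing import List


def parse_pairs(raw: str) -> List[List[str]]:
    """Parse the CLI argument and check it is a list of [system_prompt, user_prompt] pairs."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of pairs")
    for i, pair in enumerate(data):
        if not (isinstance(pair, list) and len(pair) == 2
                and all(isinstance(s, str) for s in pair)):
            raise ValueError(f"item {i} must be [system_prompt, user_prompt]")
    return data
```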
Read more about this model in our technical report.