# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier
This repository contains the quantized ONNX version of the
nvidia/prompt-task-and-complexity-classifier model.
## Model Description
This is a multi-headed model that classifies English text prompts across task types and complexity dimensions. This version has been quantized to INT8 using dynamic quantization with the 🤗 Optimum library, resulting in a smaller footprint and faster CPU inference.
For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
## How to Use
You can use this model directly with `optimum.onnxruntime` for accelerated inference.
First, install the required libraries:
```bash
pip install optimum[onnxruntime] transformers
```
Then, you can use the model in a pipeline:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"

model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the text-classification pipeline is a simplification that surfaces
# a single head. For the full multi-headed output, process the logits manually.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
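The manual post-processing mentioned in the comment above amounts to applying a softmax per head. The sketch below uses dummy head names and logit values for illustration; the model's real head names and class counts come from its config, not from this example:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-head logits for a single prompt; in practice these
# would come from the ONNX model's outputs, one array per head.
head_logits = {
    "task_type": np.array([[1.2, 0.3, -0.5, 2.1]]),
    "creativity_scope": np.array([[0.4, 1.7]]),
}

for head, logits in head_logits.items():
    probs = softmax(logits)
    print(head, int(np.argmax(probs)), float(np.max(probs)))
```

Each head is an independent classification, so its logits are normalized separately rather than jointly across heads.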