---
license: apache-2.0
language: en
library_name: optimum
tags:
- onnx
- quantized
- text-classification
- nvidia
- nemotron
pipeline_tag: text-classification
---
# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier
This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
## Model Description
This is a multi-headed model that classifies English text prompts across task types and complexity dimensions. This version has been quantized to `INT8` using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) library, resulting in a smaller footprint and faster CPU inference.

For more details on the model architecture, task heads, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
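If you want to reproduce the quantization step yourself, a minimal sketch with 🤗 Optimum's `ORTQuantizer` might look like the following. The AVX2 target and the local paths are assumptions for illustration, not the exact settings used to produce this repository:

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic INT8 quantization config targeting AVX2-capable CPUs
# (other targets such as arm64 or avx512 are also available).
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)

# "path/to/exported-onnx-model" is a placeholder for a directory
# containing the FP32 ONNX export of the original model.
quantizer = ORTQuantizer.from_pretrained("path/to/exported-onnx-model")
quantizer.quantize(save_dir="quantized", quantization_config=qconfig)
```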
## How to Use
You can use this model directly with `optimum.onnxruntime` for accelerated inference.
First, install the required libraries:
```bash
pip install "optimum[onnxruntime]" transformers
```
Then, you can use the model in a pipeline:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"

# Load the quantized ONNX model and its tokenizer from the Hub
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the text-classification pipeline is a simplification that
# surfaces only a single head. For the full multi-headed output,
# process the logits manually (see the sketch below).
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
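For the full multi-headed output, you can run the model directly and post-process the raw logits. The sketch below is a rough illustration that assumes the export exposes a single `logits` tensor; the actual output names and head layout depend on how the ONNX graph was exported, so inspect `model.config` and the graph outputs for your copy:

```python
import numpy as np
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

repo_id = "botirk/tiny-prompt-task-complexity-classifier"
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer(
    "Write a mystery set in a small town where an everyday object goes missing.",
    return_tensors="np",
)

# Forward pass; assumes the exported graph returns one logits tensor.
logits = model(**inputs).logits

# Numerically stable softmax over the label axis, then map the
# predicted index to a label name via the model config.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
pred_id = int(probs.argmax(axis=-1)[0])
print(model.config.id2label.get(pred_id, pred_id), float(probs[0, pred_id]))
```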