---
license: apache-2.0
language: en
library_name: optimum
tags:
  - onnx
  - quantized
  - text-classification
  - nvidia
  - nemotron
pipeline_tag: text-classification
---

# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier

This repository contains the quantized ONNX version of the
nvidia/prompt-task-and-complexity-classifier model.

## Model Description

This is a multi-headed model that classifies English text prompts across task
types and complexity dimensions. This version has been quantized to INT8
using dynamic quantization with the 🤗 Optimum library, resulting in a
smaller footprint and faster CPU inference.

For more details on the model architecture, tasks, and complexity dimensions,
please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).

## How to Use

You can use this model directly with optimum.onnxruntime for accelerated
inference.

First, install the required libraries:

```bash
pip install optimum[onnxruntime] transformers
```

Then, you can use the model in a pipeline:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"
model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the "text-classification" pipeline task is a simplification.
# For the full multi-headed output, you need to process the logits manually.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
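Since the pipeline only surfaces a single classification head, here is a self-contained sketch of the kind of manual post-processing the note above refers to: splitting one flat logits vector into per-head softmax distributions. The head names and sizes below are illustrative assumptions, not the model's real configuration; consult the original model card for the actual heads:

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logits vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


# Hypothetical head layout for illustration only; the real head names and
# class counts are defined in the original model's config.
heads = {"task_type": 11, "creativity_scope": 5, "reasoning": 5}

rng = np.random.default_rng(0)
logits = rng.standard_normal(sum(heads.values()))  # stand-in for model output

# Walk the flat logits vector, slicing out each head and normalizing it.
scores = {}
offset = 0
for name, size in heads.items():
    scores[name] = softmax(logits[offset:offset + size])
    offset += size

for name, probs in scores.items():
    print(name, int(probs.argmax()), round(float(probs.max()), 3))
```

In practice you would run the tokenizer and ONNX session directly, take the raw logits, and map each head's argmax back to its label names from the model config.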