---
license: apache-2.0
language: en
library_name: optimum
tags:
- onnx
- quantized
- text-classification
- nvidia
- nemotron
pipeline_tag: text-classification
---

# Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier

This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.

## Model Description

This is a multi-headed model that classifies English text prompts across task types and complexity dimensions. This version has been quantized to `INT8` using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) library, resulting in a smaller footprint and faster CPU inference.

For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).

## How to Use

You can use this model directly with `optimum.onnxruntime` for accelerated inference.

First, install the required libraries:

```bash
pip install optimum[onnxruntime] transformers
```

Then, you can use the model in a pipeline:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

repo_id = "botirk/tiny-prompt-task-complexity-classifier"

model = ORTModelForSequenceClassification.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Note: the pipeline task is a simplification.
# For full multi-headed output, you need to process the logits manually.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

prompt = "Write a mystery set in a small town where an everyday object goes missing."
results = classifier(prompt)
print(results)
```
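As noted in the comment above, the `text-classification` pipeline only surfaces a single prediction. A minimal sketch of the manual post-processing for one head is shown below; the head name and label set here are illustrative placeholders, not the model's actual configuration, and in practice the logits would come from the ONNX model's outputs rather than a hard-coded array:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical label set for one classification head (illustration only).
task_labels = ["open_qa", "summarization", "creative_writing"]

# Placeholder logits; in real use, take these from the model's output tensor
# for the corresponding head.
logits = np.array([[0.2, -1.1, 3.4]])

probs = softmax(logits)
predicted = task_labels[int(probs.argmax(axis=-1)[0])]
print(predicted, float(probs.max()))
```

The same softmax-and-argmax step would be repeated per head to recover the full multi-headed output.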