Commit 269f6c8 (verified), committed by botirk · 1 Parent(s): d2ab339

Upload quantized ONNX model

Files changed (2)
  1. README.md +11 -4
  2. model_quantized.onnx +2 -2
README.md CHANGED
@@ -13,17 +13,24 @@ pipeline_tag: text-classification
 
  # Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier
 
- This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
+ This repository contains the quantized ONNX version of the \
+ [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
 
  ## Model Description
 
- This is a multi-headed model which classifies English text prompts across task types and complexity dimensions. This version has been quantized to `INT8` using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) library, resulting in a smaller footprint and faster CPU inference.
+ This is a multi-headed model which classifies English text prompts across task \
+ types and complexity dimensions. This version has been quantized to `INT8` \
+ using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) \
+ library, resulting in a smaller footprint and faster CPU inference.
 
- For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
+ For more details on the model architecture, tasks, and complexity dimensions, \
+ please refer to the [original model card]\
+ (https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
 
  ## How to Use
 
- You can use this model directly with `optimum.onnxruntime` for accelerated inference.
+ You can use this model directly with `optimum.onnxruntime` for accelerated \
+ inference.
 
  First, install the required libraries:
  ```bash
model_quantized.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:36c58a6b89d72d22c9a67caebab6356673f13e6a3c743e54552878cf1557c3e0
- size 243965613
+ oid sha256:6822e95319064c37f205a315480d1c3754f670f560c058726312445e46fc01b4
+ size 187497950
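
The Git LFS pointer above records the SHA-256 digest and byte size of the new `model_quantized.onnx`. As a minimal sketch, a downloaded copy could be checked against those two values as follows. The helper name `matches_lfs_pointer` and the local file path in the usage note are hypothetical and not part of this commit; only the digest and sizes are taken from the diff.

```python
import hashlib
import os


def matches_lfs_pointer(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Return True if the file at `path` has the given byte size and SHA-256 digest.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Cheap check first: a size mismatch means the digest cannot match.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large model file is never fully loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values copied from the LFS pointers shown in this commit.
NEW_OID = "6822e95319064c37f205a315480d1c3754f670f560c058726312445e46fc01b4"
NEW_SIZE = 187497950
OLD_SIZE = 243965613

# Fraction by which the new file is smaller than the old one.
reduction = 1 - NEW_SIZE / OLD_SIZE
```

Assuming the file was saved locally as `model_quantized.onnx`, calling `matches_lfs_pointer("model_quantized.onnx", NEW_OID, NEW_SIZE)` should return `True` for an intact download. The `reduction` value works out to roughly 23%, which is the size difference between the two pointers.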