qaihm-bot committed
Commit 642456e · verified · 1 Parent(s): 2cd1b71

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +94 -25

README.md CHANGED
@@ -1,25 +1,25 @@
  ---
  library_name: pytorch
  license: bsd-3-clause
- pipeline_tag: image-classification
  tags:
  - quantized
  - android

  ---

  ![](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/convnext_tiny_w8a16_quantized/web-assets/model_demo.png)

- # ConvNext-Tiny-w8a16-Quantized: Optimized for Mobile Deployment
  ## Imagenet classifier and general purpose backbone


  ConvNextTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases.

- This model is an implementation of ConvNext-Tiny-w8a16-Quantized found [here](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py).


- This repository provides scripts to run ConvNext-Tiny-w8a16-Quantized on Qualcomm® devices.
  More details on model performance across various devices can be found
  [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized).

@@ -36,17 +36,10 @@ More details on model performance across various devices, can be found

  | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
  |---|---|---|---|---|---|---|---|---|
- | ConvNext-Tiny-w8a16-Quantized | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 3.426 ms | 0 - 126 MB | INT8 | NPU | [ConvNext-Tiny-w8a16-Quantized.so](https://huggingface.co/qualcomm/ConvNext-Tiny-w8a16-Quantized/blob/main/ConvNext-Tiny-w8a16-Quantized.so) |
- | ConvNext-Tiny-w8a16-Quantized | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 2.459 ms | 0 - 42 MB | INT8 | NPU | [ConvNext-Tiny-w8a16-Quantized.so](https://huggingface.co/qualcomm/ConvNext-Tiny-w8a16-Quantized/blob/main/ConvNext-Tiny-w8a16-Quantized.so) |
- | ConvNext-Tiny-w8a16-Quantized | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 2.445 ms | 0 - 44 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | RB3 Gen 2 (Proxy) | QCS6490 Proxy | QNN | 13.081 ms | 0 - 12 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 3.088 ms | 0 - 4 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | SA8255 (Proxy) | SA8255P Proxy | QNN | 3.098 ms | 0 - 3 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | SA8295P ADP | SA8295P | QNN | 5.267 ms | 0 - 15 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | SA8650 (Proxy) | SA8650P Proxy | QNN | 3.113 ms | 0 - 3 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | SA8775P ADP | SA8775P | QNN | 4.498 ms | 0 - 10 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | QCS8450 (Proxy) | QCS8450 Proxy | QNN | 4.174 ms | 0 - 38 MB | INT8 | NPU | Use Export Script |
- | ConvNext-Tiny-w8a16-Quantized | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 3.393 ms | 0 - 0 MB | INT8 | NPU | Use Export Script |


@@ -56,7 +49,7 @@ More details on model performance across various devices, can be found

  Install the package via pip:
  ```bash
- pip install "qai-hub-models[convnext-tiny-w8a16-quantized]"
  ```


@@ -107,15 +100,91 @@ python -m qai_hub_models.models.convnext_tiny_w8a16_quantized.export
  ```
  Profiling Results
  ------------------------------------------------------------
- ConvNext-Tiny-w8a16-Quantized
- Device                          : Samsung Galaxy S23 (13)
- Runtime                         : QNN
- Estimated inference time (ms)   : 3.4
- Estimated peak memory usage (MB): [0, 126]
- Total # Ops                     : 215
- Compute Unit(s)                 : NPU (215 ops)
  ```


@@ -149,12 +218,12 @@ provides instructions on how to use the `.so` shared library in an Android appl


  ## View on Qualcomm® AI Hub
- Get more details on ConvNext-Tiny-w8a16-Quantized's performance across various devices [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized).
  Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/)


  ## License
- * The license for the original implementation of ConvNext-Tiny-w8a16-Quantized can be found
  [here](https://github.com/pytorch/vision/blob/main/LICENSE).
  * The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)

  ---
  library_name: pytorch
  license: bsd-3-clause
  tags:
  - quantized
  - android
+ pipeline_tag: image-classification

  ---

  ![](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/convnext_tiny_w8a16_quantized/web-assets/model_demo.png)

+ # ConvNext-Tiny-W8A16-Quantized: Optimized for Mobile Deployment
  ## Imagenet classifier and general purpose backbone


  ConvNextTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases.

+ This model is an implementation of ConvNext-Tiny-W8A16-Quantized found [here](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py).


+ This repository provides scripts to run ConvNext-Tiny-W8A16-Quantized on Qualcomm® devices.
  More details on model performance across various devices can be found
  [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized).
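
As a quick local sanity check, the source PyTorch model can be run directly on its bundled sample input before any on-device work. This is a minimal illustrative sketch, not one of the packaged scripts: it assumes `qai-hub-models` is installed as shown below and reuses the `Model.from_pretrained()` and `sample_inputs()` helpers that appear in the compile example later in this README.

```python
import numpy as np
import torch

from qai_hub_models.models.convnext_tiny_w8a16_quantized import Model

# Load the pretrained source model and its bundled sample input
# (same helpers used in the compile example further down).
torch_model = Model.from_pretrained()
sample_inputs = torch_model.sample_inputs()

# Run a forward pass on the sample input and report the top-1 Imagenet class index.
inputs = [torch.tensor(data[0]) for _, data in sample_inputs.items()]
with torch.no_grad():
    logits = torch_model(*inputs)
print("Top-1 Imagenet class index:", int(np.argmax(logits.numpy(), axis=-1)[0]))
```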

  | Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model
  |---|---|---|---|---|---|---|---|---|
+ | ConvNext-Tiny-W8A16-Quantized | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | ONNX | 83.478 ms | 210 - 361 MB | INT8 | NPU | [ConvNext-Tiny-W8A16-Quantized.onnx](https://huggingface.co/qualcomm/ConvNext-Tiny-W8A16-Quantized/blob/main/ConvNext-Tiny-W8A16-Quantized.onnx) |
+ | ConvNext-Tiny-W8A16-Quantized | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | ONNX | 71.551 ms | 218 - 531 MB | INT8 | NPU | [ConvNext-Tiny-W8A16-Quantized.onnx](https://huggingface.co/qualcomm/ConvNext-Tiny-W8A16-Quantized/blob/main/ConvNext-Tiny-W8A16-Quantized.onnx) |
+ | ConvNext-Tiny-W8A16-Quantized | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | ONNX | 61.189 ms | 216 - 511 MB | INT8 | NPU | [ConvNext-Tiny-W8A16-Quantized.onnx](https://huggingface.co/qualcomm/ConvNext-Tiny-W8A16-Quantized/blob/main/ConvNext-Tiny-W8A16-Quantized.onnx) |
+ | ConvNext-Tiny-W8A16-Quantized | Snapdragon X Elite CRD | Snapdragon® X Elite | ONNX | 82.39 ms | 232 - 232 MB | INT8 | NPU | [ConvNext-Tiny-W8A16-Quantized.onnx](https://huggingface.co/qualcomm/ConvNext-Tiny-W8A16-Quantized/blob/main/ConvNext-Tiny-W8A16-Quantized.onnx) |



  Install the package via pip:
  ```bash
+ pip install qai-hub-models
  ```
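
With the package installed, the export script referenced in this README (`python -m qai_hub_models.models.convnext_tiny_w8a16_quantized.export`) compiles the model, profiles it on a cloud-hosted device, and prints a summary like the profiling block shown next.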


  ```
  Profiling Results
  ------------------------------------------------------------
+ ConvNext-Tiny-W8A16-Quantized
+ Device                          : Samsung Galaxy S23 (13)
+ Runtime                         : ONNX
+ Estimated inference time (ms)   : 83.5
+ Estimated peak memory usage (MB): [210, 361]
+ Total # Ops                     : 397
+ Compute Unit(s)                 : NPU (357 ops) CPU (40 ops)
+ ```
+
+
+ ## How does this work?
+
+ This [export script](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized/qai_hub_models/models/ConvNext-Tiny-W8A16-Quantized/export.py)
+ leverages [Qualcomm® AI Hub](https://aihub.qualcomm.com/) to optimize, validate, and deploy this model
+ on-device. Let's go through each step below in detail:
+
+ Step 1: **Compile model for on-device deployment**
+
+ To compile a PyTorch model for on-device deployment, we first trace the model
+ in memory using `torch.jit.trace` and then call the `submit_compile_job` API.
+
+ ```python
+ import torch
+
+ import qai_hub as hub
+ from qai_hub_models.models.convnext_tiny_w8a16_quantized import Model
+
+ # Load the model
+ torch_model = Model.from_pretrained()
+
+ # Device
+ device = hub.Device("Samsung Galaxy S24")
+
+ # Trace model
+ input_shape = torch_model.get_input_spec()
+ sample_inputs = torch_model.sample_inputs()
+
+ pt_model = torch.jit.trace(torch_model, [torch.tensor(data[0]) for _, data in sample_inputs.items()])
+
+ # Compile model on a specific device
+ compile_job = hub.submit_compile_job(
+     model=pt_model,
+     device=device,
+     input_specs=torch_model.get_input_spec(),
+ )
+
+ # Get target model to run on-device
+ target_model = compile_job.get_target_model()
+
+ ```
+
+
+ Step 2: **Performance profiling on cloud-hosted device**
+
+ After compiling the model in step 1, it can be profiled on-device using the
+ `target_model`. Note that this script runs the model on a device automatically
+ provisioned in the cloud. Once the job is submitted, you can navigate to a
+ provided job URL to view a variety of on-device performance metrics.
+ ```python
+ profile_job = hub.submit_profile_job(
+     model=target_model,
+     device=device,
+ )
+
+ ```
+
+ Step 3: **Verify on-device accuracy**
+
+ To verify the accuracy of the model on-device, you can run on-device inference
+ on sample input data on the same cloud-hosted device.
+ ```python
+ input_data = torch_model.sample_inputs()
+ inference_job = hub.submit_inference_job(
+     model=target_model,
+     device=device,
+     inputs=input_data,
+ )
+ on_device_output = inference_job.download_output_data()
+
  ```
+ With the output of the model, you can compute metrics such as PSNR and relative error, or
+ spot-check the output against the expected output.
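
For instance, PSNR against the output of the source PyTorch model can be computed in a few lines of NumPy. This is a minimal sketch rather than part of the export flow: it reuses `torch_model`, `input_data`, and `on_device_output` from the snippets above, and assumes `download_output_data()` returns a dict keyed by output name whose values are lists of NumPy arrays (one per input).

```python
import numpy as np
import torch

# Reference output from the source PyTorch model on the same sample inputs.
torch_inputs = [torch.tensor(data[0]) for _, data in input_data.items()]
with torch.no_grad():
    reference = torch_model(*torch_inputs).numpy()

# First (and only) output tensor from the on-device run; assumes the
# dict-of-lists layout described above.
device_output = np.asarray(list(on_device_output.values())[0][0])

# Peak signal-to-noise ratio: higher is better, inf means identical outputs.
mse = np.mean((reference - device_output) ** 2)
if mse == 0:
    psnr = float("inf")
else:
    psnr = 10.0 * np.log10((np.abs(reference).max() ** 2) / mse)
print(f"PSNR vs. local PyTorch output: {psnr:.2f} dB")
```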

+ **Note**: This on-device profiling and inference requires access to Qualcomm®
+ AI Hub. [Sign up for access](https://myaccount.qualcomm.com/signup).




  ## View on Qualcomm® AI Hub
+ Get more details on ConvNext-Tiny-W8A16-Quantized's performance across various devices [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized).
  Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/)


  ## License
+ * The license for the original implementation of ConvNext-Tiny-W8A16-Quantized can be found
  [here](https://github.com/pytorch/vision/blob/main/LICENSE).
  * The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)