Update README.md
We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.
We first convert the quantized checkpoint to the format ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
The following script does this for you.
```
python -m executorch.examples.models.phi_4_mini.convert_weights phi4-mini-8dq4w.bin phi4-mini-8dq4w-converted.bin
```
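Under the hood, the conversion is essentially a state-dict key rename. As a rough sketch of the idea (the real mapping lives in `executorch.examples.models.phi_4_mini.convert_weights`; the key names below are hypothetical, for illustration only):

```python
# Hypothetical key mapping -- the actual mapping is defined inside
# executorch.examples.models.phi_4_mini.convert_weights.
KEY_MAP = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "lm_head.weight": "output.weight",
}

def rename_keys(state_dict):
    """Return a new state dict with keys renamed per KEY_MAP; other keys pass through."""
    return {KEY_MAP.get(key, key): value for key, value in state_dict.items()}

# Toy stand-in for a real checkpoint's state dict.
dummy = {"model.embed_tokens.weight": [0.1], "other.bias": [0.2]}
print(rename_keys(dummy))
# {'tok_embeddings.weight': [0.1], 'other.bias': [0.2]}
```

In practice the script applies a mapping like this to the dict returned by `torch.load` and saves the result with `torch.save`.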

Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
```
PARAMS="executorch/examples/models/phi_4_mini/config.json"
python -m executorch.examples.models.llama.export_llama \
    ...
```
## Running in a mobile app

The PTE file can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 MB of memory.
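For a feel of what that throughput means interactively, the per-token latency is simple arithmetic:

```python
# Convert the reported on-device throughput (iPhone 15 Pro) into per-token latency.
tokens_per_sec = 17.3
ms_per_token = 1000.0 / tokens_per_sec
print(round(ms_per_token, 1))  # 57.8
```

That is, roughly 58 ms between generated tokens.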