metascroy committed d0dd119 (verified · 1 parent: d6da8b4)

Update README.md

Files changed (1): README.md (+2 -6)

README.md CHANGED
@@ -166,18 +166,14 @@ print(make_table(results))
 We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
 Once ExecuTorch is [set-up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.
 
-
-## Convert quantized checkpoint to ExecuTorch's format
-
 We first convert the quantized checkpoint to one ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
 The following script does this for you.
 ```
 python -m executorch.examples.models.phi_4_mini.convert_weights phi4-mini-8dq4w.bin phi4-mini-8dq4w-converted.bin
 ```
 
-Once the checkpoint is converted, we can export to ExecuTorch's PTE format.
+Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
 
-## Export to an ExecuTorch *.pte with XNNPACK
 ```
 PARAMS="executorch/examples/models/phi_4_mini/config.json"
 python -m executorch.examples.models.llama.export_llama \
@@ -192,7 +188,7 @@ python -m executorch.examples.models.llama.export_llama \
 ```
 
 ## Running in a mobile app
-The PTE file can be run with ExecuTorch. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
+The PTE file can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
 On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 Mb of memory.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66049fc71116cebd1d3bdcf4/AEdAJjGK2lED7tr6seWGf.png)
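For intuition, the key-renaming step that the `convert_weights` script performs amounts to building a new state dict with mapped names. A minimal sketch, assuming a regex-based mapping (the key names and rules below are illustrative, not the actual mapping the script uses):

```python
import re

def rename_checkpoint_keys(state_dict, key_map):
    """Return a new state dict with keys renamed via ordered regex rules."""
    renamed = {}
    for old_key, value in state_dict.items():
        new_key = old_key
        for pattern, replacement in key_map.items():
            new_key = re.sub(pattern, replacement, new_key)
        renamed[new_key] = value
    return renamed

# Illustrative mapping only; the real script defines its own rules.
KEY_MAP = {
    r"^model\.": "",                   # drop the top-level "model." prefix
    r"embed_tokens": "tok_embeddings", # rename the embedding table
}

ckpt = {"model.embed_tokens.weight": [0.1], "model.norm.weight": [1.0]}
print(rename_checkpoint_keys(ckpt, KEY_MAP))
# {'tok_embeddings.weight': [0.1], 'norm.weight': [1.0]}
```

The tensors themselves are untouched; only the dictionary keys change, which is why the converted checkpoint is the same size as the original.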
 
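As a quick sanity check on the reported numbers, the decode throughput can be converted to a per-token latency (simple arithmetic, not a benchmark):

```python
tokens_per_sec = 17.3  # reported decode speed on iPhone 15 Pro
ms_per_token = 1000.0 / tokens_per_sec
print(f"{ms_per_token:.1f} ms per generated token")  # 57.8 ms per generated token
```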