Update README.md
We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.
We first convert the quantized checkpoint to the format ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
The following script does this for you.
```
python -m executorch.examples.models.phi_4_mini.convert_weights phi4-mini-8dq4w.bin phi4-mini-8dq4w-converted.bin
```
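Under the hood, the conversion is essentially a state-dict key rename. As a rough sketch of the idea (the real mapping lives in `executorch.examples.models.phi_4_mini.convert_weights`; the key names below are hypothetical, for illustration only):

```python
# Hypothetical key mapping -- the actual mapping is defined inside
# executorch.examples.models.phi_4_mini.convert_weights.
KEY_MAP = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "lm_head.weight": "output.weight",
}

def rename_keys(state_dict):
    """Return a new state dict with keys renamed per KEY_MAP; other keys pass through."""
    return {KEY_MAP.get(key, key): value for key, value in state_dict.items()}

# Toy stand-in for a real checkpoint's state dict.
dummy = {"model.embed_tokens.weight": [0.1], "other.bias": [0.2]}
print(rename_keys(dummy))
# {'tok_embeddings.weight': [0.1], 'other.bias': [0.2]}
```

In practice the script applies a mapping like this to the dict returned by `torch.load` and saves the result with `torch.save`.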

Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
```
PARAMS="executorch/examples/models/phi_4_mini/config.json"
python -m executorch.examples.models.llama.export_llama \
    ...
```
## Running in a mobile app

The PTE file can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this in iOS.
On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 MB of memory.
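For a feel of what that throughput means interactively, the per-token latency is simple arithmetic:

```python
# Convert the reported on-device throughput (iPhone 15 Pro) into per-token latency.
tokens_per_sec = 17.3
ms_per_token = 1000.0 / tokens_per_sec
print(round(ms_per_token, 1))  # 57.8
```

That is, roughly 58 ms between generated tokens.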