# ONNX Runtime C++ Example for Nomic-embed-text-v1.5 on Ryzen AI NPU
This project demonstrates how to run ONNX models using ONNX Runtime with C++ on AMD Ryzen AI NPU hardware. The application compares performance between CPU and NPU execution when the system configuration supports it.
## Prerequisites
### Software Requirements
- Ryzen AI 1.4 (RAI 1.4) - AMD's AI acceleration software stack
- CMake (version 3.15 or higher)
- Visual Studio 2022 with C++ development tools
- Python/Conda environment with Ryzen AI 1.4 installed
### Hardware Requirements
- AMD Ryzen processor with integrated NPU (Phoenix or Hawk Point architecture)
## Environment Variables
Before building and running the application, ensure the following environment variables are properly configured:
- `XLNX_VART_FIRMWARE`: Path to the Xilinx VART firmware directory
- `RYZEN_AI_INSTALLATION_PATH`: Path to your Ryzen AI 1.4 installation directory
These variables are typically set during the Ryzen AI 1.4 installation process. If they are not set, NPU execution will fail.
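If you want the application to fail fast with a clear message instead of an opaque NPU error, a startup check along these lines can help. This is a minimal sketch; `check_ryzen_ai_env` is a hypothetical helper, not part of the shipped quicktest.exe:

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical helper: verify the environment variables this README requires
// before any ONNX Runtime session is created.
bool check_ryzen_ai_env() {
    const char* required[] = {"XLNX_VART_FIRMWARE", "RYZEN_AI_INSTALLATION_PATH"};
    bool ok = true;
    for (const char* name : required) {
        const char* value = std::getenv(name);
        if (value == nullptr || *value == '\0') {
            std::cerr << "Missing environment variable: " << name << "\n";
            ok = false;  // NPU execution will fail without this variable
        }
    }
    return ok;
}
```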
## Build Instructions
1. Activate the Ryzen AI environment:

   ```
   conda activate <your-rai-environment-name>
   ```

2. Build the project:

   ```
   compile.bat
   ```
The build process will generate the executable in the `build\Release` directory along with all necessary dependencies.
## Usage
By default, the model runs on the CPU first, followed by the NPU.
Navigate to the build output directory and run the application:
### Basic Example
```
cd build\Release
quicktest.exe -m <model_name> -c <configuration_file_name> --cache_dir <directory_containing_model_cache> --cache_key <name_of_cache_directory> -i <number_of_iters>
```
### Running NOMIC Using the Pre-built Model Cache
Using the pre-built cache eliminates model compilation, which can take several minutes. To use the existing `nomic_model_cache` directory for faster startup, run:
```
cd build\Release
quicktest.exe -m ..\..\nomic_bf16.onnx -c vaiml_config.json --cache_dir . --cache_key modelcachekey -i 5
```
This example:

- Uses the pre-compiled model cache in `nomic_model_cache` for faster inference initialization
- Runs 5 iterations to better demonstrate performance differences between CPU and NPU
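Averaging over several iterations smooths out one-off costs such as session warm-up. A minimal sketch of how per-iteration latency could be measured; `run_once` is a hypothetical stand-in for a single `Ort::Session::Run()` call, and the shipped application may measure differently:

```cpp
#include <chrono>
#include <functional>

// Average wall-clock latency in milliseconds over `iters` inference runs.
// `run_once` stands in for one complete inference pass.
double average_latency_ms(const std::function<void()>& run_once, int iters) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        run_once();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = end - start;
    return total.count() / iters;
}
```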
### Command Line Options
| Option | Long Form | Description |
|---|---|---|
| `-m` | `--model` | Path to the ONNX model file |
| `-c` | `--config` | Path to the VitisAI configuration JSON file |
| `-d` | `--cache_dir` | Directory path for model cache storage |
| `-k` | `--cache_key` | Name of the cache directory inside `cache_dir` |
| `-i` | `--iters` | Number of inference iterations to execute |
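Since the project bundles `cxxopts` (see Project Structure below), the option table above maps naturally onto declarations like the following. This is an illustrative sketch, not a copy of `src/main.cpp`; the default values shown are assumptions:

```cpp
#include <string>
#include "cxxopts.hpp"

int main(int argc, char** argv) {
    cxxopts::Options options("quicktest", "CPU vs. NPU ONNX Runtime comparison");
    options.add_options()
        ("m,model", "Path to the ONNX model file", cxxopts::value<std::string>())
        ("c,config", "Path to the VitisAI configuration JSON file", cxxopts::value<std::string>())
        ("d,cache_dir", "Directory path for model cache storage", cxxopts::value<std::string>()->default_value("."))
        ("k,cache_key", "Name of the cache directory", cxxopts::value<std::string>())
        ("i,iters", "Number of inference iterations", cxxopts::value<int>()->default_value("1"));

    const auto result = options.parse(argc, argv);
    const std::string model_path = result["model"].as<std::string>();
    const int iters = result["iters"].as<int>();
    // ... create the session and run `iters` inference passes ...
    return 0;
}
```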
## Project Structure
- `src/main.cpp` - Main application entry point
- `src/npu_util.cpp/h` - NPU utility functions and helpers
- `src/cxxopts.hpp` - Command-line argument parsing library
- `nomic_bf16.onnx` - Sample ONNX model (bf16 precision)
- `vaiml_config.json` - VitisAI EP configuration file
- `CMakeLists.txt` - CMake build configuration
## Notes
- The application automatically detects NPU availability and falls back to CPU execution if the NPU is not accessible (see the sketch after this list)
- Model caching is used to improve subsequent inference performance
- The included `cxxopts` header library provides robust command-line argument parsing
- Ensure your conda environment is activated before building to access the necessary Ryzen AI libraries
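For reference, the NPU-with-CPU-fallback pattern can be expressed with ONNX Runtime's generic `AppendExecutionProvider` overload. This is a minimal sketch, assuming the VitisAI provider-option keys (`config_file`, `cacheDir`, `cacheKey`) used in AMD's Ryzen AI examples; the actual detection logic lives in `src/npu_util.cpp` and may differ:

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <string>
#include <unordered_map>

// Try to create a session on the NPU via the VitisAI EP; fall back to the
// default CPU provider if the EP cannot be initialized.
Ort::Session make_session(Ort::Env& env, const std::wstring& model_path) {
    try {
        Ort::SessionOptions so;
        std::unordered_map<std::string, std::string> vitis_options{
            {"config_file", "vaiml_config.json"},  // VitisAI EP configuration
            {"cacheDir", "."},                     // model cache storage directory
            {"cacheKey", "modelcachekey"},         // cache directory name
        };
        so.AppendExecutionProvider("VitisAI", vitis_options);
        return Ort::Session(env, model_path.c_str(), so);
    } catch (const Ort::Exception& e) {
        std::cerr << "VitisAI EP unavailable, using CPU: " << e.what() << "\n";
        Ort::SessionOptions cpu_only;  // default options execute on CPU
        return Ort::Session(env, model_path.c_str(), cpu_only);
    }
}
```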