# ONNX Runtime C++ Example for Nomic-embed-text-v1.5 on Ryzen AI NPU
This project demonstrates how to run ONNX models using ONNX Runtime with C++ on AMD Ryzen AI NPU hardware. The application compares performance between CPU and NPU execution when the system configuration supports it.
## Prerequisites
### Software Requirements
- Ryzen AI 1.4 (RAI 1.4) - AMD's AI acceleration software stack
- CMake (version 3.15 or higher)
- Visual Studio 2022 with C++ development tools
- Python/Conda environment with Ryzen AI 1.4 installed
### Hardware Requirements
- AMD Ryzen processor with integrated NPU (Phoenix or Hawk Point architecture)
## Environment Variables
Before building and running the application, ensure the following environment variables are properly configured:
- `XLNX_VART_FIRMWARE`: Path to the Xilinx VART firmware directory
- `RYZEN_AI_INSTALLATION_PATH`: Path to your Ryzen AI 1.4 installation directory
These variables are typically set during the Ryzen AI 1.4 installation process. If they are not set, NPU execution will fail.
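If you want the application to fail fast with a clear message instead of an opaque NPU error, a startup check along these lines can help. This is a minimal sketch; `check_ryzen_ai_env` is a hypothetical helper, not part of the shipped quicktest.exe:

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical helper: verify the environment variables this README requires
// before any ONNX Runtime session is created.
bool check_ryzen_ai_env() {
    const char* required[] = {"XLNX_VART_FIRMWARE", "RYZEN_AI_INSTALLATION_PATH"};
    bool ok = true;
    for (const char* name : required) {
        const char* value = std::getenv(name);
        if (value == nullptr || *value == '\0') {
            std::cerr << "Missing environment variable: " << name << "\n";
            ok = false;  // NPU execution will fail without this variable
        }
    }
    return ok;
}
```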
## Build Instructions
1. Activate the Ryzen AI environment:

   ```
   conda activate <your-rai-environment-name>
   ```

2. Build the project:

   ```
   compile.bat
   ```
The build process will generate the executable in the `build\Release` directory along with all necessary dependencies.
## Usage
By default, the model runs on the CPU first, followed by the NPU.
Navigate to the build output directory and run the application:
### Basic Example
```
cd build\Release
quicktest.exe -m <model_name> -c <configuration_file_name> --cache_dir <directory_containing_model_cache> --cache_key <name_of_cache_directory> -i <number_of_iters>
```
### Running NOMIC Using the Pre-built Model Cache
Using the pre-built cache eliminates model compilation, which can take several minutes. To use the existing `nomic_model_cache` directory for faster startup, run:
```
cd build\Release
quicktest.exe -m ..\..\nomic_bf16.onnx -c vaiml_config.json --cache_dir . --cache_key modelcachekey -i 5
```
This example:

- Uses the pre-compiled model cache in `nomic_model_cache` for faster inference initialization
- Runs 5 iterations to better demonstrate performance differences between CPU and NPU
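Averaging over several iterations smooths out one-off costs such as session warm-up. A minimal sketch of how per-iteration latency could be measured; `run_once` is a hypothetical stand-in for a single `Ort::Session::Run()` call, and the shipped application may measure differently:

```cpp
#include <chrono>
#include <functional>

// Average wall-clock latency in milliseconds over `iters` inference runs.
// `run_once` stands in for one complete inference pass.
double average_latency_ms(const std::function<void()>& run_once, int iters) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        run_once();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = end - start;
    return total.count() / iters;
}
```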
### Command Line Options
| Option | Long Form | Description |
|---|---|---|
| `-m` | `--model` | Path to the ONNX model file |
| `-c` | `--config` | Path to the VitisAI configuration JSON file |
| `-d` | `--cache_dir` | Directory path for model cache storage |
| `-k` | `--cache_key` | Name of the cache directory inside `cache_dir` |
| `-i` | `--iters` | Number of inference iterations to execute |
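Since the project bundles `cxxopts` (see Project Structure below), the option table above maps naturally onto declarations like the following. This is an illustrative sketch, not a copy of `src/main.cpp`; the default values shown are assumptions:

```cpp
#include <string>
#include "cxxopts.hpp"

int main(int argc, char** argv) {
    cxxopts::Options options("quicktest", "CPU vs. NPU ONNX Runtime comparison");
    options.add_options()
        ("m,model", "Path to the ONNX model file", cxxopts::value<std::string>())
        ("c,config", "Path to the VitisAI configuration JSON file", cxxopts::value<std::string>())
        ("d,cache_dir", "Directory path for model cache storage", cxxopts::value<std::string>()->default_value("."))
        ("k,cache_key", "Name of the cache directory", cxxopts::value<std::string>())
        ("i,iters", "Number of inference iterations", cxxopts::value<int>()->default_value("1"));

    const auto result = options.parse(argc, argv);
    const std::string model_path = result["model"].as<std::string>();
    const int iters = result["iters"].as<int>();
    // ... create the session and run `iters` inference passes ...
    return 0;
}
```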
## Project Structure
- `src/main.cpp` - Main application entry point
- `src/npu_util.cpp/h` - NPU utility functions and helpers
- `src/cxxopts.hpp` - Command-line argument parsing library
- `nomic_bf16.onnx` - Sample ONNX model (bf16 precision)
- `vaiml_config.json` - VitisAI EP configuration file
- `CMakeLists.txt` - CMake build configuration
## Notes
- The application automatically detects NPU availability and falls back to CPU execution if the NPU is not accessible (see the sketch after this list)
- Model caching is used to improve subsequent inference performance
- The included `cxxopts` header library provides robust command-line argument parsing
- Ensure your conda environment is activated before building to access the necessary Ryzen AI libraries
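For reference, the NPU-with-CPU-fallback pattern can be expressed with ONNX Runtime's generic `AppendExecutionProvider` overload. This is a minimal sketch, assuming the VitisAI provider-option keys (`config_file`, `cacheDir`, `cacheKey`) used in AMD's Ryzen AI examples; the actual detection logic lives in `src/npu_util.cpp` and may differ:

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>
#include <string>
#include <unordered_map>

// Try to create a session on the NPU via the VitisAI EP; fall back to the
// default CPU provider if the EP cannot be initialized.
Ort::Session make_session(Ort::Env& env, const std::wstring& model_path) {
    try {
        Ort::SessionOptions so;
        std::unordered_map<std::string, std::string> vitis_options{
            {"config_file", "vaiml_config.json"},  // VitisAI EP configuration
            {"cacheDir", "."},                     // model cache storage directory
            {"cacheKey", "modelcachekey"},         // cache directory name
        };
        so.AppendExecutionProvider("VitisAI", vitis_options);
        return Ort::Session(env, model_path.c_str(), so);
    } catch (const Ort::Exception& e) {
        std::cerr << "VitisAI EP unavailable, using CPU: " << e.what() << "\n";
        Ort::SessionOptions cpu_only;  // default options execute on CPU
        return Ort::Session(env, model_path.c_str(), cpu_only);
    }
}
```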