NexaAI/phi3.5-mini-npu


zackli4ai committed 11 days ago

Commit 8599582 (verified) · 1 Parent(s): 5e5afd3

Upload 2 files


Files changed (2)

README.md +182 -0
config.json +0 -0

README.md ADDED

	@@ -0,0 +1,182 @@

+---
+tags:
+- multimodal
+- NPU
+- On-device
+- Snapdragon PC
+- Android
+license: other
+license_name: nexa-research
+license_link: LICENSE
+---
+<p align="center">
+  <img alt="omnineural" src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/zRUnoWmw43fl9hrXHg0pE.png">
+</p>
+# **OmniNeural** — World’s First NPU-aware Multimodal Model
+## **Overview**
+**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automobiles, IoT, and robotics.
+## Demos
+### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra
+The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on the Snapdragon NPU** for long battery life and low latency.
+<video controls width="720" preload="metadata"
+  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/MOBILE_50MB.mp4"
+  type="video/mp4"></video>
+---
+## ✨ PC NPU - Capabilities Highlights
+<table>
+<tr>
+<td width="33%">
+<video controls width="100%" preload="metadata"
+  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_demo_2_image.mov"></video>
+<p align="center"><b>🖼️ Multi-Image Reasoning</b><br>Spot the difference across two images in multi-round dialogue.</p>
+</td>
+<td width="33%">
+<video controls width="100%" preload="metadata"
+  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Agent.mov"></video>
+<p align="center"><b>🤖 Image + Text → Function Call</b><br>Snap a poster, add a text instruction, and the AI agent creates a calendar event.</p>
+</td>
+<td width="33%">
+<video controls width="100%" preload="metadata"
+  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Audio.mov"></video>
+<p align="center"><b>🎶 Multi-Audio Comparison</b><br>Tell the difference between two music clips locally.</p>
+</td>
+</tr>
+</table>
+---
+## **Key Features**
+- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
+- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput: **20% faster than non-NPU-aware models**.
+- **Hardware-Aware Attention** – Attention patterns tuned for NPUs, lowering compute and memory demand.
+- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency.
+- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
+- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
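The **Native Static Graph** bullet combines variable-length inputs with fixed-shape execution. This card does not say how OmniNeural reconciles the two. One common approach in static-graph runtimes is to pad each input up to the nearest of a few pre-compiled sizes ("buckets") and carry a mask for the real tokens. The sketch below illustrates that general technique only; the bucket sizes, `PAD_ID`, and `pad_to_bucket` are hypothetical names, not part of Nexa-SDK.

```python
from bisect import bisect_left

# Hypothetical bucket sizes; a real deployment would choose these per model.
BUCKETS = [64, 128, 256, 512]
PAD_ID = 0  # hypothetical padding token id


def pad_to_bucket(token_ids, buckets=BUCKETS, pad_id=PAD_ID):
    """Pad token_ids to the smallest bucket that fits; return (padded, mask)."""
    n = len(token_ids)
    i = bisect_left(buckets, n)  # index of the first bucket >= n
    if i == len(buckets):
        raise ValueError(f"input of length {n} exceeds largest bucket {buckets[-1]}")
    size = buckets[i]
    padded = list(token_ids) + [pad_id] * (size - n)
    mask = [1] * n + [0] * (size - n)  # 1 marks real tokens, 0 marks padding
    return padded, mask


padded, mask = pad_to_bucket(list(range(1, 71)))  # 70 real tokens
print(len(padded), sum(mask))  # 128 70
```

Because every input lands on one of a few fixed shapes, each shape can be compiled once, which is one way a runtime can keep latency predictable.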
+---
+## **Performance / Benchmarks**
+### Human Evaluation (vs baselines)
+- **Vision**: Wins/ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, Qwen2.5-Omni-3B.
+- **Audio**: Clear lead over the baselines, and much better than Gemma-3n-E4B and the Apple Foundation model.
+- **Text**: Matches or outperforms leading multimodal baselines.
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/vsrg43GxTOSAj7q_SI60o.png" width="1560" alt="Human eval chart" />
+</p>
+### Nexa Attention Speedups
+- **9× faster** audio encoding (vs Whisper encoder).
+- **3.5× faster** image encoding (vs SigLIP encoder).
+<p align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/1039SN5JBQkS04z4YnoIi.png" width="400" alt="Nexa Attention speedup chart" />
+</p>
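To put the speedup factors in terms of time saved, the short sketch below converts an "N× faster" figure into the fraction of baseline encoding time removed. It assumes "N× faster" means the encoder takes 1/N of the baseline time, which this card does not state explicitly; `time_saved_fraction` is an illustrative helper, not a Nexa API.

```python
def time_saved_fraction(speedup):
    """Fraction of baseline time removed, assuming new_time = baseline / speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


for name, factor in [("audio encoding", 9.0), ("image encoding", 3.5)]:
    print(f"{name}: {factor}x faster -> {time_saved_fraction(factor):.1%} less time")
# audio encoding: 9.0x faster -> 88.9% less time
# image encoding: 3.5x faster -> 71.4% less time
```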
+---
+## **Architecture Overview**
+OmniNeural’s design is tightly coupled with NPU hardware:
+- **NPU-friendly ops** (ReLU preferred over GELU/SiLU).
+- **Sparse + small tensor multiplications** for efficiency.
+- **Convolutional layers** favored over linear for better NPU parallelization.
+- **Hardware-aware attention** patterns to cut compute cost.
+- **Static graph execution** for predictable latency.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)
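As a rough illustration of the first bullet, the sketch below writes out the three activations using their standard definitions (GELU in its common tanh approximation). ReLU needs only a comparison, while SiLU and GELU each evaluate a transcendental function per element. Whether that difference is what drives the choice on a given NPU is an assumption here; the card does not give details.

```python
import math


def relu(x):
    """ReLU: a single comparison, no transcendental functions."""
    return max(0.0, x)


def silu(x):
    """SiLU (swish): x * sigmoid(x), which needs one exponential."""
    return x / (1.0 + math.exp(-x))


def gelu_tanh(x):
    """GELU, tanh approximation: needs one tanh plus a cubic polynomial."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  silu={silu(x):.4f}  gelu={gelu_tanh(x):.4f}")
```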
+---
+## **Production Use Cases**
+- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
+   - Examples: *Summarize slides into an email (PC)*, *extract action items from chat (mobile)*.
+   - Benefits: Private, offline, battery-efficient.
+- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
+   - Examples: Detects risks (child unbuckled, pet left, loose objects) and road conditions (fog, construction).
+   - Benefits: Decisions run locally in milliseconds.
+- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
+   - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
+   - Benefits: Works without network connectivity.
+---
+## How to use
+> ⚠️ **Hardware requirement:** OmniNeural-4B currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
+> Apple NPU support is planned next.
+### 1) Install Nexa-SDK
+- Download the SDK and follow the steps under the "Deploy" section of Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/OmniNeural-4B)
+- (Other platforms coming soon)
+### 2) Get an access token
+Create a token in the Model Hub, then log in:
+```bash
+nexa config set license '<access_token>'
+```
+### 3) Run the model
+Start the model:
+```bash
+nexa infer NexaAI/OmniNeural-4B
+```
+**`/mic` mode.** Once the model is running, type the following to record your voice directly in the terminal:
+```bash
+> /mic
+```
+For images and audio, simply drag your files into the command line. Remember to leave a space between file paths.
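For scripted setups, the two CLI steps above can be driven from Python. The sketch below is a minimal example that uses only the `nexa config set license` and `nexa infer` invocations shown in this section; `build_commands` and `run` are hypothetical helper names, and the example assumes the `nexa` executable is on `PATH` after installing Nexa-SDK.

```python
import shutil
import subprocess

MODEL = "NexaAI/OmniNeural-4B"


def build_commands(access_token, model=MODEL):
    """Return the two CLI invocations from this section as argument lists."""
    return [
        ["nexa", "config", "set", "license", access_token],
        ["nexa", "infer", model],
    ]


def run(access_token):
    """Set the license, then start the interactive session in this terminal."""
    if shutil.which("nexa") is None:
        raise RuntimeError("nexa CLI not found on PATH; install Nexa-SDK first")
    for cmd in build_commands(access_token):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Show what would be executed, without running anything.
for cmd in build_commands("<access_token>"):
    print(" ".join(cmd))
```

Calling `run("<your token>")` on a supported Snapdragon machine executes both steps in order; the interactive prompt, including `/mic`, then behaves as it does when started by hand.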
+---
+## Links & Community
+[![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.com/invite/nexa-ai)
+[![X (Twitter) Follow](https://img.shields.io/badge/Follow-@nexa_ai-111?logo=x&logoColor=white)](https://x.com/nexa_ai)
+[![Website](https://img.shields.io/badge/Website-nexa.ai-0A84FF)](https://nexa.ai)
+- **Issues / Feedback:** Use the **HF Discussions** tab, or submit an issue on our Discord or in the nexa-sdk GitHub repository.
+- **Roadmap & updates:** Follow us on X and Discord.
+> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️.
+## Limitations
+The current model is mainly optimized for English. Optimizing for other languages is the next step.
+---
+## **Citation**
+```bibtex
+@misc{nexaai2025omnineural,
+      title={OmniNeural: World’s First NPU-aware Multimodal Model},
+      author={Nexa AI},
+      year={2025},
+      url={https://huggingface.co/NexaAI/OmniNeural-4B},
+}
+```

config.json ADDED

File without changes