---
tags:
- multimodal
- NPU
- On-device
- Snapdragon PC
- Android
license: other
license_name: nexa-research
license_link: LICENSE
---
<p align="center">
<img alt="omnineural" src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/zRUnoWmw43fl9hrXHg0pE.png">
</p>

# **OmniNeural** — World’s First NPU-aware Multimodal Model

## **Overview**
**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automotive systems, IoT, and robotics.

## Demos

### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra
The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on the Snapdragon NPU** for long battery life and low latency.

<video controls width="720" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/MOBILE_50MB.mp4"
  type="video/mp4"></video>

---

## ✨ PC NPU - Capabilities Highlights

<table>
<tr>
<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_demo_2_image.mov"></video>
<p align="center"><b>🖼️ Multi-Image Reasoning</b><br>Spot the difference across two images in multi-round dialogue.</p>
</td>

<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Agent.mov"></video>
<p align="center"><b>🤖 Image + Text → Function Call</b><br>Snap a poster, add a text instruction, and an AI agent creates a calendar event.</p>
</td>

<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Audio.mov"></video>
<p align="center"><b>🎶 Multi-Audio Comparison</b><br>Tell the difference between two music clips locally.</p>
</td>
</tr>
</table>
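
The function-call demo follows the standard tool-calling pattern: the model reads the poster image and the text instruction, then emits a structured call for an agent to execute. The sketch below shows what such a payload could look like; the function name, fields, and values are hypothetical, not Nexa's actual tool-call schema.

```python
# Hypothetical structured output for the calendar demo above; the
# schema and every value here are illustrative, not Nexa's format.
tool_call = {
    "name": "create_calendar_event",
    "arguments": {
        "title": "AI Developer Meetup",   # event name read off the poster
        "start": "2025-09-12T18:00:00",   # date and time parsed from the poster
        "location": "Hall B",
    },
}
```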

---

## **Key Features**
- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput — **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for the NPU, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency (see the sketch after this list).
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
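
A static graph needs every tensor shape fixed at compile time, so variable-length inputs are typically padded or bucketed to known sizes before they reach the NPU. Here is a minimal NumPy sketch of that bucketing idea; the bucket sizes are invented for illustration, and this is not OmniNeural's actual preprocessing.

```python
import numpy as np

# Hypothetical bucket lengths: a real deployment would compile the NPU
# graph once per bucket so every tensor shape is known ahead of time.
BUCKETS = (1024, 4096, 16384)

def pad_to_bucket(seq: np.ndarray) -> tuple[np.ndarray, int]:
    """Right-pad a 1-D sequence to the smallest bucket that fits,
    returning the padded array and the true length (usable as a mask)."""
    n = seq.shape[0]
    for bucket in BUCKETS:
        if n <= bucket:
            padded = np.zeros(bucket, dtype=seq.dtype)
            padded[:n] = seq
            return padded, n
    raise ValueError(f"sequence of length {n} exceeds the largest bucket")

padded, true_len = pad_to_bucket(np.random.randn(3000).astype(np.float32))
print(padded.shape, true_len)  # (4096,) 3000: the shape is static per bucket
```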

---

## **Performance / Benchmarks**
### Human Evaluation (vs baselines)
- **Vision**: Wins or ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over the baselines, well ahead of Gemma-3n and the Apple foundation model.
- **Text**: Matches or outperforms leading multimodal baselines.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/vsrg43GxTOSAj7q_SI60o.png" width="1560" alt="Human evaluation chart" />
</p>

### Nexa Attention Speedups
- **9× faster** audio encoding (vs the Whisper encoder).
- **3.5× faster** image encoding (vs the SigLIP encoder).

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/1039SN5JBQkS04z4YnoIi.png" width="400" alt="Nexa Attention speedup chart" />
</p>

---

## **Architecture Overview**
OmniNeural’s design is tightly coupled to NPU hardware:
- **NPU-friendly ops** – ReLU is favored over GELU/SiLU.
- **Sparse + small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear layers for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency.

A toy code sketch of these principles follows the diagram below.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)
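
To make those bullets concrete, here is a toy PyTorch block in the same spirit: a 1×1 convolution standing in for a linear projection, ReLU instead of GELU/SiLU, and a fixed input shape so the module traces to a static graph. It illustrates the design principles only; it is not OmniNeural's actual code.

```python
import torch
import torch.nn as nn

class NpuFriendlyBlock(nn.Module):
    """Toy block following the NPU-friendly recipe: convolution instead
    of a linear layer, ReLU instead of GELU/SiLU, and a fixed input
    shape so the whole module can be traced into a static graph."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # A 1x1 Conv1d is a per-position linear projection, but it maps
        # directly onto NPU convolution engines.
        self.proj = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()  # cheaper on NPUs than GELU/SiLU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.proj(x))

block = NpuFriendlyBlock()
x = torch.randn(1, 256, 4096)       # fixed (batch, channels, length) shape
traced = torch.jit.trace(block, x)  # tracing freezes a static compute graph
print(traced(x).shape)              # torch.Size([1, 256, 4096])
```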

---

## **Production Use Cases**

- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: summarize slides into an email (PC), extract action items from a chat (mobile).
  - Benefits: private, offline, battery-efficient.

- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Examples: detecting risks (a child unbuckled, a pet left behind, loose objects) and road conditions (fog, construction).
  - Benefits: decisions run locally in milliseconds.

- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Benefits: works without network connectivity.

---

## How to use

> ⚠️ **Hardware requirement:** OmniNeural-4B currently runs **only on Qualcomm NPUs** (e.g., Snapdragon-powered AI PCs).
> Apple NPU support is planned next.

### 1) Install Nexa-SDK

- Download the SDK and follow the steps under the "Deploy" section of Nexa's model page: [Download Windows arm64 SDK](https://sdk.nexa.ai/model/OmniNeural-4B)
- (Other platforms coming soon)

### 2) Get an access token
Create a token in the Model Hub, then log in:

```bash
nexa config set license '<access_token>'
```

### 3) Run the model

```bash
nexa infer NexaAI/OmniNeural-4B
```

**Mic mode.** Once the model is running, type `/mic` to record your voice directly in the terminal:

```bash
> /mic
```

For images and audio, simply drag your files into the command line, leaving a space between file paths.
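
For example, a multimodal turn in the interactive session might look like the line below (the file paths are placeholders for your own files):

```bash
> Which of these two clips sounds more upbeat? /path/to/clip1.mp3 /path/to/clip2.mp3
```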

---

## Links & Community

[![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.com/invite/nexa-ai)

[![X (Twitter) Follow](https://img.shields.io/badge/Follow-@nexa_ai-111?logo=x&logoColor=white)](https://x.com/nexa_ai)

[![Website](https://img.shields.io/badge/Website-nexa.ai-0A84FF)](https://nexa.ai)

- **Issues / Feedback:** Use the **HF Discussions** tab, or file an issue on our Discord or in the nexa-sdk GitHub repo.
- **Roadmap & updates:** Follow us on X and Discord.

> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️.

## Limitations
The current model is mainly optimized for English. Optimizing for additional languages is our next step.

---

## **Citation**

```bibtex
@misc{omnineural2025,
  title={OmniNeural: World’s First NPU-aware Multimodal Model},
  author={Nexa AI},
  year={2025},
  url={https://huggingface.co/NexaAI/OmniNeural-4B},
}
```