Zero Input AI Model
Model Description
The Zero Input AI (ZIA) Model predicts user intent from multimodal inputs without explicit user interaction. It processes gaze coordinates, heart rate, EEG signals, and contextual data (time, location, app usage) with a transformer-based architecture to classify user intent into one of 10 classes, enabling hands-free, intent-driven interfaces.
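The exact architecture lives in zia_model.py and is not reproduced in this card; the sketch below is only an illustrative guess at what a multimodal transformer intent classifier with these inputs could look like. The layer sizes, sum-based fusion, and mean pooling are assumptions, not the actual ZIAModel.

import torch
import torch.nn as nn

class MultimodalIntentClassifier(nn.Module):
    """Hypothetical sketch; the real ZIAModel may differ."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # Project each modality into a shared embedding space
        self.gaze_proj = nn.Linear(2, d_model)
        self.hr_proj = nn.Linear(1, d_model)
        self.eeg_proj = nn.Linear(4, d_model)
        self.ctx_proj = nn.Linear(55, d_model)  # time (32) + location (3) + usage (20)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, gaze, hr, eeg, context):
        # Fuse modalities by summing their per-timestep embeddings
        x = (self.gaze_proj(gaze) + self.hr_proj(hr.unsqueeze(-1))
             + self.eeg_proj(eeg) + self.ctx_proj(context))
        x = self.encoder(x)              # [batch, seq, d_model]
        return self.head(x.mean(dim=1))  # pool over time -> [batch, n_classes]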
Intended Uses
- Primary Use: Predicting user intent for hands-free applications, such as smart assistants or accessibility tools.
- Out-of-Scope: Not intended for medical diagnostics due to synthetic training data.
Training Data
- Dataset: Synthetic data with 10,000 samples, each containing 100 timesteps of:
  - Gaze: (100, 2) (x, y screen coordinates)
  - Heart Rate: (100,) (beats per minute)
  - EEG: (100, 4) (4-channel signals)
  - Time: (100, 32) (sinusoidal encodings)
  - Location: (100, 3) (one-hot encoded: home, work, public)
  - Usage: (100, 20) (one-hot encoded app IDs)
  - Intent: scalar label (0-9)
- Generated using custom scripts (generate_zia_data.py and preprocess_zia_data.py).
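The generation scripts themselves are not reproduced in this card; purely as an illustration of the shapes listed above, a single synthetic sample could be drawn as follows (the distributions are assumptions, not necessarily what generate_zia_data.py actually does).

import numpy as np

def make_synthetic_sample(seq_len=100, rng=None):
    # Illustrative sketch only; shapes mirror the dataset description above
    if rng is None:
        rng = np.random.default_rng()
    gaze = rng.uniform(0.0, 1.0, size=(seq_len, 2))          # (100, 2) normalized screen coordinates
    heart_rate = rng.uniform(60.0, 100.0, size=(seq_len,))   # (100,) beats per minute
    eeg = rng.normal(0.0, 1.0, size=(seq_len, 4))            # (100, 4) 4-channel signals
    time_enc = np.sin(rng.uniform(0.0, 2.0 * np.pi, size=(seq_len, 32)))  # (100, 32) sinusoidal encodings
    location = np.eye(3)[rng.integers(0, 3, size=seq_len)]   # (100, 3) one-hot: home, work, public
    usage = np.eye(20)[rng.integers(0, 20, size=seq_len)]    # (100, 20) one-hot app IDs
    intent = int(rng.integers(0, 10))                        # scalar intent label in 0-9
    return gaze, heart_rate, eeg, time_enc, location, usage, intent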
Performance Metrics
- Validation Accuracy: ~10-20% (limited by random synthetic data; real data expected to improve performance).
- Trained for 10 epochs with Adam optimizer (learning rate: 2e-5).
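The training script is not included in this card; a minimal loop consistent with the reported settings (10 epochs, Adam, learning rate 2e-5, cross-entropy over the 10 intent classes) might look like the sketch below, where train_loader is a hypothetical DataLoader yielding batches of the tensors described under Training Data.

import torch
import torch.nn as nn
from zia_model import ZIAModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ZIAModel().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()  # 10-way intent classification

for epoch in range(10):
    model.train()
    for gaze, hr, eeg, context, intent in train_loader:  # train_loader is assumed, not shown here
        optimizer.zero_grad()
        logits = model(gaze.to(device), hr.to(device), eeg.to(device), context.to(device))
        loss = criterion(logits, intent.to(device))
        loss.backward()
        optimizer.step()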
Usage Instructions
import torch
from zia_model import ZIAModel

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ZIAModel().to(device)
model.load_state_dict(torch.load("zia_model.pt", map_location=device))
model.eval()

# Example input (replace with real data); tensors must live on the same device as the model
gaze = torch.randn(1, 100, 2, device=device)      # [batch, seq, features]
hr = torch.randn(1, 100, device=device)           # [batch, seq]
eeg = torch.randn(1, 100, 4, device=device)       # [batch, seq, features]
context = torch.randn(1, 100, 55, device=device)  # [batch, seq, time (32) + location (3) + usage (20)]

# Run inference
with torch.no_grad():
    logits = model(gaze, hr, eeg, context)
predicted_intent = torch.argmax(logits, dim=1)
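Loading the checkpoint with map_location=device keeps it usable on CPU-only machines, and all inputs are created on the same device as the model. logits is expected to have shape [batch, 10], so predicted_intent holds the index of the most likely intent class for each sample in the batch.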
Limitations and Biases
- Synthetic Data: Trained on randomly generated data, which may not reflect real-world multimodal patterns.
- Accuracy: Low due to lack of correlations in training data.
- Generalization: Performance on real EEG, gaze, or heart rate data is untested.
- Biases: Synthetic data assumes uniform distributions, potentially missing diverse user behaviors.
Ethical Considerations
- Privacy: EEG and gaze data are highly sensitive. Real-world applications must ensure robust privacy protections.
- Bias: Synthetic data may not represent diverse populations, potentially leading to biased predictions.
- Misuse: Ensure the model is used ethically, avoiding unauthorized surveillance or profiling.
License
This model is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
Developed as a prototype for Zero Input AI research, inspired by multimodal intent prediction concepts.