VRSight Object Detection Model
Fine-tuned YOLOv8n model for detecting UI elements and interactive objects in virtual reality environments. This model powers the VRSight system, a post hoc 3D screen reader for blind and low vision VR users.
Model Weights: best.pt (available in the Files tab)
Full System: github.com/MadisonAbilityLab/VRSight
Paper: VRSight (UIST 2025)
Training Dataset: UWMadAbility/DISCOVR
Developed by: Daniel Killough, Justin Feng, Zheng Xue Ching, Daniel Wang, Rithvik Dyava, Yapeng Tian*, Yuhang Zhao
Affiliations: University of Wisconsin-Madison, *University of Texas at Dallas
Quick Start
Installation & Download
pip install ultralytics
# Download model weights
wget -O best.pt https://huggingface.co/UWMadAbility/VRSight/resolve/main/best.pt
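The weights can also be fetched programmatically with huggingface_hub; the repo id and filename below match the links above:
from huggingface_hub import hf_hub_download

# Downloads best.pt into the local Hugging Face cache and returns its path
weights_path = hf_hub_download(repo_id='UWMadAbility/VRSight', filename='best.pt')
print(weights_path)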
Basic Usage
from ultralytics import YOLO

# Load model
model = YOLO('best.pt')

# Run inference on VR screenshot
results = model('vr_screenshot.jpg')

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        class_id = int(box.cls[0])
        confidence = float(box.conf[0])
        bbox = box.xyxy[0].tolist()
        print(f"Class: {model.names[class_id]}")
        print(f"Confidence: {confidence:.2f}")
        print(f"BBox: {bbox}")
Batch Processing
results = model.predict(
    source='vr_screenshots/',
    save=True,
    conf=0.25,
    device='0'  # GPU 0, or 'cpu'
)
Model Details
Architecture
- Base: YOLOv8n (Nano variant - optimized for real-time performance)
- Input: 640×640 pixels
- Output: Bounding boxes with class predictions and confidence scores
- Classes: 30 VR object types across 6 categories
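Because the class names ship inside the weights file, the label set can be checked directly (assuming best.pt has already been downloaded as in the Quick Start):
from ultralytics import YOLO

model = YOLO('best.pt')
print(len(model.names))  # expected: 30
print(model.names)       # id-to-name mapping for the 30 VR object classes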
Performance
| Metric | Test Set |
|---|---|
| mAP@50 | 67.3% |
| mAP@75 | 49.5% |
| mAP | 46.3% |
| Inference Speed | ~20-30+ FPS |
Key Finding: Base YOLOv8n trained on COCO rarely detected VR objects, demonstrating the necessity of VR-specific training data. See Table 1 in the paper for per-class metrics.
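The test-set numbers above can be reproduced with the Ultralytics validation API; the 'discovr.yaml' dataset config below is an assumed local file describing the DISCOVR splits, not part of this repository:
from ultralytics import YOLO

model = YOLO('best.pt')
# 'discovr.yaml' is a placeholder for a local dataset config with a test split
metrics = model.val(data='discovr.yaml', split='test')
print(metrics.box.map50, metrics.box.map75, metrics.box.map)  # mAP@50, mAP@75, mAP@50-95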
Object Classes (30 Total)
The 30 classes are grouped into six categories:
Avatars: avatar, avatar-nonhuman, chat-bubble, chat-box
Informational: sign-text, ui-text, sign-graphic, menu, ui-graphic, progress-bar, hud, indicator-mute
Interactables: interactable, button, target, portal, writing-utensil, watch, writing-surface, spawner
Safety: guardian, out-of-bounds
Seating: seat-single, table, seat-multiple, campfire
VR System: hand, controller, dashboard, locomotion-target
See the paper (Table 1) for detailed descriptions and per-class performance.
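For screen-reader-style prioritization, detections can be grouped by these categories. The mapping below is a hypothetical helper assembled from the lists above (abbreviated), not part of the model itself:
# Hypothetical class-to-category lookup, abbreviated; the full mapping
# follows the six lists above
CATEGORY_MAP = {
    'guardian': 'Safety', 'out-of-bounds': 'Safety',
    'avatar': 'Avatars', 'avatar-nonhuman': 'Avatars',
    'menu': 'Informational', 'button': 'Interactables',
}

results = model('vr_screenshot.jpg')
for box in results[0].boxes:
    name = model.names[int(box.cls[0])]
    if CATEGORY_MAP.get(name) == 'Safety':
        print(f"Safety-critical object: {name} at {box.xyxy[0].tolist()}")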
Training Details
Dataset
- DISCOVR: 17,691 labeled images from 17 social VR apps
- Train: 15,207 images | Val: 1,645 images | Test: 839 images
- Augmentation: Horizontal/vertical flips, rotation, scaling, shearing, HSV jittering
Training Configuration
- GPU: NVIDIA A100
- Epochs: 250
- Image Size: 640×640
- Method: Fine-tuned from YOLOv8n pretrained weights
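A minimal sketch of this fine-tuning setup with the Ultralytics trainer; the 'discovr.yaml' dataset config is an assumed local file pointing at the DISCOVR splits:
from ultralytics import YOLO

# Start from COCO-pretrained YOLOv8n and fine-tune on DISCOVR
model = YOLO('yolov8n.pt')
model.train(data='discovr.yaml', epochs=250, imgsz=640, device=0)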
VRSight System Integration
This model is one component of the complete VRSight system, which combines:
- This object detection model (detects VR objects)
- Depth estimation (DepthAnythingV2)
- GPT-4o (scene atmosphere and detailed descriptions)
- OCR (text reading)
- Spatial audio (TTS -> WebVR app, e.g., PlayCanvas)
To use the full VRSight system, see the GitHub repository.
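As a rough illustration of how detections feed later stages, the sketch below pairs each box with a monocular depth estimate; the DepthAnythingV2 checkpoint name is an assumption, and the actual GPT-4o, OCR, and spatial-audio stages live only in the GitHub repository:
from ultralytics import YOLO
from transformers import pipeline
from PIL import Image
import numpy as np

detector = YOLO('best.pt')
# Checkpoint name is an assumption; any depth-estimation model could be swapped in
depth_estimator = pipeline('depth-estimation',
                           model='depth-anything/Depth-Anything-V2-Small-hf')

image = Image.open('vr_screenshot.jpg')
depth = np.array(depth_estimator(image)['depth'])  # per-pixel relative depth map

for box in detector(image)[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    name = detector.names[int(box.cls[0])]
    # Median depth inside the box as a crude distance cue for spatial audio
    print(name, float(np.median(depth[y1:y2, x1:x2])))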
Limitations
- VR-specific: Trained on social VR apps; performance varies on other types of VR content
- Lighting: Reduced accuracy in dark environments
- Coverage: 30 classes cover common social VR objects but not all possible VR elements
- Application types: Best performance in social VR; may struggle with faster-paced games
See Section 7.2 of the paper for detailed discussion.
Citation
If you use this model, the DISCOVR dataset, or the full VRSight system, please cite the VRSight paper:
@inproceedings{killough2025vrsight,
  title={VRSight: An AI-Driven Scene Description System to Improve Virtual Reality Accessibility for Blind People},
  author={Killough, Daniel and Feng, Justin and Ching, Zheng Xue and Wang, Daniel and Dyava, Rithvik and Tian, Yapeng and Zhao, Yuhang},
  booktitle={Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology},
  pages={1--17},
  year={2025},
  publisher={ACM},
  address={Busan, Republic of Korea},
  doi={10.1145/3746059.3747641}
}
License
CC BY 4.0 - Free to use with attribution
Contact
- GitHub Issues: github.com/MadisonAbilityLab/VRSight/issues
- Paper: dl.acm.org/doi/full/10.1145/3746059.3747641
- Lead Author: Daniel Killough (UW-Madison MadAbility Lab)
Related Resources
- VRSight GitHub - Complete system implementation
- DISCOVR Dataset - Training data
- UIST 2025 Paper - Research paper
- Video Demo - System in action