Pretrained Weights of Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks (RSS 2025)
Paper: https://arxiv.org/pdf/2412.06224
The model is trained on samples collected from the training splits of VLN-CE R2R and RxR, EVT-Bench, ObjectNav, and EQA.
Evaluation Benchmark | TL | NE | OS | SR | SPL |
---|---|---|---|---|---|
VLN-CE R2R Val. | 9.22 | 4.96 | 57.4 | 51.8 | 47.7 |
VLN-CE RxR Val. | 18.4 | 5.67 | 64.4 | 66.4 | 44.5 |
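
The column abbreviations above are the usual VLN-CE navigation metrics: trajectory length (TL), navigation error (NE), oracle success rate (OS), success rate (SR), and success weighted by path length (SPL). The following is a minimal sketch of how such metrics are commonly computed from per-episode statistics; it is not the evaluation code used for this model. The `Episode` fields, the `summarize` helper, and the 3 m success radius are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical per-episode record; field names are illustrative only.
    path_length: float      # length of the path the agent actually travelled (m)
    shortest_length: float  # geodesic shortest-path length from start to goal (m)
    final_distance: float   # geodesic distance to the goal where the agent stopped (m)
    min_distance: float     # closest distance to the goal reached at any step (m)


def summarize(episodes, success_radius=3.0):
    """Average standard navigation metrics over a list of episodes.

    success_radius is an assumed threshold (3 m is common for VLN-CE R2R).
    OS, SR and SPL are returned as percentages.
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")

    tl = sum(e.path_length for e in episodes) / n
    ne = sum(e.final_distance for e in episodes) / n
    os_rate = sum(e.min_distance <= success_radius for e in episodes) / n
    sr = sum(e.final_distance <= success_radius for e in episodes) / n

    # SPL: success indicator times the ratio of shortest to travelled path,
    # with the denominator clamped so the ratio never exceeds 1.
    spl = 0.0
    for e in episodes:
        if e.final_distance <= success_radius:
            denom = max(e.path_length, e.shortest_length, 1e-9)
            spl += e.shortest_length / denom
    spl /= n

    return {
        "TL": tl,
        "NE": ne,
        "OS": 100.0 * os_rate,
        "SR": 100.0 * sr,
        "SPL": 100.0 * spl,
    }
```

Calling `summarize` on a list of `Episode` records returns a dictionary keyed by the same column names as the table. Lower is better for NE; higher is better for OS, SR, and SPL.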
The related inference code can be found here.