
Pretrained Weights of Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks (RSS 2025)

Paper: https://arxiv.org/pdf/2412.06224

The model is trained on samples collected from the training splits of VLN-CE R2R and RxR, EVT-Bench, ObjectNav, and EQA.

| Evaluation Benchmark | TL | NE | OS | SR | SPL |
|---|---|---|---|---|---|
| VLN-CE R2R Val. | 9.22 | 4.96 | 57.4 | 51.8 | 47.7 |
| VLN-CE RxR Val. | 18.4 | 5.67 | 64.4 | 66.4 | 44.5 |
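
For readers unfamiliar with the column abbreviations, the sketch below shows how these metrics are commonly computed in vision-and-language navigation: trajectory length (TL), navigation error (NE), oracle success rate (OS), success rate (SR), and success weighted by path length (SPL). It is a minimal illustration, not the evaluation code used for the numbers above. The `Episode` fields, the `navigation_metrics` function, and the 3 m success radius are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative."""
    path_length: float     # distance actually travelled by the agent (m)
    shortest_path: float   # shortest-path distance from start to goal (m)
    final_distance: float  # distance to the goal when the agent stops (m)
    min_distance: float    # closest distance to the goal at any step (m)


def navigation_metrics(episodes: List[Episode],
                       success_radius: float = 3.0) -> Dict[str, float]:
    """Average the standard navigation metrics over a list of episodes.

    success_radius is an assumed threshold; check the benchmark's own
    definition before comparing against published numbers.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)

    tl = sum(e.path_length for e in episodes) / n
    ne = sum(e.final_distance for e in episodes) / n
    oracle = sum(e.min_distance <= success_radius for e in episodes) / n
    sr = sum(e.final_distance <= success_radius for e in episodes) / n

    # SPL: each success is weighted by shortest / max(travelled, shortest).
    spl_total = 0.0
    for e in episodes:
        success = e.final_distance <= success_radius
        denom = max(e.path_length, e.shortest_path)
        if success and denom > 0:
            spl_total += e.shortest_path / denom
    spl = spl_total / n

    # OS, SR and SPL are reported as percentages, matching the table.
    return {"TL": tl, "NE": ne, "OS": 100 * oracle,
            "SR": 100 * sr, "SPL": 100 * spl}
```

Calling `navigation_metrics([Episode(10.0, 8.0, 1.0, 0.5), Episode(12.0, 6.0, 5.0, 2.0)])` averages over one successful and one failed episode. The second episode still counts toward OS because the agent passed within the radius before stopping farther away.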

The related inference code can be found here.
