Pretrained Weights of Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks (RSS 2025)

The model is trained on samples collected from the training splits of VLN-CE R2R, VLN-CE RxR, EVT-Bench, ObjectNav, and EQA.

Evaluation Benchmark	TL	NE	OS	SR	SPL
VLN-CE R2R Val.	9.22	4.96	57.4	51.8	47.7
VLN-CE RxR Val.	18.4	5.67	64.4	66.4	44.5
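
For readers unfamiliar with the column abbreviations, the sketch below shows how these metrics are conventionally computed in vision-and-language navigation: TL (trajectory length), NE (navigation error), OS (oracle success rate), SR (success rate), and SPL (success weighted by path length). It is a minimal illustration and not the evaluation code used for this model. The `Episode` record, the `summarize` helper, and the 3 m success radius are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    path_length: float     # metres travelled by the agent
    shortest_path: float   # geodesic distance from start to goal, in metres
    final_distance: float  # distance to the goal when the agent stops
    min_distance: float    # closest distance to the goal along the trajectory


def summarize(episodes: List[Episode], success_radius: float = 3.0) -> Dict[str, float]:
    """Aggregate standard navigation metrics over a list of episodes.

    The success radius (3 m here) is an assumed threshold, not a value
    stated in this model card.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    n = len(episodes)

    spl_sum = 0.0
    successes = 0
    oracle_successes = 0
    for e in episodes:
        success = e.final_distance < success_radius
        successes += int(success)
        oracle_successes += int(e.min_distance < success_radius)
        # SPL term: success * shortest / max(travelled, shortest)
        denom = max(e.path_length, e.shortest_path)
        if success and denom > 0:
            spl_sum += e.shortest_path / denom

    return {
        "TL": sum(e.path_length for e in episodes) / n,      # metres
        "NE": sum(e.final_distance for e in episodes) / n,   # metres
        "OS": 100.0 * oracle_successes / n,                  # percent
        "SR": 100.0 * successes / n,                         # percent
        "SPL": 100.0 * spl_sum / n,                          # percent
    }


if __name__ == "__main__":
    demo = [
        Episode(path_length=10.0, shortest_path=8.0, final_distance=1.0, min_distance=0.5),
        Episode(path_length=12.0, shortest_path=10.0, final_distance=5.0, min_distance=2.0),
    ]
    print(summarize(demo))
```

In this toy input the second episode passes within the assumed radius of the goal but stops outside it, so it counts toward OS and not toward SR or SPL.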

The related inference code can be found here.
