Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference

Acknowledgement

We thank the Sa2VA authors for their contribution.

@article{sa2va,
  title={Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos},
  author={Yuan, Haobo and Li, Xiangtai and Zhang, Tao and Huang, Zilong and Xu, Shilin and Ji, Shunping and Tong, Yunhai and Qi, Lu and Feng, Jiashi and Yang, Ming-Hsuan},
  journal={arXiv preprint},
  year={2025}
}

Model size: 3.95B params (Safetensors)
Tensor type: F32, BF16
Model tree for kumuji/Sa2VA-i-4B: ByteDance/Sa2VA-4B (merge model: this model)