Image-Text-to-Text

Add model card for UniVLG

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +24 -0
README.md ADDED
@@ -0,0 +1,24 @@
+ ---
+ pipeline_tag: image-text-to-text
+ license: cc-by-nc-4.0
+ ---
+
+ # UniVLG: Unifying 2D and 3D Vision-Language Understanding
+
+ This repository contains the UniVLG model, as presented in [Unifying 2D and 3D Vision-Language Understanding](https://arxiv.org/abs/2503.10745). UniVLG is a unified architecture for 2D and 3D vision-language understanding.
+
+ Project page: https://univlg.github.io
+
+ The model uses a custom loading tool (`uvx`). Checkpoints are available on [Hugging Face](https://huggingface.co/katefgroup/UniVLG/tree/main). See the [GitHub repository](https://github.com/ayushjain1144/univlg) for code and instructions.
+
+ ## Citation
+ ```
+ @article{jain2025unifying,
+ title={Unifying 2D and 3D Vision-Language Understanding},
+ author={Jain, Ayush and Swerdlow, Alexander and Wang, Yuzhou and Arnaud, Sergio and Martin, Ada and Sax, Alexander and Meier, Franziska and Fragkiadaki, Katerina},
+ journal={arXiv preprint arXiv:2503.10745},
+ year={2025}
+ }
+ ```
+
+ **License Note:** The majority of UniVLG is licensed under CC-BY-NC; however, portions of the project (specifically Odin and Pointcept) are available under separate MIT license terms. Please refer to the GitHub repository for details.
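
The README in this diff points to checkpoints hosted on Hugging Face but does not show how to fetch them. The following is a minimal sketch, not part of the PR, assuming the `huggingface_hub` package is installed. The helper names `repo_id_from_url` and `download_checkpoints` and the `univlg_ckpt` directory are hypothetical. The sketch only downloads files; loading the model still follows the instructions in the GitHub repository.

```python
from urllib.parse import urlparse

# Checkpoint URL exactly as given in the README.
CHECKPOINT_URL = "https://huggingface.co/katefgroup/UniVLG/tree/main"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repo id from a huggingface.co model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model repo URL: {url}")
    return "/".join(parts[:2])


def download_checkpoints(url: str, local_dir: str = "univlg_ckpt") -> str:
    """Download every file in the repo and return the local directory path.

    Hypothetical helper: it fetches files only and does not load the model.
    """
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)
```

Calling `download_checkpoints(CHECKPOINT_URL)` would place the repository files under `univlg_ckpt/`. The download can be large, so the call is left out of the block itself.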