# GroundingGPT: Language-Enhanced Multi-modal Grounding Model

<a href='https://lzw-lzw.github.io/GroundingGPT.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2401.06071'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> [![](https://img.shields.io/badge/Datasets-GroundingGPT-yellow)](https://huggingface.co/datasets/zwli/GroundingGPT)

## Introduction
GroundingGPT is an end-to-end multimodal grounding model that accurately comprehends inputs and possesses robust grounding capabilities across multiple modalities, including images, audio, and video. To address the issue of limited data, we construct a diverse and high-quality multimodal training dataset. This dataset encompasses a rich collection of multimodal data enriched with spatial and temporal information, serving as a valuable resource to foster further advancements in this field. Extensive experimental evaluations validate the effectiveness of the GroundingGPT model in understanding and grounding tasks across various modalities.

More details are available on our [project page](https://lzw-lzw.github.io/GroundingGPT.github.io/).

## News
* **[2024.4]** Our [model](https://huggingface.co/zwli/GroundingGPT) is available now!
* **[2024.3]** Our [training dataset](https://huggingface.co/datasets/zwli/GroundingGPT) is available now!
* **[2024.3]** Our code is available now!

## Dependencies and Installation
```shell
git clone https://github.com/lzw-lzw/GroundingGPT.git
cd GroundingGPT
conda create -n groundinggpt python=3.10 -y
conda activate groundinggpt
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

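A minimal sanity check of the environment, assuming PyTorch is pulled in by `requirements.txt` (adjust if your setup differs):

```shell
# Verify that flash-attn imports cleanly and that PyTorch sees a GPU
python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"
```
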
## Training
### Training model preparation
- Put the prepared checkpoints in the directory `./ckpt` (a command-line download sketch follows this list).
- Prepare the ImageBind checkpoint: download [imagebind_huge.pth](https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth) from the link and put it under the directory `./ckpt/imagebind`.
- Prepare the BLIP-2 checkpoint: download [blip2_pretrained_flant5xxl.pth](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth) from the link and put it under the directory `./ckpt`.

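Assuming `wget` is available, both checkpoints can be fetched directly into the expected locations using the URLs linked above:

```shell
# ImageBind checkpoint -> ./ckpt/imagebind
mkdir -p ckpt/imagebind
wget -P ckpt/imagebind https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth

# BLIP-2 checkpoint -> ./ckpt
wget -P ckpt https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
```
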
### Training dataset preparation
- Put the prepared datasets in the directory `dataset`.
- Prepare the LLaVA, COCO, GQA, OCR-VQA, TextVQA, and Visual Genome datasets: follow [LLaVA](https://github.com/haotian-liu/LLaVA).
- Prepare the Flickr30K Entities dataset: follow [Flickr30K Entities](https://bryanplummer.com/Flickr30kEntities/).
- Prepare the Valley dataset: follow [Valley](https://github.com/RupertLuo/Valley).
- Prepare the DiDeMo dataset: follow [DiDeMo](https://github.com/LisaAnne/TemporalLanguageRelease).
- Prepare the ActivityNet Captions dataset: follow [ActivityNet Captions](https://cs.stanford.edu/people/ranjaykrishna/densevid/).
- Prepare the Charades-STA dataset: follow [Charades-STA](https://github.com/jiyanggao/TALL).
- Prepare the VGG-SS dataset: follow [VGG-SS](https://www.robots.ox.ac.uk/~vgg/research/lvs/).
- Prepare the WavCaps dataset: follow [WavCaps](https://github.com/XinhaoMei/WavCaps).
- Prepare the Clotho dataset: follow [Clotho](https://zenodo.org/records/3490684).

### Training

## Inference

- Download [GroundingGPT-7B](https://huggingface.co/zwli/GroundingGPT) and change the `model_path` in `GroundingGPT/lego/serve/cli.py` to point to the downloaded weights (see the sketch after this section).
- Use the following script to run inference:

```shell
python3 lego/serve/cli.py
```

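One way to obtain the weights locally is with Git LFS; this is only a sketch (it assumes `git-lfs` is installed, and the local directory name `GroundingGPT-7B` is just an example):

```shell
git lfs install
git clone https://huggingface.co/zwli/GroundingGPT GroundingGPT-7B
# Then point model_path in GroundingGPT/lego/serve/cli.py (and the demo script) at ./GroundingGPT-7B
```
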
## Demo
- Download [GroundingGPT-7B](https://huggingface.co/zwli/GroundingGPT) and change the `model_path` in line 141 of `GroundingGPT/lego/serve/gradio_web_server.py`.
- Use the following script to launch a Gradio web demo:

```shell
python3 lego/serve/gradio_web_server.py
```

## Acknowledgement
- [LLaVA](https://github.com/haotian-liu/LLaVA)
- [Video-LLaMA](https://github.com/DAMO-NLP-SG/Video-LLaMA)
- [Shikra](https://github.com/shikras/shikra)

### Citation
If you find GroundingGPT useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{li2024lego,
  title={LEGO: Language Enhanced Multi-modal Grounding Model},
  author={Li, Zhaowei and Xu, Qi and Zhang, Dong and Song, Hang and Cai, Yiqing and Qi, Qi and Zhou, Ran and Pan, Junting and Li, Zefeng and Vu, Van Tu and others},
  journal={arXiv preprint arXiv:2401.06071},
  year={2024}
}
```