Inst-IT / LLaVA-Next-Inst-It-Vicuna-7B

Video-Text-to-Text

instance-understanding


wjpoom committed on Feb 10

Commit 0d1e383 · verified · 1 parent: 6036340

Update README.md

Files changed (1)

README.md +10 -3

README.md CHANGED

@@ -160,11 +160,18 @@ model-index:
 ---
-# LLaVA-Next-Inst-It-Vicuna-7B: A Multimodal Model that Excels at Instance-level Understanding
-introduced in the paper [Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning](https://huggingface.co/papers/2412.03565)
 [**🌐 Homepage**](https://inst-it.github.io/) | [**Code**](https://github.com/inst-it/inst-it) | [**🤗 Paper**](https://huggingface.co/papers/2412.03565) | [**📖 arXiv**](https://arxiv.org/abs/2412.03565)
 ## Quick Start
 **Install**

 ---
+# LLaVA-Next-Inst-It-Vicuna-7B
 [**🌐 Homepage**](https://inst-it.github.io/) | [**Code**](https://github.com/inst-it/inst-it) | [**🤗 Paper**](https://huggingface.co/papers/2412.03565) | [**📖 arXiv**](https://arxiv.org/abs/2412.03565)
+LLaVA-Next-Inst-It-Vicuna-7B is a multimodal model that excels at instance-level understanding,
+which is introduced in the paper [Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning](https://huggingface.co/papers/2412.03565)
+* **Architecture**: clip-vit-large-patch14-336 + Vicuna-7B
+* **Initialized Model**: LLaVA-NeXT
+* **Data**: LLaVA-NeXT-Data / Inst-IT-Dataset
+* **Precision**: bfloat16
 ## Quick Start
 **Install**