Update README.md
README.md
CHANGED
@@ -15,7 +15,9 @@ pipeline_tag: image-feature-extraction
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/AUE-3OBtfr9vDA7Elgkhd.webp" alt="Image Description" width="300" height="300">
 </p>
 
-\[
+[\[Blog\]](https://internvl.github.io/blog/) [\[InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[Chat Demo\]](https://internvl.opengvlab.com/)
+
+[\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Quick Start\]](#model-usage) [\[Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[Chinese Explainer\]](https://zhuanlan.zhihu.com/p/675877376)
 
 We develop InternViT-6B-448px-V1-5 based on the pre-training of the strong foundation of [InternViT-6B-448px-V1.2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). In this update, the resolution of training images is expanded from 448×448 to dynamic 448×448, where the basic tile size is 448×448 and the number of tiles ranges from 1 to 12.
 Additionally, we enhance the data scale, quality, and diversity of the pre-training dataset, resulting in the powerful robustness, OCR capability, and high-resolution processing capability of our
@@ -82,6 +84,12 @@ If you find this project useful in your research, please consider citing:
   journal={arXiv preprint arXiv:2312.14238},
   year={2023}
 }
+@article{chen2024far,
+  title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+  author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+  journal={arXiv preprint arXiv:2404.16821},
+  year={2024}
+}
 ```
 
 
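The README text in this diff describes "dynamic 448×448" input: a basic tile of 448×448 and between 1 and 12 tiles per image. It does not say how the tile grid is chosen, so the following is only a minimal sketch under an assumed rule: pick the grid whose aspect ratio is closest to the image's, preferring fewer tiles on a tie. The names `choose_grid` and `tiled_size` are hypothetical and are not part of the InternViT code.

```python
from itertools import product

TILE = 448        # basic tile size stated in the README
MAX_TILES = 12    # upper bound on the tile count stated in the README


def choose_grid(width, height, max_tiles=MAX_TILES):
    """Return (cols, rows) with 1 <= cols * rows <= max_tiles.

    Assumed rule (not confirmed by the README): minimise the gap between
    the grid's aspect ratio and the image's, then prefer fewer tiles.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    grids = [
        (cols, rows)
        for cols, rows in product(range(1, max_tiles + 1), repeat=2)
        if cols * rows <= max_tiles
    ]
    return min(grids, key=lambda g: (abs(g[0] / g[1] - target), g[0] * g[1]))


def tiled_size(width, height, tile=TILE, max_tiles=MAX_TILES):
    """Return (resized_width, resized_height, tile_count) for the chosen grid."""
    cols, rows = choose_grid(width, height, max_tiles)
    return cols * tile, rows * tile, cols * rows
```

Under this assumed rule, a square image maps to a single 448×448 tile, while a very wide image is capped at a 12×1 grid because no grid may exceed 12 tiles.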