Update README.md
README.md
@@ -36,4 +36,21 @@ tags:
 | **Precision** | bfloat16 |
 
 > [!note]
-> The open dataset image-text response will be updated soon.
+> The open dataset image-text response will be updated soon.
+
+## References
+
+- **DocVLM: Make Your VLM an Efficient Reader**
+[https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+
+- **YaRN: Efficient Context Window Extension of Large Language Models**
+[https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+
+- **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+[https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+
+- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+[https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+
+- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+[https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)