prithivMLmods/coreOCR-7B-050325-preview

Image-Text-to-Text

text-generation-inference


prithivMLmods committed on May 4

Commit 6eba2e6 · verified · 1 Parent(s): 4cd14fd

Update README.md

Files changed (1)

README.md +18 -1


@@ -36,4 +36,21 @@ tags:
 | **Precision**           | bfloat16                                           |
 > [!note]
-> The open dataset image-text response will be updated soon.
+> The open dataset image-text response will be updated soon.
+## References
+- **DocVLM: Make Your VLM an Efficient Reader**
+  [https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
+- **YaRN: Efficient Context Window Extension of Large Language Models**
+  [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+- **Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution**
+  [https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
+- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
+  [https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
+- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
+  [https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)