Update README.md
Add disclaimer about model size
README.md
CHANGED
@@ -158,6 +158,12 @@ The architecture of granite-vision-3.1-2b-preview consists of the following comp
 
 We built upon LlaVA (https://llava-vl.github.io) to train our model. We use multi-layer encoder features and a denser grid resolution in AnyRes to enhance the model's ability to understand nuanced visual content, which is essential for accurately interpreting document images.
 
+_Note:_
+
+We denote our model as Granite-Vision-3.1-2B-Preview, where the version (3.1) and size (2B) of the base large language model
+are explicitly indicated. However, when considering the integrated vision encoder and projector, the total parameter count of our
+model increases to 3 billion parameters.
+
 
 **Training Data:**
 
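As a quick illustration of the parameter counts described in the added note (a 2B base language model, roughly 3B in total once the vision encoder and projector are included), the sketch below loads the model with Hugging Face `transformers` and sums its parameters. The repository id `ibm-granite/granite-vision-3.1-2b-preview` and the `AutoModelForVision2Seq` loader are assumptions for illustration and are not part of this diff.

```python
# Minimal sketch (assumed repo id and loader class, not taken from this diff):
# load the model and count parameters to see the ~3B total that the note
# attributes to the 2B base LLM plus the vision encoder and projector.
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview"  # assumed Hugging Face repo id
)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e9:.2f}B")  # expected to be roughly 3B
```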