Update README.md
Add disclaimer about model size
README.md
CHANGED
@@ -158,6 +158,12 @@ The architecture of granite-vision-3.1-2b-preview consists of the following comp
 
 We built upon LlaVA (https://llava-vl.github.io) to train our model. We use multi-layer encoder features and a denser grid resolution in AnyRes to enhance the model's ability to understand nuanced visual content, which is essential for accurately interpreting document images.
 
+_Note:_
+
+We denote our model as Granite-Vision-3.1-2B-Preview, where the version (3.1) and size (2B) of the base large language model
+are explicitly indicated. However, when considering the integrated vision encoder and projector, the total parameter count of our
+model increases to 3 billion parameters.
+
 
 **Training Data:**
 
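As a quick illustration of the parameter counts described in the added note (a 2B base language model, roughly 3B in total once the vision encoder and projector are included), the sketch below loads the model with Hugging Face `transformers` and sums its parameters. The repository id `ibm-granite/granite-vision-3.1-2b-preview` and the `AutoModelForVision2Seq` loader are assumptions for illustration and are not part of this diff.

```python
# Minimal sketch (assumed repo id and loader class, not taken from this diff):
# load the model and count parameters to see the ~3B total that the note
# attributes to the 2B base LLM plus the vision encoder and projector.
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview"  # assumed Hugging Face repo id
)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e9:.2f}B")  # expected to be roughly 3B
```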