aarticerebras
committed on
Update README.md
README.md
CHANGED
@@ -3,7 +3,7 @@
---
# Model Card for cerebras/Cerebras-LLaVA-13B
The checkpoints consist of the language encoder and projector weights of the multimodal LLaVA-13B model trained with our Cerebras implementation and training recipe.
-The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-
+The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)

**Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8)

@@ -14,7 +14,8 @@ Cerebras-Llava is licensed under the LLAMA 2 Community License, Copyright (c) Me

## Model Architecture
Cerebras-LLaVA-13B is a transformer model with the following architecture details:
-* Vision encoder: [CLIP-VisionModel-Large](cerebras/Cerebras-ViT-L-336-patch14-
+* Vision encoder: [CLIP-VisionModel-Large](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)
+. It handles images of size 336 x 336 with a patch size of 14
* Large Language Model: Pretrained from Vicuna-13B checkpoints and instruction-finetuned on various datasets.
* Projector: the projector module that connects the LLM and the vision encoder consists of two linear layers with GELU activation (mlp2x-gelu)
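The note above matters because the LLaVA loader selects the CLIP vision tower based on its repository name, so the _ShareGPT4V_ suffix keeps the released checkpoint on the supported code path. As a quick sanity check of the vision-encoder geometry mentioned in the card, here is a minimal sketch that loads the checkpoint with Hugging Face `transformers`; it assumes the repo is stored in the standard `CLIPVisionModel` / `CLIPImageProcessor` format, which the commit itself does not state:

```python
# Illustrative sketch (not an official loading recipe): pull the vision
# encoder referenced in the card and confirm its patch geometry.
from transformers import CLIPImageProcessor, CLIPVisionModel

repo_id = "cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V"

processor = CLIPImageProcessor.from_pretrained(repo_id)  # assumes standard CLIP format
vision_tower = CLIPVisionModel.from_pretrained(repo_id)

# A 336 x 336 input with 14 x 14 patches gives a 24 x 24 grid, i.e. 576 patch tokens.
cfg = vision_tower.config
print(cfg.image_size, cfg.patch_size)           # expected: 336 14
print((cfg.image_size // cfg.patch_size) ** 2)  # expected: 576
```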
|
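The mlp2x-gelu projector in the architecture list is simple enough to write out. Below is a minimal PyTorch sketch of such a two-layer GELU MLP; the hidden sizes (1024 for a CLIP ViT-L/14 encoder, 5120 for a 13B Vicuna-class LLM) are assumed typical values, not figures taken from this card:

```python
import torch
import torch.nn as nn

# Minimal sketch of an mlp2x-gelu projector: two linear layers with a GELU
# in between, mapping vision-encoder features into the LLM embedding space.
VISION_HIDDEN = 1024  # CLIP ViT-L/14 feature width (assumed)
LLM_HIDDEN = 5120     # Vicuna-13B embedding width (assumed)

projector = nn.Sequential(
    nn.Linear(VISION_HIDDEN, LLM_HIDDEN),
    nn.GELU(),
    nn.Linear(LLM_HIDDEN, LLM_HIDDEN),
)

# Example: project the 576 patch features of one 336 x 336 image
# (24 x 24 patches at patch size 14) into LLM token embeddings.
patch_features = torch.randn(1, 576, VISION_HIDDEN)
visual_tokens = projector(patch_features)
print(visual_tokens.shape)  # torch.Size([1, 576, 5120])
```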