aarticerebras committed
Commit 38588bf (verified) · 1 Parent(s): 8956b37

Update README.md

Files changed (1): README.md (+3 -2)
README.md CHANGED
@@ -3,7 +3,7 @@
  ---
  # Model Card for cerebras/Cerebras-LLaVA-13B
  The checkpoints consist of the language encoder and projector weights of the multimodal LLaVA-13B model trained with our Cerebras implementation and training recipe.
- The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V)
+ The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)

  **Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8)

@@ -14,7 +14,8 @@ Cerebras-Llava is licensed under the LLAMA 2 Community License, Copyright (c) Me

  ## Model Architecture
  Cerebras-LLaVA-13B is a transformer model with the following architecture details:
- * Vision encoder: [CLIP-VisionModel-Large](cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V). It handles images of size 336 x 336 with a patch size of 14
+ * Vision encoder: [CLIP-VisionModel-Large](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)
+ . It handles images of size 336 x 336 with a patch size of 14
  * Large Language Model: Pretrained from Vicuna-13B checkpoints and instruction finetuned on various datasets.
  * Projector: the projector module that connects the LLM and the vision encoder; it consists of two linear layers with GELU activation (mlp2x-gelu)
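As a companion to the architecture list in the updated README, the sketch below shows what an mlp2x-gelu projector of this kind looks like in PyTorch: two linear layers with a GELU activation in between, mapping vision-encoder features into the LLM embedding space. The class name and the hidden sizes (1024 for CLIP ViT-L/14-336 patch features, 5120 for Vicuna-13B) are assumptions for illustration, not the Cerebras training code.

```python
import torch
import torch.nn as nn


class MLP2xGeluProjector(nn.Module):
    """Illustrative mlp2x-gelu projector: Linear -> GELU -> Linear.

    Hidden sizes are assumptions: CLIP ViT-L/14-336 emits 1024-dim patch
    features and Vicuna-13B uses a 5120-dim hidden size.
    """

    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 5120):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)


# A 336 x 336 image with 14 x 14 patches yields (336 / 14)**2 = 576 patch tokens.
dummy_features = torch.randn(1, 576, 1024)
projected = MLP2xGeluProjector()(dummy_features)
print(projected.shape)  # torch.Size([1, 576, 5120])
```

This two-layer MLP is the LLaVA-1.5-style projector; the original LLaVA used a single linear projection layer.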