aarticerebras
committed on
Update README.md
README.md
CHANGED
@@ -3,7 +3,7 @@
---
# Model Card for cerebras/Cerebras-LLaVA-13B
The checkpoints consist of the language encoder and projector weights of the multimodal LLaVA-13B model trained with our Cerebras implementation and training recipe.
-The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-
+The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)

**Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8)

@@ -14,7 +14,8 @@ Cerebras-Llava is licensed under the LLAMA 2 Community License, Copyright (c) Me

## Model Architecture
Cerebras-LLaVA-13B is a transformer model with the following architecture details:
-* Vision encoder: [CLIP-VisionModel-Large](cerebras/Cerebras-ViT-L-336-patch14-
+* Vision encoder: [CLIP-VisionModel-Large](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V)
+. It handles images of size 336 x 336 with a patch size of 14
* Large Language Model: Pretrained from Vicuna-13B checkpoints and instruction-finetuned on various datasets.
* Projector: the projector module that connects the LLM and the vision encoder consists of two linear layers with GELU activation (mlp2x-gelu)
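The note above matters because the LLaVA loader selects the CLIP vision tower based on its repository name, so the _ShareGPT4V_ suffix keeps the released checkpoint on the supported code path. As a quick sanity check of the vision-encoder geometry mentioned in the card, here is a minimal sketch that loads the checkpoint with Hugging Face `transformers`; it assumes the repo is stored in the standard `CLIPVisionModel` / `CLIPImageProcessor` format, which the commit itself does not state:

```python
# Illustrative sketch (not an official loading recipe): pull the vision
# encoder referenced in the card and confirm its patch geometry.
from transformers import CLIPImageProcessor, CLIPVisionModel

repo_id = "cerebras/Cerebras-ViT-L-336-patch14-llava13b-ShareGPT4V"

processor = CLIPImageProcessor.from_pretrained(repo_id)  # assumes standard CLIP format
vision_tower = CLIPVisionModel.from_pretrained(repo_id)

# A 336 x 336 input with 14 x 14 patches gives a 24 x 24 grid, i.e. 576 patch tokens.
cfg = vision_tower.config
print(cfg.image_size, cfg.patch_size)           # expected: 336 14
print((cfg.image_size // cfg.patch_size) ** 2)  # expected: 576
```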
|
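The mlp2x-gelu projector in the architecture list is simple enough to write out. Below is a minimal PyTorch sketch of such a two-layer GELU MLP; the hidden sizes (1024 for a CLIP ViT-L/14 encoder, 5120 for a 13B Vicuna-class LLM) are assumed typical values, not figures taken from this card:

```python
import torch
import torch.nn as nn

# Minimal sketch of an mlp2x-gelu projector: two linear layers with a GELU
# in between, mapping vision-encoder features into the LLM embedding space.
VISION_HIDDEN = 1024  # CLIP ViT-L/14 feature width (assumed)
LLM_HIDDEN = 5120     # Vicuna-13B embedding width (assumed)

projector = nn.Sequential(
    nn.Linear(VISION_HIDDEN, LLM_HIDDEN),
    nn.GELU(),
    nn.Linear(LLM_HIDDEN, LLM_HIDDEN),
)

# Example: project the 576 patch features of one 336 x 336 image
# (24 x 24 patches at patch size 14) into LLM token embeddings.
patch_features = torch.randn(1, 576, VISION_HIDDEN)
visual_tokens = projector(patch_features)
print(visual_tokens.shape)  # torch.Size([1, 576, 5120])
```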