Image-Text-to-Text · Transformers · Safetensors · ada_llava_llama · text-generation
zhuoyanxu committed (verified) · Commit 5b3e371 · 1 parent: 0493528

Update README.md

Files changed (1): README.md (+2 -3)
README.md CHANGED
@@ -8,12 +8,12 @@ base_model:
 - liuhaotian/llava-v1.5-7b
 ---
 
-```markdown
+
 # Ada-LLaVA Model Card
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-Ada-LLaVA 7B is an open-source adaptive inference framework for multimodal Large Language Models (MLLMs) that dynamically adjusts its operations based on available computational resources and latency requirements.
+Ada-LLaVA-L-7B is an open-source adaptive inference framework for multimodal Large Language Models (MLLMs) that dynamically adjusts its operations based on available computational resources and latency requirements.
 
 See the paper for more details: [Learning to Inference Adaptively for Multimodal Large Language Models](https://huggingface.co/papers/2503.10905)
 
@@ -55,4 +55,3 @@ AdaLLaVA is based on LLaVA-1.5 and thus follows its license. Llama 2 is licensed
 ## Limitations
 
 While Ada-LLaVA is currently limited to processing one image at a time and only applies adaptive operations in its later half of layers, future work could explore multi-image input support and extend the adaptive mechanisms throughout the entire model architecture, including the vision encoder. These improvements would make the model more versatile and applicable to a broader range of real-world scenarios.
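
For orientation, below is a minimal sketch of how a LLaVA-1.5-derived checkpoint like this one might be loaded and queried through transformers. The repository id is a placeholder, the `trust_remote_code=True` flag is an assumption (the custom `ada_llava_llama` architecture suggests the repo ships its own modeling code), and the prompt template simply follows the LLaVA-1.5 convention of the base model; none of this is part of the commit above.

```python
# Hypothetical usage sketch for an Ada-LLaVA checkpoint.
# Assumptions: the repo id below is a placeholder, and the custom
# ada_llava_llama architecture requires trust_remote_code=True.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "zhuoyanxu/ada-llava-7b"  # placeholder: substitute the actual repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Prompt format borrowed from the LLaVA-1.5 base model.
image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

Note that the latency budget Ada-LLaVA adapts to is not a standard transformers generation argument; how it is supplied would be defined by the model's own custom code and the accompanying paper and repository.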