Update README.md
README.md (CHANGED)
@@ -29,6 +29,8 @@ datasets:
> The **DeepCaption-VLA-7B** model is a fine-tuned version of **Qwen2.5-VL-7B-Instruct**, tailored for **Image Captioning** and **Vision Language Attribution**. This variant is designed to generate precise, highly descriptive captions with a focus on **defining visual properties, object attributes, and scene details** across a wide spectrum of images and aspect ratios.

+ [DeepCaption-VLA-7B notebook demo (4-bit)](https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb)
+

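For reference, here is a minimal usage sketch loading the model with Hugging Face `transformers`, following the standard Qwen2.5-VL recipe. The Hub repo id, image path, and prompt below are illustrative assumptions, not taken from this diff; the linked notebook is the authoritative demo.

```python
# Minimal sketch (assumed usage), following the standard Qwen2.5-VL
# loading recipe in Hugging Face transformers. The repo id, image path,
# and prompt are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "prithivMLmods/DeepCaption-VLA-7B"  # assumed Hub repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # any local image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the image, attributing the "
                                 "visual properties of each object and "
                                 "the overall scene."},
    ],
}]

# Build the chat prompt, run generation, and strip the prompt tokens.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = output_ids[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(trimmed, skip_special_tokens=True))
```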
# Key Highlights
1. **Vision Language Attribution (VLA):** Specially fine-tuned to attribute and define visual properties of objects, scenes, and environments.