prithivMLmods committed on
Commit 35a247d · verified · 1 Parent(s): 63c7239

Update README.md

Files changed (1): README.md (+27 −23)
README.md CHANGED
@@ -22,23 +22,30 @@ library_name: transformers
 
 # **Qwen2.5-VL-7B-Abliterated-Caption-it**
 
-> **Qwen2.5-VL-7B-Abliterated-Caption-it** is a fine-tuned version of **Qwen2.5-VL-7B-Instruct**, optimized for **Abliterated Captioning** / **Uncensored Captioning**. This model excels at generating detailed, context-rich, and high-fidelity captions across **diverse image categories** and **variational aspect ratios**, offering robust visual understanding without filtering or censorship.
+> The **Qwen2.5-VL-7B-Abliterated-Caption-it** model is a fine-tuned version of **Qwen2.5-VL-7B-Instruct**, tailored for **Abliterated Captioning** / **Uncensored Image Captioning**. This variant is designed to generate highly detailed and descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content, across varying aspect ratios and resolutions.
 
-# Key Enhancements
+# Key Highlights
 
-* **Uncensored & Detailed Captioning**: Capable of producing in-depth captions for a wide range of image types, including complex or non-standard visual content.
+* **Abliterated / Uncensored Captioning**: Fine-tuned to bypass common content filters while preserving factual and descriptive richness across diverse visual categories.
 
-* **Aspect-Ratio-Aware Visual Description**: Robust performance across images of varying sizes, orientations, and layouts.
+* **High-Fidelity Descriptions**: Generates comprehensive captions for general, artistic, technical, abstract, and low-context images.
 
-* **Context-Aware Multimodal Reasoning**: Understands visual scenes in conjunction with textual prompts, enabling accurate and comprehensive interpretations.
+* **Robust Across Aspect Ratios**: Capable of accurately captioning images with wide, tall, square, and irregular dimensions.
 
-* **Support for OCR, Layout, and Visual QA Tasks**: Maintains strong performance on document-type images, retaining capability for text extraction and visual question answering.
+* **Variational Detail Control**: Produces outputs with both high-level summaries and fine-grained descriptions as needed.
 
-* **Instruction-Tuned for Precision**: Fine-tuned to follow user prompts and provide captions tailored to user intent, even with minimal or ambiguous input.
+* **Foundation on Qwen2.5-VL Architecture**: Leverages the strengths of the Qwen2.5-VL-7B multimodal model for visual reasoning, comprehension, and instruction following.
 
-* **Multilingual and Multi-Domain Compatibility**: Provides accurate captioning for content across languages and specialized domains.
+* **Multilingual Output Capability**: Can produce multilingual descriptions (English by default), adaptable via prompt engineering.
 
-* **Stable Across Benchmark Tests**: Competent performance across visual-language benchmarks such as COCO, DocVQA, TextVQA, and others, including ablative scenarios.
+# Training Details
+
+This model was fine-tuned using the following datasets:
+
+* **[prithivMLmods/blip3o-caption-mini-arrow](https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow)**
+* **Private/unlisted datasets** curated for uncensored and domain-specific image captioning tasks.
+
+The training objective focused on improving unconstrained, descriptive image captioning, especially for edge cases commonly filtered out of standard captioning benchmarks.
 
 # Quick Start with Transformers
 
@@ -60,7 +67,7 @@ messages = [
                 "type": "image",
                 "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
             },
-            {"type": "text", "text": "Describe this image."},
+            {"type": "text", "text": "Describe this image in detail."},
         ],
     }
 ]
@@ -90,20 +97,17 @@ print(output_text)
 
 # Intended Use
 
-This model is intended for:
+This model is suited for:
 
-* Generating rich, uncensored captions from diverse image categories, including stylized, medical, artistic, and real-world scenarios.
-* Supporting flexible aspect ratios and complex visual scenes.
-* Captioning tasks that demand unfiltered, detailed outputs without masking sensitive or subtle content.
-* Multimodal reasoning between visual and textual content for creative, research, and analytical use cases.
-* Caption generation for datasets that require fine-grained annotation or description, such as image-based storytelling or surveillance contexts.
-* Multilingual caption generation for global image datasets.
-* Vision-based interaction systems that need unrestricted and accurate scene understanding.
+* Generating detailed and unfiltered image captions for general-purpose or artistic datasets.
+* Content moderation research, red-teaming, and generative safety evaluations.
+* Descriptive captioning for visual datasets typically excluded from mainstream models.
+* Creative applications (e.g., storytelling, art generation) that benefit from rich descriptive captions.
+* Captioning for non-standard aspect ratios and stylized visual content.
 
 # Limitations
 
-* May produce uncensored or sensitive outputs not suitable for all applications.
-* Performance may vary on abstract or adversarial images outside of the training domain.
-* High computational demands; not optimized for edge or low-resource devices.
-* Accuracy on handwritten or heavily distorted content may be reduced.
-* As with most generative models, hallucination or context drift is possible in some scenarios.
+* May produce explicit, sensitive, or offensive descriptions depending on image content and prompts.
+* Not suitable for deployment in production systems that require content filtering or moderation.
+* Caption tone and style can vary with input prompt phrasing.
+* Accuracy on unfamiliar or synthetic visual styles may vary.
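
For reference, the `messages` payload edited by the second hunk can be sketched as a small helper. This is a minimal sketch: the role/content schema follows the snippet in the diff, while the helper name `build_caption_messages` is illustrative and not part of the model card.

```python
def build_caption_messages(image_url, prompt="Describe this image in detail."):
    """Build a Qwen2.5-VL-style chat message list for captioning one image.

    The structure mirrors the Quick Start snippet: a single user turn whose
    content mixes an "image" entry and a "text" entry.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Same payload as the updated Quick Start example:
messages = build_caption_messages(
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
)
```

In the Quick Start flow, `messages` is then passed through the processor's chat template before generation, as shown in the surrounding code.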