metadata
language: en
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
  - vision
  - image-to-text
  - document-understanding
  - content-creators
  - tiktok

FinetunedQWEN Overlay Text Extractor

A specialized vision-language model that extracts overlaid text, such as captions, titles, and promotional text, from images while ignoring text that belongs to the background scene.

Features

  • Specialized Text Extraction: Focuses on deliberately overlaid text elements
  • Real-time Processing: Deployed on Hugging Face Inference Endpoints
  • Simple JSON Interface: Easy to integrate with existing workflows
  • Lightweight Model: Based on Qwen2.5-VL-3B-Instruct with a fine-tuned adapter

Use Cases

  • Video caption extraction
  • Content moderation
  • Graphic design analysis
  • Accessibility improvements
  • Marketing analytics

Technical Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Fine-tuned Adapter: MohammedSameerSyed/FinetunedQWEN
  • Input: Base64-encoded image
  • Output: JSON with the extracted text (read from the overlay_text field in the Quick Start example), or a "{none}" indicator when no overlay text is found
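
The output convention above can be handled with a small helper. This is a minimal sketch, assuming the decoded response is a dict and that the text sits under the overlay_text key used in the Quick Start example; parse_overlay_text and NONE_INDICATOR are hypothetical names introduced here for illustration and are not part of the model's API.

```python
NONE_INDICATOR = "{none}"  # indicator described under Technical Details


def parse_overlay_text(result, field="overlay_text"):
    """Return the extracted overlay text, or None when nothing was found.

    Assumes `result` is the decoded JSON response as a dict and that the text
    is stored under `field` (the key used in the Quick Start example).
    """
    text = result.get(field)
    if text is None:
        return None
    text = str(text).strip()
    # Treat the "{none}" indicator and empty strings as "no overlay text".
    if not text or text == NONE_INDICATOR:
        return None
    return text
```

Returning None for both a missing field and the "{none}" indicator lets callers use a single check before passing the text downstream.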

Quick Start

Send an image to a deployed endpoint with this short Python script:

import base64

import requests


def test_model(image_path, endpoint_url):
    # Read the image and encode it as a base64 string, the input format the endpoint expects
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")

    # json= serializes the payload and sets the Content-Type header to application/json
    response = requests.post(endpoint_url, json={"inputs": base64_image}, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()


image_path = "your_image.jpg"
endpoint_url = "YOUR_ENDPOINT_URL"
result = test_model(image_path, endpoint_url)
print(f"Extracted text: {result.get('overlay_text', 'None found')}")

API Usage

Basic request:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "BASE64_ENCODED_IMAGE"}' \
  YOUR_ENDPOINT_URL

With a custom prefix (JSON request body):

{
  "inputs": "BASE64_ENCODED_IMAGE", 
  "parameters": {"prefix": "Extract overlay text: "}
}
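
The two request shapes above can be produced from one function. This is a minimal sketch: build_payload is a hypothetical helper name, and it assumes the endpoint accepts the optional parameters.prefix field exactly as shown in the JSON body above.

```python
import base64
import json


def build_payload(image_bytes, prefix=None):
    """Build the JSON request body for the endpoint.

    `prefix` is optional; when given it is sent under parameters.prefix,
    mirroring the request body shown above.
    """
    payload = {"inputs": base64.b64encode(image_bytes).decode("utf-8")}
    if prefix is not None:
        payload["parameters"] = {"prefix": prefix}
    return json.dumps(payload)
```

The returned string can be passed as the data argument of requests.post together with a Content-Type: application/json header.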

Limitations

  • Works best with clear, deliberate text overlays
  • May struggle with noisy backgrounds or complex overlapping text
  • Limited support for non-Latin scripts
  • Performance varies with image quality

Performance Tips

  • Use high-contrast text for best results
  • Ensure overlay text is clearly distinguished from background
  • Avoid highly stylized fonts when possible
  • Test with your specific image types for optimal results
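
One way to act on the last tip is to score the model on a small set of your own labeled images. The sketch below is an assumption-laden illustration, not an official evaluation method: evaluate and normalize are hypothetical helpers, extract_fn stands for any callable you supply that maps an image path to extracted text (for example a thin wrapper around test_model from Quick Start), and exact match after normalization is just one possible metric.

```python
def normalize(text):
    """Lowercase and collapse whitespace so minor formatting differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(samples, extract_fn):
    """Return the fraction of samples whose extracted text matches the expected text.

    samples: list of (image_path, expected_text) pairs you have labeled yourself.
    extract_fn: hypothetical callable mapping an image path to extracted text
                (or None when no overlay text is found).
    """
    if not samples:
        return 0.0
    hits = 0
    for image_path, expected in samples:
        predicted = extract_fn(image_path) or ""
        if normalize(predicted) == normalize(expected):
            hits += 1
    return hits / len(samples)
```

Running this separately per image type (screenshots, video frames, posters) shows where the model is reliable for your data and where it is not.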

Ethical Considerations

  • Respect copyright when extracting text from images
  • Be mindful of privacy when processing images with personal information
  • Consider bias in text recognition performance across different languages

Acknowledgements

  • Qwen Team for the base Qwen2.5-VL-3B-Instruct model
  • Hugging Face for the infrastructure and tools