FinetunedQWEN Overlay Text Extractor

A specialized vision-language model that extracts overlaid text (such as captions, titles, and promotional text) from images while ignoring text that belongs to the background scene.

Features

  • Specialized Text Extraction: Focuses on deliberately overlaid text elements
  • Real-time Processing: Deployed on Hugging Face Inference Endpoints
  • Simple JSON Interface: Easy to integrate with existing workflows
  • Lightweight Model: Based on Qwen2.5-VL-3B-Instruct with a fine-tuned adapter

Use Cases

  • Video caption extraction
  • Content moderation
  • Graphic design analysis
  • Accessibility improvements
  • Marketing analytics

Technical Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Fine-tuned Adapter: MohammedSameerSyed/FinetunedQWEN
  • Input: Base64-encoded image
  • Output: JSON containing the extracted text, or the "{none}" indicator when no overlay text is found
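
The output contract above can be handled with a small helper. This is a minimal sketch, not the endpoint's documented schema: it assumes the response is a JSON object whose `overlay_text` key (the key read in the Quick Start code) holds either the extracted text or the literal string "{none}". The name `parse_overlay_response` is hypothetical.

```python
from typing import Optional

NONE_INDICATOR = "{none}"  # sentinel listed under Technical Details


def parse_overlay_response(response: dict) -> Optional[str]:
    """Return the extracted overlay text, or None when nothing was found.

    Assumption: the text lives under the "overlay_text" key.
    """
    text = response.get("overlay_text")
    if not isinstance(text, str):
        return None  # key missing or unexpected type
    text = text.strip()
    if not text or text == NONE_INDICATOR:
        return None  # empty string or the explicit "no overlay" sentinel
    return text
```

Returning `None` for both a missing key and the sentinel lets calling code use a single check, at the cost of hiding which case occurred.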

Quick Start

Test the model with this simple Python code:

import base64

import requests


def test_model(image_path, endpoint_url):
    """Send one image to the endpoint and return the parsed JSON response."""
    # The endpoint expects the image as a base64-encoded string.
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")

    # json= serializes the payload and sets the Content-Type header.
    # The 60-second timeout is an arbitrary choice; adjust it for your endpoint.
    response = requests.post(endpoint_url, json={"inputs": base64_image}, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()


image_path = "your_image.jpg"
endpoint_url = "YOUR_ENDPOINT_URL"
result = test_model(image_path, endpoint_url)
print(f"Extracted text: {result.get('overlay_text', 'None found')}")

API Usage

Basic request:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "BASE64_ENCODED_IMAGE"}' \
  YOUR_ENDPOINT_URL

Request body with a custom prefix:

{
  "inputs": "BASE64_ENCODED_IMAGE", 
  "parameters": {"prefix": "Extract overlay text: "}
}
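
Both request shapes above can be produced by one helper. The following is a sketch under the assumption that the endpoint accepts exactly the fields shown in the examples (`inputs`, and optionally `parameters.prefix`). The function name `build_payload` is hypothetical and not part of the model's API.

```python
import base64
import json
from typing import Optional


def build_payload(image_bytes: bytes, prefix: Optional[str] = None) -> str:
    """Serialize a request body for the endpoint.

    Assumption: "inputs" carries the base64-encoded image, and the optional
    "parameters.prefix" field matches the JSON example above.
    """
    body = {"inputs": base64.b64encode(image_bytes).decode("utf-8")}
    if prefix is not None:
        body["parameters"] = {"prefix": prefix}  # only sent when requested
    return json.dumps(body)
```

The returned string can be passed as the `data` argument of `requests.post` together with the `Content-Type: application/json` header.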

Limitations

  • Works best with clear, deliberate text overlays
  • May struggle with noisy backgrounds or complex overlapping text
  • Limited support for non-Latin scripts
  • Performance varies with image quality

Performance Tips

  • Use high-contrast text for best results
  • Ensure overlay text is clearly distinguished from background
  • Avoid highly stylized fonts when possible
  • Test with your specific image types for optimal results
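
One way to act on the last tip is to score the model on a small labeled set of your own images. The sketch below is illustrative only: `exact_match_rate` and `normalize` are hypothetical helpers, and `extract` stands for any callable you supply that sends one image to the endpoint and returns the extracted text (or `None`). Exact match after normalization is a deliberately strict metric, so treat the result as a rough signal.

```python
from typing import Callable, Iterable, Optional, Tuple


def normalize(text: Optional[str]) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join((text or "").lower().split())


def exact_match_rate(
    samples: Iterable[Tuple[str, Optional[str]]],
    extract: Callable[[str], Optional[str]],
) -> float:
    """Fraction of samples whose extracted text matches the expected text.

    samples: (image_path, expected_text) pairs; use None for "no overlay".
    extract: hypothetical callable returning the extracted text or None.
    """
    total = 0
    correct = 0
    for image_path, expected in samples:
        total += 1
        if normalize(extract(image_path)) == normalize(expected):
            correct += 1
    return correct / total if total else 0.0
```

Running this separately on each image type (for example, video frames versus posters) shows where the limitations listed above matter in practice.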

Ethical Considerations

  • Respect copyright when extracting text from images
  • Be mindful of privacy when processing images with personal information
  • Consider bias in text recognition performance across different languages

Contact

Acknowledgements

  • Qwen Team for the base Qwen2.5-VL-3B-Instruct model
  • Hugging Face for the infrastructure and tools