Tokenized-OCR

Tokenized-OCR is an OCR text extraction model that returns the text it recognizes in an image as a comma-separated sequence of tokens. It is built on the Qwen2-VL vision-language architecture and fine-tuned for OCR and multilingual text recognition.

Key Enhancements:

  • Advanced OCR Engine: Fine-tuned on extensive datasets for text recognition and tokenization.
  • Optimized for Tokenized Output: Produces structured comma-separated text, suited to downstream NLP tasks, automation pipelines, and database integrations.
  • Enhanced Multilingual OCR: Supports text extraction in multiple languages, including English, Chinese, Japanese, Korean, Arabic, and more.
  • Multimodal Processing: Accepts an image together with a text instruction and returns structured, tokenized output.
  • Safetensors Model Weights: Ships weights in the safetensors format, which loads efficiently and, unlike pickle-based checkpoints, does not execute arbitrary code on load.

Demo Inference

Instruction: "Extract and return the tokenized OCR text from the image, ensuring each word is accurately recognized and separated by commas."

How to Use

import torch  # only needed for torch.bfloat16 in the optional FlashAttention-2 variant below
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the Tokenized-OCR model with optimized parameters
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Tokenized-OCR", torch_dtype="auto", device_map="auto"
)

# Optional: load in bfloat16 with FlashAttention-2 for faster inference and lower memory use
# (requires the flash-attn package and a supported GPU):
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "prithivMLmods/Tokenized-OCR",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# Load the default processor for Tokenized-OCR
processor = AutoProcessor.from_pretrained("prithivMLmods/Tokenized-OCR")

# Define the input messages with both an image and a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://flux-generated.com/sample_image.jpeg",
            },
            {"type": "text", "text": "Extract and return the tokenized OCR text from the image, ensuring each word is accurately recognized and separated by commas."},
        ],
    }
]

# Prepare the input for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
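# Move inputs to the device the model was placed on (device_map="auto" may not choose "cuda")
inputs = inputs.to(model.device)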

# Generate the output
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
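
`output_text` is a list with one decoded string per input. Assuming the model follows the instruction and separates tokens with commas, a small helper can turn that string into a clean token list. The function name `parse_tokenized_output` and the sample string are hypothetical and used only for illustration; they are not part of the model's API or real model output.

```python
def parse_tokenized_output(raw: str) -> list[str]:
    """Split a comma-separated OCR string into stripped, non-empty tokens."""
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


# Illustrative string standing in for output_text[0]; not real model output.
sample = "Invoice, No, 1042, Total, 89.50 ,, USD"
tokens = parse_tokenized_output(sample)
print(tokens)
```

Note that this simple split also breaks up tokens that contain a comma themselves (for example a number written as "1,000"), so a real pipeline may need a stricter output convention.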

Key Features

  1. High-Accuracy OCR Processing
    • Extracts and tokenizes text from images.
  2. Multilingual Text Recognition
    • Supports multiple languages, including English, Chinese, Japanese, Korean, and Arabic.
  3. Comma-Separated Tokenized Output
    • Generates structured text for NLP and data processing tasks.
  4. Efficient Image & Text Processing
    • Takes an image plus a text instruction as input for OCR-based extraction.
  5. Optimized for Secure Deployment
    • Uses safetensors for safe and efficient model loading.

Tokenized-OCR extracts text from images as comma-separated tokens that are easy to integrate into automated workflows, search engines, and language processing applications.
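
As one possible illustration of the search use case, the sketch below builds a tiny inverted index from token lists. It assumes the model output has already been split into tokens per image; the function `build_inverted_index`, the file names, and the token lists are all hypothetical examples, not output from the model.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lowercased token to the set of image IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for image_id, tokens in docs.items():
        for token in tokens:
            index[token.lower()].add(image_id)
    return dict(index)


# Hypothetical token lists, as if parsed from the comma-separated output.
docs = {
    "receipt_01.png": ["Total", "USD", "89.50"],
    "receipt_02.png": ["Total", "EUR", "12.00"],
}
index = build_inverted_index(docs)
print(sorted(index["total"]))  # ['receipt_01.png', 'receipt_02.png']
```

Lowercasing the keys makes lookups case-insensitive, which is usually what a keyword search wants; drop it if case carries meaning in your data.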

Model details

  • Model size: 2.21B params
  • Tensor types: BF16, FP16 (Safetensors)
  • Base model: Qwen/Qwen2-VL-2B