---
license: mit
language:
- zh
- en
tags:
- document-parsing
- document-understanding
- document-intelligence
- ocr
- layout-analysis
- table-extraction
- multimodal
- vision-language-model
datasets:
- custom
pipeline_tag: image-text-to-text
library_name: transformers
---

# Dolphin OCR Deployment on Hugging Face Inference Toolkit

This guide provides step-by-step instructions for deploying the **ByteDance Dolphin OCR model** with the **Hugging Face Inference Toolkit** and GPU support.

---

## 🔹 Prerequisites

- Docker installed
- A GPU on your local machine
- A [Hugging Face account](https://huggingface.co/)
- Basic familiarity with command-line tools

---

## 🔢 Step 1: Duplicate the Dolphin Model Repository

1. Visit: [https://huggingface.co/spaces/huggingface-projects/repo_duplicator](https://huggingface.co/spaces/huggingface-projects/repo_duplicator)
2. Enter the source repo, in this case `Bytedance/Dolphin`.
3. Name your new repo `luquiT4/DolphinInference` (or any name you prefer).

---

## 🔢 Step 2: Add the Handler to the Model Repository

The toolkit documentation explains that these files enable custom handlers and extra dependencies: https://github.com/huggingface/huggingface-inference-toolkit/#custom-handler-and-dependency-support

- `handler.py` (custom inference handler)
- `requirements.txt` (dependencies)

To add them:

1. Add a new file to the new repo:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/67116e3a75abfd0db8e1b154/wlXCsuIQJlMOf-kKG4c0U.png)

2. Paste the following into `handler.py`:

```python
import base64
import io
from typing import Dict, Any

import torch
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel


class EndpointHandler:
    def __init__(self, path=""):
        # Load processor and model from the provided path or model ID
        self.processor = AutoProcessor.from_pretrained(path or "bytedance/Dolphin")
        self.model = VisionEncoderDecoderModel.from_pretrained(path or "bytedance/Dolphin")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()
        self.model = self.model.half()  # Half precision for speed
        self.tokenizer = self.processor.tokenizer

    def decode_base64_image(self, image_base64: str) -> Image.Image:
        image_bytes = base64.b64decode(image_base64)
        return Image.open(io.BytesIO(image_bytes)).convert("RGB")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Check for image input
        if "inputs" not in data:
            return {"error": "No inputs provided"}

        image_input = data["inputs"]

        # Support both base64 image strings and raw images (Hugging Face supports both)
        if isinstance(image_input, str):
            try:
                image = self.decode_base64_image(image_input)
            except Exception as e:
                return {"error": f"Invalid base64 image: {str(e)}"}
        else:
            image = image_input  # Assume PIL-compatible image

        # Optional: custom prompt (default: text reading)
        prompt = data.get("prompt", "Read text in the image.")
        full_prompt = f"<s>{prompt} <Answer/>"

        # Preprocess inputs
        inputs = self.processor(image, return_tensors="pt")
        pixel_values = inputs.pixel_values.half().to(self.device)

        prompt_ids = self.tokenizer(
            full_prompt, add_special_tokens=False, return_tensors="pt"
        ).input_ids.to(self.device)
        decoder_attention_mask = torch.ones_like(prompt_ids).to(self.device)

        # Inference
        outputs = self.model.generate(
            pixel_values=pixel_values,
            decoder_input_ids=prompt_ids,
            decoder_attention_mask=decoder_attention_mask,
            min_length=1,
            max_length=4096,
            pad_token_id=self.tokenizer.pad_token_id,
            eos_token_id=self.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[self.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
            do_sample=False,
            num_beams=1,
        )

        sequence = self.tokenizer.batch_decode(outputs.sequences, skip_special_tokens=False)[0]

        # Clean up the prompt echo and special tokens
        generated_text = (
            sequence.replace(full_prompt, "").replace("<pad>", "").replace("</s>", "").strip()
        )

        return {"text": generated_text}
```
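Before pushing `handler.py` to the repo, you can sanity-check it locally with a short script. This is a minimal sketch, not an official toolkit utility: it assumes the code above is saved as `handler.py` in the current directory, that a test image named `sample.png` exists next to it, and that the Dolphin weights can be downloaded from the Hub (a GPU is strongly recommended, since the handler casts the model to half precision).

```python
# local_test.py - quick local sanity check for the custom handler (sketch)
import base64

from handler import EndpointHandler  # the file pasted above

# Instantiate the handler; with path="" it falls back to "bytedance/Dolphin"
handler = EndpointHandler(path="")

# Encode a test image the same way a JSON client would
with open("sample.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Call the handler exactly as the inference server does
result = handler({"inputs": image_b64, "prompt": "Read text in the image."})
print(result)  # e.g. {"text": "..."} or {"error": "..."}
```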
The handler above was generated with ChatGPT, based on the following sources:

- https://huggingface.co/docs/inference-endpoints/guides/custom_handler (main documentation)
- https://github.com/bytedance/Dolphin/blob/master/demo_page_hf.py (Dolphin demo script)
- https://github.com/bytedance/Dolphin/blob/master/demo_element_hf.py (Dolphin demo script)
- https://github.com/bytedance/Dolphin/blob/master/deployment/vllm/api_server.py (vLLM implementation of Dolphin)
- https://huggingface.co/philschmid/donut-base-finetuned-cord-v2/blob/main/handler.py (`handler.py` of a similar model)

In this case the endpoint works with `handler.py` alone; no `requirements.txt` is needed.

---

## 🔢 Step 3: Build the Hugging Face Inference Toolkit Docker Image

1. Clone the toolkit:

```bash
git clone https://github.com/huggingface/huggingface-inference-toolkit.git
cd huggingface-inference-toolkit
```

2. **Important:** If you are on Windows, use **WSL or Linux** to avoid line-ending issues (`^M: bad interpreter`).

3. Build the GPU Docker image:

```bash
make inference-pytorch-gpu
# under the hood this runs:
# docker build -t integration-test-pytorch:gpu -f docker/Dockerfile.pytorch .
```

---

## 🔢 Step 4: Run the Inference Server with the Dolphin Model

```bash
docker run -ti -p 5001:5000 --gpus all \
  -e HF_MODEL_ID=luquiT4/DolphinInference \
  -e HF_TASK=image-to-text \
  integration-test-pytorch:gpu
```

- `HF_MODEL_ID` = your Hugging Face model name
- `HF_TASK` = task type (image-to-text)

---

## 🔢 Step 5: Test the Endpoint

1. Send an inference request (note the `@` prefix, which makes curl send the file contents rather than the path string):

```bash
curl --request POST \
  --url http://localhost:5001/ \
  --header 'accept: application/json' \
  --header 'content-type: application/octet-stream' \
  --data-binary '@C:\path\to\imagewithtext.png'
```

2. Enjoy a successful response.

---

## 🔢 Step 6 (Coming Soon): Deploy to Azure Serverless Function as an API

- Use **serverless GPU (NC T4 v3)** for low-cost inference.
- Configure **scale-to-zero** in Azure Container Apps to avoid idle GPU charges.
- Monitor with Azure budgets and alerts.

More info:

- https://learn.microsoft.com/en-us/azure/container-apps/gpu-image-generation?pivots=azure-portal
- https://azure.microsoft.com/en-us/pricing/details/container-apps/?cdn=disable
- https://learn.microsoft.com/en-us/azure/container-apps/gpu-serverless-overview

---

## 🔹 Troubleshooting

| Issue                       | Solution                                                               |
| --------------------------- | ---------------------------------------------------------------------- |
| `404 requirements.txt`      | (Optional) Create a `requirements.txt` in your HF model repo           |
| `Safetensor HeaderTooLarge` | Duplicate the repo in the cloud using the Hugging Face Repo Duplicator |
| `^M bad interpreter`        | Build the Docker image on WSL or Linux                                 |

---

## 👍 Useful Links

- Dolphin GitHub: [https://github.com/bytedance/Dolphin](https://github.com/bytedance/Dolphin)
- Hugging Face Inference Toolkit: [https://github.com/huggingface/huggingface-inference-toolkit](https://github.com/huggingface/huggingface-inference-toolkit)
- Hugging Face Repo Duplicator: [https://huggingface.co/spaces/huggingface-projects/repo_duplicator](https://huggingface.co/spaces/huggingface-projects/repo_duplicator)

---

You are now ready to deploy and run Dolphin OCR as a custom Hugging Face Inference Endpoint!
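---

## 🔹 Bonus: Calling the Endpoint from Python

If you prefer Python over curl, the sketch below sends the same kind of request as Step 5, but through the handler's JSON path (a base64-encoded image plus an optional `prompt`). It is a minimal example under a few assumptions: the container from Step 4 is listening on `localhost:5001`, the `requests` package is installed, and a local file `imagewithtext.png` exists.

```python
# client.py - minimal Python client sketch for the locally running endpoint
import base64

import requests

# Encode the image as base64, matching the handler's JSON "inputs" field
with open("imagewithtext.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "inputs": image_b64,                  # base64-encoded image
    "prompt": "Read text in the image.",  # optional; this is the handler's default
}

response = requests.post(
    "http://localhost:5001/",
    json=payload,
    headers={"accept": "application/json"},
    timeout=300,  # generation on dense pages can take a while
)
response.raise_for_status()
print(response.json())  # e.g. {"text": "..."}
```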