Post 427
How fast can you spin up a Hugging Face Inference Endpoint with vLLM to deploy a brand-new, state-of-the-art OCR model?
Let’s break it down step by step.
1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️
2️⃣ Configure hardware & container
Pick hardware: AWS/GPU/L4 ⚡
Set container: vLLM 🐇
Click Create ✅
3️⃣ Update endpoint settings
Container → Container URI: vllm/vllm-openai:nightly → Update
Advanced: add flag --trust-remote-code → Update ⚠️
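Prefer code to clicks? Steps 1–3 can also be done with huggingface_hub. A minimal sketch, assuming placeholder values for the endpoint name, region, task label, and instance identifiers (copy the exact values from your endpoint's configuration page):
```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "dots-ocr",                             # hypothetical endpoint name
    repository="rednote-hilab/dots.ocr",
    framework="pytorch",
    task="image-text-to-text",              # assumed task label for this model
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",                     # assumed region
    instance_size="x1",
    instance_type="nvidia-l4",              # assumed identifier for the L4 tier
    custom_image={
        "url": "vllm/vllm-openai:nightly",  # the container URI from step 3
        "health_route": "/health",          # vLLM exposes a /health route
        "env": {},
    },
)
# The --trust-remote-code flag from step 3 is added in the UI's Advanced
# settings; set it there after creation if your container needs it.
endpoint.wait()  # block until the endpoint is running
print(endpoint.url)
```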
4️⃣ Run inference
Download the script 📝: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅
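Under the hood, the script is just an OpenAI-compatible chat completion against your endpoint. A minimal sketch, with placeholder URLs and an assumed served-model name:
```python
import os
from openai import OpenAI

# base_url is your endpoint's URL plus /v1; the HF token authenticates it.
client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1",
    api_key=os.environ["HF_TOKEN"],
)

# Send an image plus an instruction; the served model name is assumed to
# match the Hub repo id -- check your endpoint's page for the exact value.
response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
            {"type": "text", "text": "Extract all text from this image."},
        ],
    }],
)
print(response.choices[0].message.content)
```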
Your OCR model is now live via HF Inference Endpoints!