Using smolDocling alongside docling

#38

by abdbrainy - opened 27 days ago

27 days ago

Hi, my use case is to use docling to parse every input format it supports, except that I want to use SmolDocling for unstructured PDFs and images. What would be a viable architecture for this? Currently I am thinking of using the docling library for all documents except PDFs, hosting SmolDocling on a cloud service, and calling its endpoint to parse unstructured PDFs and images. Is that a viable workflow?

PeterWJStaar

Docling org 27 days ago

@abdbrainy SmolDocling is natively integrated into docling; you only need to update the pipeline settings (see here: https://github.com/docling-project/docling/blob/main/docs/examples/minimal_vlm_pipeline.py#L37)
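For readers who want to see roughly what that looks like, here is a minimal sketch modelled on the linked example. It assumes a recent docling release in which `VlmPipeline` is importable from `docling.pipeline.vlm_pipeline` and defaults to SmolDocling. Mapping `InputFormat.IMAGE` to the same pipeline is an assumption of this sketch, not something stated in the thread. The `uses_vlm` helper and `VLM_SUFFIXES` set are hypothetical names that only mirror the routing for illustration; docling itself picks the pipeline from the detected input format.

```python
from pathlib import Path

# Hypothetical helper data: file suffixes this sketch sends through the VLM pipeline.
VLM_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def uses_vlm(path: str) -> bool:
    """Return True if the file would be handled by the VLM pipeline in this sketch."""
    return Path(path).suffix.lower() in VLM_SUFFIXES


def build_converter():
    """Build one converter: PDFs and images use the VLM pipeline, other formats keep docling's defaults."""
    # Imported lazily so the helper above works even where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.document_converter import DocumentConverter, PdfFormatOption
    from docling.pipeline.vlm_pipeline import VlmPipeline

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline),
            # Assumption: images are routed the same way as PDFs.
            InputFormat.IMAGE: PdfFormatOption(pipeline_cls=VlmPipeline),
        }
    )


def convert_to_markdown(path: str) -> str:
    """Convert a single document and return its Markdown export."""
    converter = build_converter()
    return converter.convert(source=path).document.export_to_markdown()
```

With a setup like this, a single process covers the whole workflow, so a separately hosted SmolDocling endpoint may not be needed unless the model has to run on different hardware.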

PeterWJStaar changed discussion status to closed 27 days ago

abdbrainy

26 days ago

Error parsing document with VLM: It looks like the config file at '/root/.cache/huggingface/hub/models--ds4sd--SmolDocling-256M-preview/snapshots/492bde898f2bed6b493b4da8256c93de29e03a9b/preprocessor_config.json' is not a valid JSON file.

I get this error when I use SmolDocling by editing the VLM pipeline.
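One way to narrow this down is to check whether the cached file really fails to parse as JSON. The sketch below uses only the Python standard library; `check_json_file` is a hypothetical helper name, and the path to pass in is the one printed in the error message. If the file turns out to be empty or truncated, an interrupted download is one plausible cause, and removing that cached snapshot so it is downloaded again may help, although the thread does not confirm this.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return (True, "") if the file parses as JSON, otherwise (False, reason)."""
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Missing file, unreadable bytes, and malformed JSON are all reported the same way.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling `check_json_file(...)` on the `preprocessor_config.json` path from the error returns a short reason string that shows whether the file is missing, unreadable, or malformed.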
