Qwen/Qwen2.5-7B-Instruct · Performance Problem

Our server has 8 x NVIDIA H200 GPUs. Some of the documents we process are 1900 x 2800 pixels. Which framework and configuration (vLLM, etc.) would give the fastest and most efficient execution of this model? We need to process 3,000 of these documents per hour.
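
For sizing, it helps to know how many visual tokens one page turns into. The sketch below is my own re-implementation (not the official code) of the resize rule Qwen2.5-VL preprocessing is generally described as using. It assumes one visual token per 28 x 28 pixel area, with min/max pixel defaults of 4·28·28 and 16384·28·28 as found in the qwen-vl-utils package; check these against your actual processor config. Under those assumptions a 1900 x 2800 page is not downscaled and becomes about 6,800 visual tokens before any text prompt is added.

```python
import math

# Assumption: one visual token per 28 x 28 pixel area (14 x 14 patches merged 2 x 2).
# Defaults assumed to mirror qwen-vl-utils; verify against your processor config.
FACTOR = 28
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28


def resize_for_vision(height, width, min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Round (height, width) to multiples of FACTOR, keeping the pixel count
    within [min_pixels, max_pixels] while roughly preserving aspect ratio."""
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / FACTOR) * FACTOR
        w = math.ceil(width * beta / FACTOR) * FACTOR
    return h, w


def visual_tokens(height, width, max_pixels=MAX_PIXELS):
    """Estimated number of visual tokens for one image of the given size."""
    h, w = resize_for_vision(height, width, max_pixels=max_pixels)
    return (h // FACTOR) * (w // FACTOR)


if __name__ == "__main__":
    # Portrait page, height x width in pixels (orientation assumed).
    print(visual_tokens(2800, 1900))                           # 6800
    print(visual_tokens(2800, 1900, max_pixels=1280 * 28 * 28))  # 1247
```

Lowering `max_pixels` (the 1280·28·28 value above is only an illustrative choice) cuts the per-page token count roughly fivefold, at the cost of detail that small print may need.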

Our current configuration:
Server: ASUS ESC8000A-E12
2 x AMD EPYC 9965
1.5 TB RAM
8 x NVIDIA H200 (PCIe)

Our application:
8 x Dockerized Qwen2.5-VL-Instruct instances, one per H200
Each vision model instance is designed to run on a single GPU.
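
With one container per GPU, the target can be broken down per GPU. The sketch below does that arithmetic and applies Little's law (requests in flight = arrival rate x latency) to estimate how many documents each container must work on at once. The ~6,800 visual tokens per page is an assumption (28 x 28 pixels per token, no downscaling), prompt and output tokens are ignored, and the 30 s latency in the example is purely hypothetical; substitute your measured numbers.

```python
import math

# From the post: 3,000 documents/hour across 8 single-GPU containers.
DOCS_PER_HOUR = 3000
GPUS = 8
# Assumption: ~6,800 visual tokens per 1900 x 2800 page; text tokens ignored.
VISUAL_TOKENS_PER_DOC = 6800


def per_gpu_budget(docs_per_hour=DOCS_PER_HOUR, gpus=GPUS,
                   tokens_per_doc=VISUAL_TOKENS_PER_DOC):
    """Return (documents per GPU per hour, seconds available per document
    if processed one at a time, visual prefill tokens per second per GPU)."""
    docs_per_gpu_hour = docs_per_hour / gpus
    seconds_per_doc = 3600 / docs_per_gpu_hour
    prefill_tokens_per_s = docs_per_gpu_hour * tokens_per_doc / 3600
    return docs_per_gpu_hour, seconds_per_doc, prefill_tokens_per_s


def required_concurrency(latency_s, docs_per_hour=DOCS_PER_HOUR, gpus=GPUS):
    """Little's law: average requests in flight per GPU = arrival rate x latency.
    `latency_s` is a measured (here hypothetical) end-to-end time per document."""
    arrival_per_s = docs_per_hour / gpus / 3600
    return math.ceil(arrival_per_s * latency_s)


if __name__ == "__main__":
    docs, secs, toks = per_gpu_budget()
    print(f"{docs:.0f} docs/GPU/hour, {secs:.1f} s per doc, {toks:.0f} visual tokens/s")
    print(required_concurrency(latency_s=30))  # 4
```

Read this way, a container that handles documents strictly one at a time must finish each in under about 9.6 s; if a document takes longer, requests have to be sent concurrently so the serving framework can batch them.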

Thank you in advance.