Optimizing Input for Faster Rendering with NanoNet

#34
by FastaLaPasta - opened

Hi, I'm wondering if there are ways to optimize the input I send to the model in order to achieve faster rendering times. Are there any recommended preprocessing steps, input formats, or parameters (like resolution, token length, or batch size) that could help improve inference speed?
thanks :)

Nanonets org

Hello! I'm assuming you are using vLLM for deployment.

  1. Generally, increasing the resolution increases both processing time and accuracy, but I have seen accuracy drop once the width goes above 3000 pixels. You can also decrease the resolution if your document is not complex or does not contain dense text (see the resizing sketch after this list).
  2. Increasing the batch size improves the overall throughput of the model but does not reduce the processing time of a single request. If you are using a large GPU, you should try to use as large a batch size as possible (see the vLLM sketch after this list).
  3. By token length I assume you mean generation length? Dense text takes more time because these models are auto-regressive (they generate one token at a time). If you want to convert your document fully to markdown, there is not much you can do here: dense documents will simply take longer than documents with less text.
  4. Using a better GPU will make the computation faster. Or you can try https://docstrange.nanonets.com; we give free processing of 10k docs per month.
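
For point 1, here is a minimal preprocessing sketch, assuming your inputs are page images and that capping the width somewhere below 3000 px (the exact value depends on how dense your documents are) is acceptable for your accuracy needs:

```python
from PIL import Image

def cap_width(path: str, max_width: int = 2000) -> Image.Image:
    """Downscale a page image so its width never exceeds max_width.

    Smaller inputs mean fewer vision tokens and faster prefill; pick the
    largest width at which accuracy is still acceptable for your documents.
    """
    img = Image.open(path).convert("RGB")
    if img.width > max_width:
        new_height = int(img.height * max_width / img.width)
        img = img.resize((max_width, new_height), Image.LANCZOS)
    return img
```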
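
For points 2 and 3, a rough sketch using vLLM's offline API: passing many page requests to a single `generate()` call lets vLLM batch them internally for throughput, and `max_tokens` caps the generation length. The model id and prompt string below are placeholders, so swap in the checkpoint you actually deploy and its expected chat/image-placeholder format:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Placeholder model id; replace with the checkpoint you actually serve.
llm = LLM(model="nanonets/your-ocr-model", max_model_len=8192)

# Cap generation length: output tokens dominate latency on dense pages.
params = SamplingParams(temperature=0.0, max_tokens=4096)

# One request per page image; vLLM batches these internally for throughput.
pages = ["page1.png", "page2.png", "page3.png"]
requests = [
    {
        "prompt": "Convert this page to markdown:",  # placeholder prompt format
        "multi_modal_data": {"image": Image.open(p).convert("RGB")},
    }
    for p in pages
]

outputs = llm.generate(requests, params)
for out in outputs:
    print(out.outputs[0].text)
```

Note that batching this way improves pages-per-second, not the latency of any single page, which matches point 2 above.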
