Can this be served?
#4
opened by prudant
With vLLM, or something like that, for production-ready, high-demand scenarios?
@prudant, you can serve it on Triton as an ONNX model with a Python-backend ensemble. That is pretty fast. Do you need higher throughput than that?
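
For reference, here is a minimal sketch of what the Python-backend half of such an ensemble can look like: a tokenization step that feeds the ONNX model. The tensor names (`TEXT`, `input_ids`, `attention_mask`) and the tokenizer checkpoint are assumptions for illustration, not taken from this model's actual config:

```python
# model.py — sketch of a Triton Python-backend preprocessing model.
# Assumes a Hugging Face tokenizer; "model-name" is a placeholder checkpoint.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer once at model load time.
        self.tokenizer = AutoTokenizer.from_pretrained("model-name")

    def execute(self, requests):
        responses = []
        for request in requests:
            # "TEXT" is an assumed input name declared in config.pbtxt.
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            texts = [t.decode("utf-8") for t in texts.flatten()]

            # Tokenize the batch; the ONNX model consumes these tensors.
            enc = self.tokenizer(
                texts, padding=True, truncation=True, return_tensors="np"
            )
            outputs = [
                pb_utils.Tensor("input_ids", enc["input_ids"].astype(np.int64)),
                pb_utils.Tensor("attention_mask", enc["attention_mask"].astype(np.int64)),
            ]
            responses.append(pb_utils.InferenceResponse(output_tensors=outputs))
        return responses
```

In `config.pbtxt` you would then chain this preprocessing model and the ONNX model together with Triton's ensemble scheduler, so a single request runs tokenization and inference server-side.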