Need help with splitting
Has anyone successfully split CXDuncan's MADLAD quantized ONNX models (CXDuncan/madlad400-3b-mt-optimized-quantized-onnx) into components like MADLAD_embed.onnx (embeddings + LM head) and MADLAD_cache_initializer.onnx?
Attempts to split them using the standard onnx library tools fail with "No Op registered for SimplifiedLayerNormalization with domain_version of 13". Does anyone know how to handle this operator, or have compatible split components?
I'm not sure you'll be able to. This was created with a painful day of trial and error to see if I could get anything working:
mkdir optimized-onnx/
optimum-cli export onnx --model google/madlad400-3b-mt --optimize O2 optimized-onnx/
mkdir optim-quan-onnx
optimum-cli onnxruntime quantize --onnx_model optimized-onnx/ --avx512 -o optim-quan-onnx/
That was run on an e2-highmem-16 (16 vCPUs, 128 GB memory).
You might have to take a different approach.
Thanks for replying. Yeah, I actually tried, but the process is compute-heavy and I don't have the hardware resources to make it work. I am building a real-time translation service and this is the only step I'm stuck on. Could anyone on the team, or anyone you know, help me with this?