Need help with splitting

#1
by manancode - opened

Has anyone successfully split CXDuncan's MADLAD quantized ONNX models (CXDuncan/madlad400-3b-mt-optimized-quantized-onnx) into components like MADLAD_embed.onnx (embeddings + LM head) and MADLAD_cache_initializer.onnx?

My attempts to split the model using the standard onnx library tools fail with the error "No Op registered for SimplifiedLayerNormalization with domain_version of 13". Does anyone know how to handle this operator, or have compatible split components?
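
For anyone debugging the same error, a first step that may help is listing which operators the exported graph actually contains and which domain each is registered under. As far as I know, SimplifiedLayerNormalization is a fused operator produced by ONNX Runtime's graph optimizations rather than part of the standard ONNX operator set, which would explain why the onnx library cannot find a schema for it. The sketch below is only a diagnostic, not a fix. The helper names count_ops and load_and_count are hypothetical, the model path is a placeholder, and it assumes the onnx Python package is installed. It only looks at top-level nodes, not nested subgraphs.

```python
from collections import Counter


def count_ops(nodes):
    """Count (domain, op_type) pairs over ONNX NodeProto-like objects.

    An empty domain string means the default ONNX domain, shown here as "ai.onnx".
    """
    return Counter((node.domain or "ai.onnx", node.op_type) for node in nodes)


def load_and_count(path):
    """Load an ONNX file without its external weight data and count its top-level ops."""
    import onnx  # assumption: the onnx package is installed

    # load_external_data=False skips the large weight files; only the graph is needed.
    model = onnx.load(path, load_external_data=False)
    return count_ops(model.graph.node)


# Hypothetical usage (the path is a placeholder, not a file from this thread):
# for (domain, op_type), n in load_and_count("path/to/model.onnx").most_common():
#     print(f"{domain:20s} {op_type:35s} {n}")
```

If SimplifiedLayerNormalization shows up in the list, that at least confirms the failure comes from the optimized graph itself and not from how the split was attempted.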

I'm not sure you'll be able to. The model was created after a painful day of trial and error to see if I could get anything working:

mkdir optimized-onnx/
optimum-cli export onnx --model google/madlad400-3b-mt --optimize O2 optimized-onnx/
mkdir optim-quan-onnx
optimum-cli onnxruntime quantize --onnx_model optimized-onnx/ --avx512 -o optim-quan-onnx/

This ran on an e2-highmem-16 (16 vCPUs, 128 GB memory).
You might have to take a different approach.

Thanks for replying. Yeah, I actually tried, but the process was compute-intensive and I don't have the hardware resources to make it work. I am building a real-time translation service and this is the only step I am stuck on. Could anyone on the team, or anyone you know, help me with this?