Need help with splitting

by manancode - opened Apr 28

Apr 28

Has anyone successfully split CXDuncan's quantized MADLAD ONNX models (CXDuncan/madlad400-3b-mt-optimized-quantized-onnx) into components like MADLAD_embed.onnx (embeddings + LM head) and MADLAD_cache_initializer.onnx?

Attempts to split the model using the standard onnx library tools fail with the error "No Op registered for SimplifiedLayerNormalization with domain_version of 13". Does anyone know how to handle this operator, or have compatible split components?
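To narrow down which nodes trigger that error, a minimal sketch like the one below may help. It counts the operators in a graph and lists those for which the installed onnx package has no schema. It rests on an assumption: SimplifiedLayerNormalization is, as far as I know, a fused operator emitted by ONNX Runtime's graph optimizer rather than part of the standard ONNX operator set, which would explain why the stock onnx tooling rejects it. The helper names (count_ops, unregistered_ops, report) and the model path are hypothetical, not part of any library or of this repository.

```python
from collections import Counter


def count_ops(nodes):
    """Count (domain, op_type) pairs over an iterable of NodeProto-like objects."""
    return Counter((node.domain, node.op_type) for node in nodes)


def unregistered_ops(counts, has_schema):
    """Return, sorted, the pairs for which has_schema(op_type, domain) is False."""
    return sorted(
        (domain, op_type)
        for (domain, op_type) in counts
        if not has_schema(op_type, domain)
    )


def report(model_path):
    """Print each op in model_path that the installed onnx package has no schema for."""
    import onnx  # imported lazily so the helpers above work without onnx installed

    # load_external_data=False reads only the graph structure, not the weight files.
    model = onnx.load(model_path, load_external_data=False)
    counts = count_ops(model.graph.node)  # top-level graph only, subgraphs are not visited
    has_schema = lambda op_type, domain: onnx.defs.has(op_type, domain=domain)
    for domain, op_type in unregistered_ops(counts, has_schema):
        print(f"{op_type} (domain={domain!r}): {counts[(domain, op_type)]} node(s)")


if __name__ == "__main__":
    # Placeholder path: point this at one of the downloaded .onnx files.
    report("path/to/model.onnx")
```

This only identifies the offending nodes. It does not make the standard tools accept them, so any split would still need tooling that tolerates ONNX Runtime's fused operators.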

CXDuncan

Owner Apr 30

I'm not sure you'll be able to. I created this model after a painful day of trial and error, just to see if I could get anything working:

mkdir optimized-onnx/
optimum-cli export onnx --model google/madlad400-3b-mt --optimize O2 optimized-onnx/
mkdir optim-quan-onnx
optimum-cli onnxruntime quantize --onnx_model optimized-onnx/ --avx512 -o optim-quan-onnx/

I ran this on an e2-highmem-16 (16 vCPUs, 128 GB memory).
You might have to take a different approach.

manancode

Apr 30

Thanks for replying. Yes, I actually tried that, but the process is compute-intensive and I don't have enough hardware resources to make it work. I am building a real-time translation service, and this is the only step I am stuck on. Could anyone on your team, or anyone you know, help me with this?
