Add/update the quantized ONNX model files and README.md for Transformers.js v3

#2
by whitphx HF Staff - opened

Applied Quantizations

❌ Based on encoder_model.onnx with slimming

None

↳ ❌ int8: encoder_model_int8.onnx (added but JS-based E2E test failed)

dtype not specified for "decoder_model_merged". Using the default dtype (fp32) for this device (cpu).
/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25
            __classPrivateFieldGet(this, _OnnxruntimeSessionHandler_inferenceSession, "f").loadModel(pathOrBuffer, options);
                                                                                           ^

Error: Could not find an implementation for ConvInteger(10) node with name '/conv1/Conv_quant'
    at new OnnxruntimeSessionHandler (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:25:92)
    at Immediate.<anonymous> (/home/ubuntu/src/tjsmigration/node_modules/.pnpm/[email protected]/node_modules/onnxruntime-node/dist/backend.js:67:29)
    at process.processImmediate (node:internal/timers:485:21)

Node.js v22.16.0

❌ Based on decoder_model_merged.onnx with slimming

0%|          | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpqar5es1_/decoder_model_merged.onnx:   0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/6 [00:00<?, ?it/s]

 - Quantizing to fp16:   0%|          | 0/6 [00:00<?, ?it/s]/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.240995833346915e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.977981025282133e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.06769619998704e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.251329682209871e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.3526995462930245e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.9463871221178124e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.8767781752067094e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.283024341271812e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.828350271182444e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.2955544071455734e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.4954201432715308e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.732786855285667e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.952086137180231e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.686143846768573e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.418340608467133e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.58146273433158e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.881167363166242e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.490806221861931e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.372817415301597e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.779268867442624e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.750978384507107e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.010136661098841e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.270974602287424e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.585358924837692e-10 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.558971594155082e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.8950699493843786e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.166952542547733e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.379105363241706e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.174123446887279e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.318246370540237e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.6449882284396153e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.847692968214858e-10 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.60018189996481e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.534349950015894e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.529433456606057e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.122839066169945e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.805253451754197e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.2737046023735275e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.506624110144912e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.196622047762503e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.201104861882413e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.48596783953326e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.8338487939881816e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.1586133708572106e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.213839608472995e-10 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.514668295816591e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.2583828846763936e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.442695893975724e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.308015964075821e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.08206318852433e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.9053732813745228e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.88486423428003e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.044531713518154e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.7650150613567348e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.08155473255556e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.165546378473664e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.9205208090511405e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.440300145816309e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.575421831352287e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.6368847610692683e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.599279593430765e-08 will be truncated to -1e-07
  warnings.warn(

 - Quantizing to fp16:   0%|          | 0/6 [00:03<?, ?it/s]

Processing /tmp/tmpqar5es1_/decoder_model_merged.onnx:   0%|          | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
    check_and_save_model(model_fp16, save_path)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
    strict_check_model(model)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
    raise e
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
    onnx.checker.check_model(model_or_path, full_check=True)
  File "/home/ubuntu/.cache/uv/archive-v0/cQ6A7vyzEBQhtbSuz6CnD/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
    C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
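
The `UserWarning` lines in the log above suggest that the fp16 conversion clamps float32 values that are too small or too large in magnitude before casting. The following is a minimal sketch of that clamping rule, not the code in `float16.py`: the two thresholds are inferred only from the warning messages ("truncated to 1e-07" and "truncated to -10000.0"), and the function name `clamp_for_fp16` is hypothetical.

```python
import math

# Thresholds inferred from the warnings in the log (assumptions, not read from float16.py).
MIN_POSITIVE = 1e-7   # smallest magnitude kept, per "will be truncated to 1e-07"
MAX_FINITE = 1e4      # largest magnitude kept, per "will be truncated to -10000.0"


def clamp_for_fp16(x: float) -> float:
    """Clamp a float32 value into the magnitude range assumed safe for fp16.

    Zero, NaN, and infinities are returned unchanged in this sketch.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    magnitude = abs(x)
    if magnitude < MIN_POSITIVE:
        # Tiny values are pushed out to +/-MIN_POSITIVE, keeping the sign.
        return math.copysign(MIN_POSITIVE, x)
    if magnitude > MAX_FINITE:
        # Huge values (e.g. the float32 minimum used as a mask) become +/-MAX_FINITE.
        return math.copysign(MAX_FINITE, x)
    return x


if __name__ == "__main__":
    for value in (2.240995833346915e-09, -8.977981025282133e-09,
                  -3.4028234663852886e+38, 0.5):
        print(value, "->", clamp_for_fp16(value))
```

These warnings are separate from the `ShapeInferenceError` that ends the traceback; the failure of the slimmed model comes from the rank mismatch reported by `onnx.checker.check_model`, not from the clamping.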

✅ Based on decoder_model_merged.onnx without slimming

↳ ✅ fp16: decoder_model_merged_fp16.onnx (replaced because it was invalid)
↳ ✅ int8: decoder_model_merged_int8.onnx (added)
↳ ✅ uint8: decoder_model_merged_uint8.onnx (added)
↳ ✅ q4: decoder_model_merged_q4.onnx (added)
↳ ✅ q4f16: decoder_model_merged_q4f16.onnx (added)
↳ ✅ bnb4: decoder_model_merged_bnb4.onnx (added)
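
The decoder files listed above all follow a `<base>_<dtype>.onnx` naming pattern. As a small illustration, the sketch below rebuilds those names from the base name and the dtype labels shown in this PR. The helper `quantized_filename` and the `DECODER_DTYPES` tuple are hypothetical and cover only the six variants listed here, not every dtype the tooling may support.

```python
# dtype labels taken from the list in this PR; other dtypes are not covered.
DECODER_DTYPES = ("fp16", "int8", "uint8", "q4", "q4f16", "bnb4")


def quantized_filename(base: str, dtype: str) -> str:
    """Build the file name for one quantized variant, e.g. 'x' + 'q4' -> 'x_q4.onnx'."""
    if dtype not in DECODER_DTYPES:
        raise ValueError(f"dtype {dtype!r} is not one of the variants listed in this PR")
    return f"{base}_{dtype}.onnx"


if __name__ == "__main__":
    for dtype in DECODER_DTYPES:
        print(quantized_filename("decoder_model_merged", dtype))
```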

Xenova changed pull request status to merged
