This repository is a fork of philschmid/all-MiniLM-L6-v2-optimum-embeddings. My own ONNX conversion runs about 4x slower than the original, with no discernible cause: the quantized models look roughly identical. Forking also lets us make changes such as upgrading the Optimum library used for conversion.
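
For reference, a minimal latency smoke test of a quantized ONNX model via Optimum might look like the sketch below. This is an assumption about how the model is loaded, not a documented usage of this repo: the local `model_id` path and the `model_quantized.onnx` file name are hypothetical and should be adjusted to the actual repo layout.

```python
import time

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

# Assumed paths: load from a local checkout, pointing at the quantized
# ONNX weights (file name is hypothetical; match this repo's layout).
model_id = "."
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id, file_name="model_quantized.onnx"
)

inputs = tokenizer(
    ["a quick latency smoke test"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

# Warm up once, then average the forward-pass latency over 100 runs.
model(**inputs)
start = time.perf_counter()
for _ in range(100):
    model(**inputs)
print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")
```

Running this against both this fork and the upstream repo should make the reported ~4x slowdown measurable on your own hardware.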
