optimum-neuron-cache / inference-cache-config
16.4 kB
dacorvo HF Staff
Add batch size 4 configurations for Llama 1B and 3B models
3b6312a verified