neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 Text Generation • Updated Oct 10, 2024 • 7.43k • 19
Article Token Merging for fast LLM inference: Background and first trials with Mistral By samchain • Apr 30, 2024 • 4