Llama.cpp ultravox-v0_5-llama-3_3-70b by fixie-ai
Original model: https://huggingface.co/fixie-ai/ultravox-v0_5-llama-3_3-70b
This is an F16 mmproj file intended to be used in conjunction with Llama-3.3-70B-Instruct. High-performance hybrid quants of Llama-3.3-70B-Instruct are available here: https://huggingface.co/steampunque/Llama-3.3-70B-Instruct-Hybrid-GGUF
Usage:
Llama-3.3-70B-Instruct is made audio-capable by pairing it with the fixie-ai audio multimedia projector tuned to work with it. This enables the model to take both audio (.mp3 and .wav files) and text as input and generate text output. The mmproj file is made available in this repository, and the hybrid quant model files are linked above and below. More information about running multimedia models can be found in the mtmd README in the tools directory of the llama.cpp source tree: https://github.com/ggml-org/llama.cpp/blob/master/tools/mtmd/README.md.
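As a sketch, an audio transcription run might look like the following, assuming `llama-mtmd-cli` has been built from the llama.cpp source tree; the model, mmproj, and audio file paths are placeholders, and flags such as `-ngl` and `-c` should be tuned to your hardware as discussed below.

```shell
# Hypothetical invocation: pair the hybrid-quant text model with the mmproj
# audio projector and feed a 16 kHz mono 16-bit WAV chunk for transcription.
./llama-mtmd-cli \
  -m Llama-3.3-70B-Instruct.Q4_K_H.gguf \
  --mmproj ultravox-v0_5-llama-3_3-70b.mmproj.gguf \
  --audio chunk_000.wav \
  -p "Transcribe this audio." \
  -ngl 80 -c 8192
```

The exact flag set may differ between llama.cpp versions; consult the mtmd README linked above for the current interface.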
Extremely accurate audio transcription was achieved using 16000 Hz sample rate, single-channel, 16-bit WAV input, split into 30 s chunks with ffmpeg. When offloading to GPU, make sure to configure llama.cpp's ngl and context size to leave some reserve VRAM for the clip buffers; otherwise the model can SEGV with no error message. The amount of reserve needed varies from system to system, so some experimentation may be necessary.
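The chunking step above was done with ffmpeg, but the same 30 s split can be sketched in pure Python with the standard-library `wave` module; the function name and chunk filename pattern here are illustrative, and the input is assumed to already be 16 kHz mono 16-bit WAV.

```python
import math
import wave

def chunk_wav(path, chunk_seconds=30, prefix="chunk"):
    """Split a WAV file into fixed-length chunks (the last may be shorter).

    Returns the list of chunk filenames written.
    """
    out = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        n_chunks = math.ceil(src.getnframes() / frames_per_chunk)
        for i in range(n_chunks):
            data = src.readframes(frames_per_chunk)
            name = f"{prefix}_{i:03d}.wav"
            with wave.open(name, "wb") as dst:
                # Copy channel count, sample width, and rate from the source;
                # the frame count in the header is patched on close.
                dst.setparams(params)
                dst.writeframes(data)
            out.append(name)
    return out
```

Each resulting `chunk_000.wav`, `chunk_001.wav`, … file can then be passed to the model one at a time.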
Benchmarks:
Audio benchmarks for the model will eventually be given here: https://huggingface.co/spaces/steampunque/benchlm
Download the files from the links below:
| Link | Type | Size / 10^9 B | Notes |
| --- | --- | --- | --- |
Llama-3.3-70B-Instruct.Q3_S_H.gguf | Q3_S_H | 32.6e9 B | 1.7B smaller than Q3_K_M |
Llama-3.3-70B-Instruct.Q3_K_H.gguf | Q3_K_H | 33.4e9 B | 0.9B smaller than Q3_K_M |
Llama-3.3-70B-Instruct.Q4_K_H.gguf | Q4_K_H | 37.5e9 B | 0.8B smaller than IQ4_XS |
ultravox-v0_5-llama-3_3-70b.mmproj.gguf | mmproj | 1.38e9 B | multimedia projector |
A discussion thread about the hybrid layer quant approach can be found here on the llama.cpp git repository: