Llama.cpp ultravox-v0_5-llama-3_3-70b by fixie-ai
Original model: https://huggingface.co/fixie-ai/ultravox-v0_5-llama-3_3-70b
This is an F16 mmproj file intended to be used in conjunction with Llama-3.3-70B-Instruct. High-performance hybrid quants of Llama-3.3-70B-Instruct are available here: https://huggingface.co/steampunque/Llama-3.3-70B-Instruct-Hybrid-GGUF
Usage:
Llama-3.3-70B-Instruct is made audio-capable by pairing it with the fixie-ai audio multimedia projector tuned to work with it. This enables the model to take both audio (.mp3 and .wav files) and text as input and generate text output. The mmproj file is made available in this repository, and the hybrid quant model files are linked above and below. More information about running multimedia models can be found in the mtmd README in the tools directory of the llama.cpp source tree: https://github.com/ggml-org/llama.cpp/blob/master/tools/mtmd/README.md.
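As a sketch, an audio transcription run might look like the following, assuming `llama-mtmd-cli` has been built from the llama.cpp source tree; the model, mmproj, and audio file paths are placeholders, and flags such as `-ngl` and `-c` should be tuned to your hardware as discussed below.

```shell
# Hypothetical invocation: pair the hybrid-quant text model with the mmproj
# audio projector and feed a 16 kHz mono 16-bit WAV chunk for transcription.
./llama-mtmd-cli \
  -m Llama-3.3-70B-Instruct.Q4_K_H.gguf \
  --mmproj ultravox-v0_5-llama-3_3-70b.mmproj.gguf \
  --audio chunk_000.wav \
  -p "Transcribe this audio." \
  -ngl 80 -c 8192
```

The exact flag set may differ between llama.cpp versions; consult the mtmd README linked above for the current interface.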
Extremely accurate audio transcription was achieved using 16000 Hz sample rate, single-channel, 16-bit WAV input, split into 30 s chunks with ffmpeg. When offloading to GPU, make sure to configure llama.cpp's ngl and context size to leave some reserve VRAM for the clip buffers; otherwise the model can SEGV with no error message. The amount of reserve needed varies from system to system, so some experimentation may be necessary.
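The chunking step above was done with ffmpeg, but the same 30 s split can be sketched in pure Python with the standard-library `wave` module; the function name and chunk filename pattern here are illustrative, and the input is assumed to already be 16 kHz mono 16-bit WAV.

```python
import math
import wave

def chunk_wav(path, chunk_seconds=30, prefix="chunk"):
    """Split a WAV file into fixed-length chunks (the last may be shorter).

    Returns the list of chunk filenames written.
    """
    out = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        n_chunks = math.ceil(src.getnframes() / frames_per_chunk)
        for i in range(n_chunks):
            data = src.readframes(frames_per_chunk)
            name = f"{prefix}_{i:03d}.wav"
            with wave.open(name, "wb") as dst:
                # Copy channel count, sample width, and rate from the source;
                # the frame count in the header is patched on close.
                dst.setparams(params)
                dst.writeframes(data)
            out.append(name)
    return out
```

Each resulting `chunk_000.wav`, `chunk_001.wav`, … file can then be passed to the model one at a time.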
Benchmarks:
Audio benchmarks for the model will eventually be given here: https://huggingface.co/spaces/steampunque/benchlm
Download the files from the links below:
| Link | Type | Size / 10^9 B | Notes |
| --- | --- | --- | --- |
Llama-3.3-70B-Instruct.Q3_S_H.gguf | Q3_S_H | 32.6e9 B | 1.7B smaller than Q3_K_M |
Llama-3.3-70B-Instruct.Q3_K_H.gguf | Q3_K_H | 33.4e9 B | 0.9B smaller than Q3_K_M |
Llama-3.3-70B-Instruct.Q4_K_H.gguf | Q4_K_H | 37.5e9 B | 0.8B smaller than IQ4_XS |
ultravox-v0_5-llama-3_3-70b.mmproj.gguf | mmproj | 1.38e9 B | multimedia projector |
A discussion thread about the hybrid layer quant approach can be found here on the llama.cpp git repository: