Llama.cpp ultravox-v0_6-llama-3_3-70b by fixie-ai

Original model: https://huggingface.co/fixie-ai/ultravox-v0_6-llama-3_3-70b

This is an F16 mmproj file intended to be used in conjunction with Llama-3.3-70B-Instruct. High-performance hybrid quants of Llama-3.3-70B-Instruct are available here: https://huggingface.co/steampunque/Llama-3.3-70B-Instruct-Hybrid-GGUF

Usage:

The fixie-ai audio multimedia projector, tuned to work with Llama-3.3-70B-Instruct, turns it into an audio-capable model: it can take both audio (.mp3 and .wav files) and text as input and generate text output. The mmproj file is provided in this repository, and the hybrid quant model files are linked above and below. More information about running multimedia models can be found in the mtmd README in the tools directory of the llama.cpp source tree: https://github.com/ggml-org/llama.cpp/blob/master/tools/mtmd/README.md. A minimal invocation is sketched below.
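
A minimal sketch using the llama-mtmd-cli tool from llama.cpp (file names are placeholders and the exact flag set may vary between llama.cpp builds; check the mtmd README linked above):

```sh
# Run an audio prompt through the model plus projector.
# -ngl offloads layers to GPU; reduce it (or the context size -c) if VRAM
# runs short, since the clip buffers need some headroom (see the note below).
llama-mtmd-cli \
  -m Llama-3.3-70B-Instruct.Q4_K_H.gguf \
  --mmproj ultravox-v0_6-llama-3_3-70b.mmproj.gguf \
  --audio chunk_000.wav \
  -p "Transcribe this audio." \
  -ngl 99 -c 4096
```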

Extremely accurate audio transcription was obtained using 16 kHz, single-channel, 16-bit WAV input broken into 30-second chunks with ffmpeg (see the example below). If offloading to GPU, make sure to configure the llama.cpp -ngl and context-size options to leave some reserve space in VRAM for the clip buffers; otherwise the process can crash (SEGV) with no error message. The amount of reserve needed may vary from system to system, so some experimentation may be necessary.
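
A sketch of the ffmpeg preprocessing described above (the input file name is a placeholder):

```sh
# Resample to 16 kHz mono 16-bit PCM WAV and split into 30 s chunks
# named chunk_000.wav, chunk_001.wav, ...
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le \
  -f segment -segment_time 30 chunk_%03d.wav
```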

Benchmarks:

Audio benchmarks for the model will eventually be given here: https://huggingface.co/spaces/steampunque/benchlm

Download the files listed below:

| Link | Type | Size (e9 B) | Notes |
| ---- | ---- | ----------- | ----- |
| Llama-3.3-70B-Instruct.Q3_S_H.gguf | Q3_S_H | 32.6 | 1.7B smaller than Q3_K_M |
| Llama-3.3-70B-Instruct.Q3_K_H.gguf | Q3_K_H | 33.4 | 0.9B smaller than Q3_K_M |
| Llama-3.3-70B-Instruct.Q4_K_H.gguf | Q4_K_H | 37.5 | 0.8B smaller than IQ4_XS |
| ultravox-v0_6-llama-3_3-70b.mmproj.gguf | mmproj | 1.38 | multimedia projector |
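
One way to fetch the files, sketched with the huggingface-cli tool from the huggingface_hub package (the mmproj lives in this repository, the hybrid quants in the companion repository linked above):

```sh
# Projector from this repository
huggingface-cli download steampunque/ultravox-v0_6-llama-3_3-70b-Hybrid-GGUF \
  ultravox-v0_6-llama-3_3-70b.mmproj.gguf --local-dir .
# Example hybrid quant from the companion repository
huggingface-cli download steampunque/Llama-3.3-70B-Instruct-Hybrid-GGUF \
  Llama-3.3-70B-Instruct.Q4_K_H.gguf --local-dir .
```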

A discussion thread about the hybrid layer quant approach can be found on the llama.cpp GitHub repository:

https://github.com/ggml-org/llama.cpp/discussions/13040
