model: chargoddard/mixtralnt-4x7b-test

GGUF quantizations of the model above, for anyone whose attempt to load the full-precision weights ends in:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate ...
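
Below is a minimal sketch of running one of the quants listed below with llama-cpp-python. The GGUF filename is an assumption here; substitute whichever quant file you actually download from this repo.

```python
from llama_cpp import Llama

# Hypothetical filename -- replace with the quant file you downloaded.
llm = Llama(
    model_path="mixtralnt-4x7b-test.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=2048,       # context window
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

If the GPU still runs out of memory, lower `n_gpu_layers` so only part of the model is offloaded and the remainder stays in system RAM.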

Every quantization has 24.15 B parameters (GGUF reports the architecture as llama); `llm_load_print_meta` reports the following sizes at load time:

| model ftype          | model size | bits per weight |
|----------------------|------------|-----------------|
| mostly Q2_K          | 7.51 GiB   | 2.67 BPW        |
| mostly Q3_K - Medium | 9.80 GiB   | 3.48 BPW        |
| mostly Q4_K - Medium | 12.70 GiB  | 4.52 BPW        |
| mostly Q5_K - Medium | 15.49 GiB  | 5.51 BPW        |
| mostly Q6_K          | 18.45 GiB  | 6.56 BPW        |
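
The sizes follow directly from the bits-per-weight figures: size ≈ params × BPW / 8 bytes. A quick check in Python using the numbers from the table (results agree to within the rounding of the reported BPW):

```python
# Sanity-check the table: size_bytes ~= params * bits_per_weight / 8
params = 24.15e9

for name, bpw in [("Q2_K", 2.67), ("Q3_K-M", 3.48), ("Q4_K-M", 4.52),
                  ("Q5_K-M", 5.51), ("Q6_K", 6.56)]:
    gib = params * bpw / 8 / 2**30
    print(f"{name}: {gib:.2f} GiB")  # 7.51, 9.78, 12.71, 15.49, 18.44
```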