About

Bitsandbytes 4-bit quantized version of https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7
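
A minimal loading sketch with transformers, assuming the 4-bit quantization config ships inside the checkpoint (so a plain `from_pretrained` call is enough) and that `bitsandbytes` and `accelerate` are installed. The prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3WaD/Lamarck-14B-v0.7-bnb-4bit"

# The bnb 4-bit config is read from the repo; device_map="auto" places layers on GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize 4-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```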

Ideal for faster and cheaper GPU inference, e.g. in vLLM (a serving sketch follows the stats below).
Stats from running on an RTX 4090:

- Model weights: 9.35 GiB
- non_torch_memory: 0.08 GiB
- PyTorch activation peak memory: 4.50 GiB
- Memory reserved for KV cache: 9.24 GiB
- Maximum concurrency for 32768 tokens per request: 1.54x
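
A minimal vLLM sketch, assuming a vLLM build with bitsandbytes support; `max_model_len`, `gpu_memory_utilization`, and the sampling settings are illustrative, not required values:

```python
from vllm import LLM, SamplingParams

# Load the pre-quantized bnb 4-bit checkpoint directly in vLLM.
llm = LLM(
    model="3WaD/Lamarck-14B-v0.7-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    max_model_len=32768,          # matches the 32768-token stat above
    gpu_memory_utilization=0.90,  # leaves the remaining VRAM for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 4-bit quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```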