About

Bitsandbytes 4-bit quantized version of https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7
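
A minimal loading sketch with transformers, assuming the 4-bit quantization config ships inside the checkpoint (so a plain `from_pretrained` call is enough) and that `bitsandbytes` and `accelerate` are installed. The prompt and generation settings are illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3WaD/Lamarck-14B-v0.7-bnb-4bit"

# The bnb 4-bit config is read from the repo; device_map="auto" places layers on GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize 4-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```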

Ideal for faster and cheaper GPU inference, e.g. in vLLM (a serving sketch follows the stats below).
Stats from running on an RTX 4090:

- Model weights: 9.35 GiB
- non_torch_memory: 0.08 GiB
- PyTorch activation peak memory: 4.50 GiB
- Memory reserved for KV cache: 9.24 GiB
- Maximum concurrency for 32768 tokens per request: 1.54x
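
A minimal vLLM sketch, assuming a vLLM build with bitsandbytes support; `max_model_len`, `gpu_memory_utilization`, and the sampling settings are illustrative, not required values:

```python
from vllm import LLM, SamplingParams

# Load the pre-quantized bnb 4-bit checkpoint directly in vLLM.
llm = LLM(
    model="3WaD/Lamarck-14B-v0.7-bnb-4bit",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    max_model_len=32768,          # matches the 32768-token stat above
    gpu_memory_utilization=0.90,  # leaves the remaining VRAM for the KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 4-bit quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```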