About
Bitsandbytes 4-bit quantized version of https://huggingface.co/sometimesanotion/Lamarck-14B-v0.7.
Ideal for faster and cheaper GPU inference, e.g. with vLLM.
Stats from running on an RTX 4090:
- Model weights: 9.35 GiB
- Non-torch memory: 0.08 GiB
- PyTorch activation peak memory: 4.50 GiB
- Remaining memory reserved for KV cache: 9.24 GiB
- Maximum concurrency for 32768 tokens per request: 1.54x
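
A minimal sketch of loading this checkpoint with vLLM's offline API, assuming a recent vLLM build with bitsandbytes support (the prompt and sampling settings are illustrative; older vLLM versions may also require `load_format="bitsandbytes"`):

```python
from vllm import LLM, SamplingParams

# Load the pre-quantized bnb-4bit weights; adjust max_model_len and
# gpu_memory_utilization to fit your GPU.
llm = LLM(
    model="3WaD/Lamarck-14B-v0.7-bnb-4bit",
    quantization="bitsandbytes",
    max_model_len=32768,  # context length used for the stats above
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain 4-bit quantization in one paragraph."], sampling)
print(outputs[0].outputs[0].text)
```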