To run on 2x RTX 3090 (48 GB VRAM total), use vLLM:

```
docker run --runtime nvidia --gpus all \
  -v vllm-models:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<your token>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model TeeZee/Lumimaid-v0.2-70B-awq \
  --tensor-parallel-size 2 \
  --quantization awq \
  --max_model_len 8192 \
  --dtype half \
  --gpu_memory_utilization 0.97
```
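
Once the container is up, the server exposes an OpenAI-compatible API on port 8000. Below is a minimal client sketch, assuming the `openai` Python package is installed and that no `--api-key` flag was passed to vLLM (in which case any key value is accepted):

```python
# Minimal sketch: query the vLLM OpenAI-compatible endpoint started above.
# Assumes the server is reachable at localhost:8000; "EMPTY" is a placeholder
# key, since vLLM does not validate it unless --api-key is set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="TeeZee/Lumimaid-v0.2-70B-awq",
    prompt="Once upon a time,",
    max_tokens=64,
    temperature=0.8,
)
print(completion.choices[0].text)
```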