README.md · TeeZee/Lumimaid-v0.2-70B-awq at main

metadata

base_model:
  - NeverSleep/Lumimaid-v0.2-70B
tags:
  - not-for-all-audiences

To run on 2x3090 with 48 GB VRAM use vLLM:

docker run --runtime nvidia --gpus all -v vllm-models:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=<your token>" -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model TeeZee/Lumimaid-v0.2-70B-awq --tensor-parallel-size 2 --quantization awq --max_model_len 8192 --dtype half --gpu_memory_utilization 0.97