runtime error

Exit code: 1. Reason:

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:945: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(

quantize_config.json: 100%|██████████| 188/188 [00:00<00:00, 1.69MB/s]
model.safetensors: 100%|█████████▉| 3.90G/3.90G [00:10<00:00, 373MB/s]

WARNING - Overriding use_cuda_fp16 to False since torch_dtype is not torch.float16.
INFO - The layer lm_head is not quantized.

Traceback (most recent call last):
  File "/home/user/app/app.py", line 24, in <module>
    model = AutoGPTQForCausalLM.from_quantized(
  File "/usr/local/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 135, in from_quantized
    return quant_func(
  File "/usr/local/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 1246, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/usr/local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1878, in load_checkpoint_in_model
    raise ValueError(
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an `offload_folder`.

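The `ValueError` is raised by accelerate, not by auto-gptq itself: the inferred `device_map` decided that the ~3.9 GB of weights do not fit in the available GPU/CPU memory, so some submodules must spill to disk, and accelerate needs a scratch directory for that. The `resume_download` FutureWarning earlier in the log is unrelated and harmless. Below is a minimal sketch of the two usual fixes; the repo id `TheBloke/Llama-2-7B-GPTQ` is a placeholder for whatever checkpoint app.py actually loads, and whether `from_quantized` forwards `offload_folder` through to accelerate depends on the installed auto-gptq version.

```python
from auto_gptq import AutoGPTQForCausalLM

# Placeholder repo id; substitute the checkpoint your app actually loads.
MODEL_ID = "TheBloke/Llama-2-7B-GPTQ"

# Option 1: provide the scratch directory the error asks for, so layers
# that do not fit in GPU/CPU RAM can be offloaded to disk. Assumes your
# auto-gptq version passes this kwarg through to accelerate; if it is
# rejected, fall back to Option 2.
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_ID,
    device_map="auto",
    offload_folder="offload",  # any writable directory
    use_safetensors=True,
)

# Option 2 (often the real fix on Spaces): avoid disk offload entirely by
# capping per-device memory so accelerate plans a map that fits, or by
# upgrading the Space hardware so the weights fit without spilling.
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_ID,
    device_map="auto",
    max_memory={0: "12GiB", "cpu": "24GiB"},  # adjust to your hardware
    use_safetensors=True,
)
```

Note that disk offload makes inference very slow, so Option 1 is mostly a way to get the app to boot; if the Space is meant to serve requests, sizing the hardware so offload never triggers (Option 2) is the better long-term choice.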