Cannot run inference: deployed Baichuan2-13B-Chat on 2x RTX 3090 24G with FastChat.
#36, opened by Rethen
When I enter a question in the GUI, it streams only a few tokens of output and then crashes.
$ python3 -m fastchat.serve.model_worker --model-path baichuan-inc/Baichuan2-13B-Chat --revision v2.0.1 --num-gpus 2 --dtype float16
2024-03-11 10:07:59 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='baichuan-inc/Baichuan2-13B-Chat', revision='v2.0.1', device='cuda', gpus=None, num_gpus=2, max_gpu_memory=None, dtype='float16', load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-03-11 10:07:59 | INFO | model_worker | Loading the model ['Baichuan2-13B-Chat'] on worker 63d1570a ...
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,815 - modelscope - INFO - PyTorch version 2.2.0 Found.
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,816 - modelscope - INFO - Loading ast index from /home/amax/.cache/modelscope/ast_indexer
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,848 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 a89e5217f17b0a18ae9f5299be78e741 and a total number of 964 components indexed
2024-03-11 10:08:00 | ERROR | stderr | 2024-03-11 10:08:00,457 - modelscope - INFO - Use user-specified model revision: v2.0.1
Loading checkpoint shards: 0%| | 0/6 [00:00<?, ?it/s]
Loading checkpoint shards: 17%|████████████████ | 1/6 [00:02<00:10, 2.10s/it]
Loading checkpoint shards: 33%|████████████████████████████████ | 2/6 [00:04<00:08, 2.06s/it]
Loading checkpoint shards: 50%|███████████████████████████████████████████████ | 3/6 [00:06<00:06, 2.01s/it]
Loading checkpoint shards: 67%|███████████████████████████████████████████████████████████████ | 4/6 [00:08<00:03, 2.00s/it]
Loading checkpoint shards: 83%|███████████████████████████████████████████████████████████████████████████████ | 5/6 [00:10<00:01, 2.00s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00, 1.75s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00, 1.89s/it]
2024-03-11 10:08:12 | ERROR | stderr |
2024-03-11 10:08:13 | INFO | model_worker | Register to controller
2024-03-11 10:08:13 | ERROR | stderr | INFO: Started server process [1687655]
2024-03-11 10:08:13 | ERROR | stderr | INFO: Waiting for application startup.
2024-03-11 10:08:13 | ERROR | stderr | INFO: Application startup complete.
2024-03-11 10:08:13 | ERROR | stderr | INFO: Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2024-03-11 10:08:44 | INFO | stdout | INFO: 127.0.0.1:48882 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-03-11 10:08:58 | INFO | model_worker | Send heart beat. Models: ['Baichuan2-13B-Chat']. Semaphore: Semaphore(value=5, locked=False). call_ct: 1. worker_id: 63d1570a.
2024-03-11 10:09:19 | INFO | stdout | INFO: 127.0.0.1:49106 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-03-11 10:09:40 | INFO | stdout | INFO: 127.0.0.1:35636 - "POST /worker_generate_stream HTTP/1.1" 200 OK
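
The log above only shows "200 OK" for each /worker_generate_stream request, so whatever went wrong is not visible at the HTTP level. One way to narrow it down might be to read the streamed body directly and look for an error payload. The sketch below is not FastChat's own client code; it assumes the worker sends JSON objects separated by a NUL byte, each with a "text" field and an "error_code" field (0 meaning success). The function names and the -1 code for a cut-off stream are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_worker_chunks(byte_stream: Iterable[bytes],
                       delimiter: bytes = b"\0") -> Iterator[dict]:
    """Yield JSON objects from a delimiter-separated byte stream.

    Assumption: each message is a JSON object followed by `delimiter`.
    """
    buffer = b""
    for piece in byte_stream:
        buffer += piece
        # A network read may contain several messages or only part of one.
        while delimiter in buffer:
            raw, buffer = buffer.split(delimiter, 1)
            if raw:
                yield json.loads(raw.decode("utf-8"))
    if buffer.strip():
        # Leftover bytes with no delimiter: the stream stopped mid-message.
        # The -1 code is a hypothetical marker, not a FastChat constant.
        yield {
            "text": "",
            "error_code": -1,
            "truncated": buffer.decode("utf-8", errors="replace"),
        }


def summarize_stream(byte_stream: Iterable[bytes]) -> Tuple[str, Optional[dict]]:
    """Return (last good text, first error chunk or None)."""
    last_text = ""
    for chunk in iter_worker_chunks(byte_stream):
        if chunk.get("error_code", 0) != 0:
            return last_text, chunk
        last_text = chunk.get("text", last_text)
    return last_text, None


if __name__ == "__main__":
    # Simulated stream: two good chunks, then one cut off mid-message.
    fake = [
        b'{"text": "Hi", "error_code": 0}\0',
        b'{"text": "Hi there", "error_code": 0}\0{"text": "Hi the',
    ]
    text, error = summarize_stream(fake)
    print("last text:", text)
    print("error:", error)
```

Against a live worker, the byte stream could come from `requests.post(..., stream=True)` followed by `response.iter_content(chunk_size=None)`; the request body the worker expects is not shown in this post, so it is left out here. If an error chunk does come back, its contents together with the worker's stderr around the time of the crash would be the useful thing to attach to this thread.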