baichuan-inc/Baichuan2-13B-Chat · Cannot run inference: deployed Baichuan2-13B-Chat on 2x RTX 3090 24 GB with FastChat

When I enter a question in the GUI, it streams only a few tokens of output and then crashes. The worker command and its log:
$ python3 -m fastchat.serve.model_worker --model-path baichuan-inc/Baichuan2-13B-Chat --revision v2.0.1 --num-gpus 2 --dtype float16
2024-03-11 10:07:59 | INFO | model_worker | args: Namespace(host='localhost', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', model_path='baichuan-inc/Baichuan2-13B-Chat', revision='v2.0.1', device='cuda', gpus=None, num_gpus=2, max_gpu_memory=None, dtype='float16', load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-03-11 10:07:59 | INFO | model_worker | Loading the model ['Baichuan2-13B-Chat'] on worker 63d1570a ...
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,815 - modelscope - INFO - PyTorch version 2.2.0 Found.
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,816 - modelscope - INFO - Loading ast index from /home/amax/.cache/modelscope/ast_indexer
2024-03-11 10:07:59 | ERROR | stderr | 2024-03-11 10:07:59,848 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 a89e5217f17b0a18ae9f5299be78e741 and a total number of 964 components indexed
2024-03-11 10:08:00 | ERROR | stderr | 2024-03-11 10:08:00,457 - modelscope - INFO - Use user-specified model revision: v2.0.1
Loading checkpoint shards:   0%|                                                                                                      | 0/6 [00:00<?, ?it/s]
Loading checkpoint shards:  17%|███████████████▋                                                                              | 1/6 [00:02<00:10,  2.10s/it]
Loading checkpoint shards:  33%|███████████████████████████████▎                                                              | 2/6 [00:04<00:08,  2.06s/it]
Loading checkpoint shards:  50%|███████████████████████████████████████████████                                               | 3/6 [00:06<00:06,  2.01s/it]
Loading checkpoint shards:  67%|██████████████████████████████████████████████████████████████▋                               | 4/6 [00:08<00:03,  2.00s/it]
Loading checkpoint shards:  83%|██████████████████████████████████████████████████████████████████████████████▎               | 5/6 [00:10<00:01,  2.00s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.75s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.89s/it]
2024-03-11 10:08:12 | ERROR | stderr | 
2024-03-11 10:08:13 | INFO | model_worker | Register to controller
2024-03-11 10:08:13 | ERROR | stderr | INFO:     Started server process [1687655]
2024-03-11 10:08:13 | ERROR | stderr | INFO:     Waiting for application startup.
2024-03-11 10:08:13 | ERROR | stderr | INFO:     Application startup complete.
2024-03-11 10:08:13 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2024-03-11 10:08:44 | INFO | stdout | INFO:     127.0.0.1:48882 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-03-11 10:08:58 | INFO | model_worker | Send heart beat. Models: ['Baichuan2-13B-Chat']. Semaphore: Semaphore(value=5, locked=False). call_ct: 1. worker_id: 63d1570a. 
2024-03-11 10:09:19 | INFO | stdout | INFO:     127.0.0.1:49106 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-03-11 10:09:40 | INFO | stdout | INFO:     127.0.0.1:35636 - "POST /worker_generate_stream HTTP/1.1" 200 OK
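
Every `POST /worker_generate_stream` request in the log returns `200 OK`, so the HTTP status alone does not show where generation stops. One way to narrow it down might be to inspect the streamed body directly. The sketch below is only a diagnostic aid and rests on an assumption about the wire format: that the worker sends JSON objects terminated by a null byte, each carrying a `text` field and an `error_code` field. The helper names `iter_worker_messages` and `summarize_stream` are hypothetical and are not part of FastChat.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_worker_messages(chunks: Iterable[bytes],
                         delimiter: bytes = b"\0") -> Iterator[dict]:
    """Yield JSON messages from a byte stream.

    Assumption: each message is a JSON object followed by `delimiter`.
    A message may be split across several incoming chunks.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete message currently in the buffer.
        while delimiter in buffer:
            raw, buffer = buffer.split(delimiter, 1)
            if raw:
                yield json.loads(raw.decode("utf-8"))


def summarize_stream(chunks: Iterable[bytes]) -> Tuple[int, str, int]:
    """Return (message count, last text seen, last error code seen).

    Assumption: messages carry `text` and `error_code` fields;
    missing fields fall back to "" and 0.
    """
    count, last_text, last_error = 0, "", 0
    for message in iter_worker_messages(chunks):
        count += 1
        last_text = message.get("text", last_text)
        last_error = message.get("error_code", last_error)
    return count, last_text, last_error


if __name__ == "__main__":
    # Simulated stream: the second message is split across two chunks.
    simulated = [
        b'{"text": "He", "error_code": 0}\0{"te',
        b'xt": "Hello", "error_code": 0}\0',
    ]
    print(summarize_stream(simulated))
```

Feeding this the raw response chunks from the worker would show how many messages arrived and whether the last one reported a non-zero error code, which helps separate a model-side failure from a client-side or GUI-side one.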