I see you are using `tensor_parallel_size > 1`. In this case both shards need to communicate with each other, so it depends on the setup of your machine.
- the `colocate` mode does not use `ray`, so it's not needed (see the config sketch after this list).
- I assume `unsloth_grpo.py` is similar to the script we gave at the top of the issue.
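In case it helps, here is a minimal sketch of the colocate setup I mean, assuming the `GRPOConfig` fields from recent TRL releases (`use_vllm`, `vllm_mode`, `vllm_tensor_parallel_size`, `vllm_gpu_memory_utilization`); double-check the names against your installed version:

```python
# Illustrative only: GRPO config using colocate vLLM generation (no ray involved).
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-debug",          # placeholder output directory
    use_vllm=True,                    # generate completions with vLLM
    vllm_mode="colocate",             # run vLLM inside the training process, so ray is not needed
    vllm_tensor_parallel_size=2,      # drop to 1 first to rule out inter-GPU networking issues
    vllm_gpu_memory_utilization=0.3,  # leave GPU memory for the policy model on the same devices
)
```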
A few suggestions to debug:
- Try a run with `tensor_parallel_size=1`; if it works, it's a networking issue.
- Try running `vllm serve` with `tensor_parallel_size=2` to isolate any TCP issues on your machine with vLLM tensor parallelism. See https://docs.vllm.ai/en/latest/serving/distributed_serving.html#running-vllm-on-a-single-node. I'm not sure how your GPUs are networked together. (There is a quick connectivity check after this list.)
- Try downgrading to `vllm==0.8`, removing `ray`, and using `torch==2.6.0`, `trl==0.18`, just to see if it's a versions issue (a version check is sketched below as well).
- If it still does not work, you can try `export VLLM_WORKER_MULTIPROC_METHOD=spawn`, but this is a shot in the dark.
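For the `vllm serve` test, something like this (illustrative; host and port are the vLLM defaults) lets you confirm the server actually came up with tensor parallelism, independently of TRL/unsloth:

```python
# Start the server separately first, e.g.:
#   vllm serve <your-model> --tensor-parallel-size 2
# then query the OpenAI-compatible endpoint; default host/port assumed.
import requests

resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())  # should list the served model if TP=2 startup succeeded
```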
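And after downgrading, a quick check like this (nothing TRL-specific, just standard `importlib.metadata`) confirms the environment really has the pinned versions and that `ray` is gone:

```python
# Verify the installed versions match the pins you intended.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("vllm", "torch", "trl"):
    print(pkg, version(pkg))
try:
    print("ray", version("ray"))
except PackageNotFoundError:
    print("ray not installed (expected after removal)")
```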