"RuntimeError: probability tensor contains either `inf`, `nan` or element < 0" when running in multi-gpu

#53
by greeksharifa - opened

If I run code like the following...

(...)
outputs = model.generate(**inputs, max_new_tokens=30)

Then this error occurs:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
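For context, this message comes from the sampling step inside `generate()`: the next token is drawn with `torch.multinomial`, which rejects probability tensors containing `inf`, `nan`, or negative values. A minimal sketch reproducing the same error on CPU (the `nan` here stands in for logits corrupted during multi-GPU inference):

```python
import torch

# generate() with do_sample=True draws the next token via torch.multinomial.
# If the logits/probabilities contain nan (e.g. from a corrupted forward pass
# across GPUs), multinomial raises the RuntimeError seen above.
probs = torch.tensor([0.5, float("nan"), 0.5])

try:
    torch.multinomial(probs, num_samples=1)
    raised = False
except RuntimeError as e:
    raised = True
    message = str(e)  # "probability tensor contains either `inf`, `nan` or element < 0"
```

So the error is a symptom: something upstream (model sharding, dtype, or device placement) produced non-finite logits before sampling.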

Environment:

# python 3.10
# 6 x A6000 GPUs
transformers==4.45.2
torch==2.4.1
torchaudio==2.4.1
torchvision==0.19.1
accelerate==1.0.0

Question: What is the recommended CUDA version? I tried CUDA 12.2 and 11.8.

Seeing the same issue. Did you figure it out?

Same problem. Has anyone solved it?


A possible workaround:
https://discuss.huggingface.co/t/automodelforcausallm-fails-only-on-cuda-due-to-inf-nan-0-tensors/149280/4

fra-wee
I was able to solve this by changing device_map to 'sequential'. The issue persists with device_map='auto'.

Related: https://github.com/meta-llama/llama/issues/380#issuecomment-2681218324
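A minimal sketch of the fix above, assuming the reported behavior (sharding with `device_map="auto"` triggers the error, `device_map="sequential"` does not). The model id is a placeholder; substitute your own:

```python
import torch

# Workaround reported in this thread: fill GPUs one by one ("sequential")
# instead of letting accelerate shard the model with "auto".
load_kwargs = dict(
    device_map="sequential",  # instead of device_map="auto"
    torch_dtype=torch.float16,
)

if torch.cuda.is_available():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)

    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=30)
```

If "sequential" does not fit your memory layout, restricting visible devices (e.g. `CUDA_VISIBLE_DEVICES`) or pinning a `max_memory` map may also change how the model is sharded.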
