RuntimeError: Tensors must have same number of dimensions: got 2 and 3 when running sample inference code

#14
by sbhctashi - opened

Description

When running the sample inference code with the Phi-4-mini-instruct checkpoints, I encountered a RuntimeError related to tensor dimension mismatch during generation.

Reproduction Steps

  1. Obtain the Phi-4-mini-instruct model checkpoints.
  2. Run the following sample code (inference.py):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model_path = "microsoft/Phi-4-mini-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
 
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]
 
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
 
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}
 
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
  3. Execute the script in a CUDA-enabled environment (Python 3.10).
  4. The error occurs during the generation step.
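When debugging or reporting version-sensitive errors like this one, it helps to capture the exact interpreter and package versions first. A minimal stdlib-only sketch (the package names are the ones discussed in this thread):

```python
import sys
from importlib.metadata import version, PackageNotFoundError

# Print the interpreter version and the key package versions so they can
# be compared against a known-good environment.
print("python", sys.version.split()[0])
for pkg in ("torch", "transformers", "accelerate", "flash_attn"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```

Including this output in a bug report makes version mismatches (such as the transformers one identified below) immediately visible.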

Error Output (excerpt)

...
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:01<00:00,  1.72it/s]
Device set to use cuda:0
...
Traceback (most recent call last):
  File "/home/<username>/new_work/phi4mini/cookbook/inference.py", line 36, in <module>
    output = pipe(messages, **generation_args)
  File ".../transformers/pipelines/text_generation.py", line 278, in __call__
    return super().__call__(Chat(text_inputs), **kwargs)
  ...
  File ".../transformers/generation/utils.py", line 3309, in _sample
    input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 2 and 3

Additional Notes:

The execution environment uses CUDA on Python 3.10.
Personal identifiers in paths have been replaced with placeholders such as <username> for privacy.
Could you please investigate this issue and provide guidance on how to resolve it? Any assistance would be greatly appreciated.

Microsoft org

@sbhctashi Thanks for reporting that.
Can you try with Python 3.8?

Please also check your dependencies:

flash_attn==2.7.4.post1
torch==2.5.1
transformers==4.49.0
accelerate==1.3.0

After I updated transformers from v4.48.0 to v4.49.0, the problem was fixed.
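Since the fix reported above is simply "transformers >= 4.49.0", scripts can fail fast with a clear message instead of hitting the opaque dimension error. A small hypothetical helper (not part of transformers) that compares dotted version strings numerically, ignoring suffixes like "post1":

```python
def at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is at least `required`.

    Compares dotted version components numerically; non-numeric
    suffixes such as 'post1' contribute only their digits.
    """
    def parts(v):
        out = []
        for p in v.split("."):
            digits = "".join(ch for ch in p if ch.isdigit())
            out.append(int(digits) if digits else 0)
        return out

    a, b = parts(installed), parts(required)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    return a >= b

print(at_least("4.48.0", "4.49.0"))  # False: too old, upgrade needed
print(at_least("4.49.0", "4.49.0"))  # True
```

Passing `importlib.metadata.version("transformers")` as the first argument turns the runtime RuntimeError into an explicit "please upgrade" check at startup.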

Microsoft org

Yes, we checked Python 3.8, 3.9, and 3.10 with those package dependencies, and the model works correctly.

nguyenbh changed discussion status to closed

Hi, I ran this model on my computer some time ago and it worked fine, but strangely I got the same error when I ran it today, even though my environment and code have not changed. I tried to create a new Python 3.8 environment and install the dependencies, and got the following error:

(x-r1-3b-py38) hygx@hygx:~/code/X-R1-3B-HuaTuo-O1$ python -m pip install -r requirements.txt 
ERROR: Ignored the following yanked versions: 1.0.3
ERROR: Ignored the following versions that require a different python version: 2.7.1.post4 Requires-Python >=3.9; 2.7.2.post1 Requires-Python >=3.9; 2.7.3 Requires-Python >=3.9; 2.7.4.post1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement flash_attn==2.7.4.post1 (from versions: 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6.post1, 0.2.7, 0.2.8, 1.0.0, 1.0.1, 1.0.2, 1.0.3.post0, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 2.0.0.post1, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.0.6.post2, 2.0.7, 2.0.8, 2.0.9, 2.1.0, 2.1.1, 2.1.2.post3, 2.2.0, 2.2.1, 2.2.2, 2.2.3.post2, 2.2.4, 2.2.4.post1, 2.2.5, 2.3.0, 2.3.1.post1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.3.6, 2.4.0.post1, 2.4.1, 2.4.2, 2.4.3.post1, 2.5.0, 2.5.1.post1, 2.5.2, 2.5.3, 2.5.4, 2.5.5, 2.5.6, 2.5.7, 2.5.8, 2.5.9.post1, 2.6.0.post1, 2.6.1, 2.6.2, 2.6.3, 2.7.0.post2)

Then I switched to a Python 3.10 environment instead, reinstalled the dependencies, and now it works.
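The pip failure above is a Requires-Python constraint rather than a packaging bug: flash_attn 2.7.4.post1 declares Requires-Python >=3.9, so pip running under Python 3.8 never even offers that version. A quick up-front check, with the minimum version taken from the pip error output above:

```python
import sys

def flash_attn_installable(version_info=sys.version_info):
    """Return True if this interpreter satisfies flash_attn 2.7.4.post1's
    Requires-Python >=3.9 constraint (seen in the pip error above)."""
    return tuple(version_info[:2]) >= (3, 9)

if not flash_attn_installable():
    raise SystemExit("flash_attn 2.7.4.post1 needs Python >= 3.9; use 3.9 or 3.10")
```

This also explains why the Python 3.8 suggestion earlier in the thread cannot work with the pinned dependency list: the pinned flash_attn release simply does not support 3.8.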
