RuntimeError: The size of tensor a (768) must match the size of tensor b (9216) at non-singleton dimension 2

#2
by koute - opened

Consider the following code:

import torch
import transformers
model_path = "../models/RWKV7-Goose-Pile-168M-HF"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
model = transformers.AutoModelForCausalLM.from_pretrained(model_path, device_map = "cuda", trust_remote_code = True)

inputs = tokenizer("Hello world!")
model(input_ids = torch.tensor(inputs["input_ids"], device = "cuda").unsqueeze(0))

This outputs:

Traceback (most recent call last):
  File "./test-rwkv.py", line 11, in <module>
    model(input_ids = torch.tensor(inputs["input_ids"], device = "cuda").unsqueeze(0))
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/fla/models/rwkv7/modeling_rwkv7.py", line 445, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/fla/models/rwkv7/modeling_rwkv7.py", line 314, in forward
    hidden_states, attentions, past_key_values, v_first = layer(
                                                          ^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/fla/models/rwkv7/modeling_rwkv7.py", line 156, in forward
    hidden_states, attentions, past_key_values, v_first = self.attn(
                                                          ^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/fla/layers/rwkv7.py", line 217, in forward
    o = o + ((r * k * self.r_k).sum(-1, keepdim=True) * v).view(batch_size, seq_len, -1)
        ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (768) must match the size of tensor b (9216) at non-singleton dimension 2

Dependencies:

fla @ git+https://github.com/fla-org/flash-linear-attention@7eff5519d42629ee37453765d41057b393068a50#7eff5519d42629ee37453765d41057b393068a50
transformers @ git+https://github.com/huggingface/transformers@0ebd6651acd32c982fee265b23243b89bdb89577
torch==2.6.0
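
For reference, a quick way to confirm what is actually installed in the environment (a minimal sketch; the distribution names are taken from the freeze output above, and git installs only report their package version, not the commit):

import importlib.metadata

# Print the installed version of each relevant package, if present.
for pkg in ("fla", "transformers", "torch"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")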

Is this just broken, or do I need specific versions of these libraries?

Seems like this is fixed now in the newest version of flash-linear-attention.
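
A quick way to re-check after upgrading (a sketch based on the repro above; it assumes the remote RWKV7 code supports generate through the standard transformers API):

import torch
import transformers

model_path = "../models/RWKV7-Goose-Pile-168M-HF"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_path, device_map="cuda", trust_remote_code=True
)

# Same prompt as in the original repro, but run through generate
# to confirm the forward pass no longer raises the shape error.
inputs = tokenizer("Hello world!", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))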

koute changed discussion status to closed
