Errors in HybridMambaAttentionDynamicCache
Hi, thanks for the great model!
I tried to use the cache in an iterative generation process but had errors.
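For context, this is roughly the loop I am running (a minimal sketch; the manual decode loop and the way I pass past_key_values back in are my own setup, not code from the repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-8B-Reasoning-128K"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
generated = inputs["input_ids"]
past = None  # expecting the model to allocate its hybrid cache on the first forward (my assumption)

for _ in range(32):
    out = model(
        input_ids=generated if past is None else generated[:, -1:],
        past_key_values=past,
        use_cache=True,
    )
    past = out.past_key_values  # reused on the next iteration; this is where the errors show up
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=-1)
```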
The first error is:
File "modules/transformers_modules/nvidia/Nemotron-H-8B-Reasoning-128K/2dcbcfd95b103843b6ad8e79690f34480ce5a5ae/modeling_nemotron_h.py", line 460, in cuda_kernels_forward
(cache_params.conv_kernel_size - hidden_states_B_C_transposed.shape[-1], 0),
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'HybridMambaAttentionDynamicCache' object has no attribute 'conv_kernel_size'
The conv_kernel_size attribute is accessed here: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L460
But it is not stored on the cache object: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L176
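One way I could imagine fixing this is to store the kernel size on the cache when it is constructed. This is just a sketch of the idea; the constructor signature and the config field name (conv_kernel) are assumptions on my part, not taken from the file:

```python
import torch

# Sketch of a possible change to HybridMambaAttentionDynamicCache.__init__ (around #L176).
# "conv_kernel" is an assumed name for the kernel-size field on the config; the real
# NemotronHConfig attribute may differ.
class HybridMambaAttentionDynamicCacheSketch:
    def __init__(self, config, batch_size, dtype=torch.float16, device=None):
        self.dtype = dtype
        # keep the kernel size on the cache so cuda_kernels_forward (#L460) can read
        # cache_params.conv_kernel_size instead of raising AttributeError
        self.conv_kernel_size = getattr(config, "conv_kernel", 4)  # 4 = common Mamba default
        # ... the rest of the existing initialization (conv_states, ssm_states, attention
        # key/value caches) would stay as in the current file ...
```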
The second error is:
self.conv_states[layer_idx] = new_conv_state.to(self.conv_states.device)
AttributeError: 'list' object has no attribute 'device'
self.conv_states is initialized as a list: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L177
But it is later used to obtain the device: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L248
self.ssm_states has a similar issue: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L255
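If conv_states and ssm_states are meant to stay as Python lists of per-layer tensors, I would expect the update to index into the list before reading .device, roughly like this (method names and signatures are my guess from the linked lines, not copied from the file):

```python
import torch

# Rough sketch of the per-layer update I would expect, given that conv_states and
# ssm_states are lists with one tensor per layer. Names here are assumptions, not the
# actual methods in modeling_nemotron_h.py.
class CacheUpdateSketch:
    def __init__(self, num_layers: int):
        self.conv_states = [torch.zeros(1) for _ in range(num_layers)]
        self.ssm_states = [torch.zeros(1) for _ in range(num_layers)]

    def update_conv_state(self, layer_idx: int, new_conv_state: torch.Tensor) -> torch.Tensor:
        # index the list first, then read that layer's device; self.conv_states.device
        # fails because a plain list has no .device attribute (#L248)
        self.conv_states[layer_idx] = new_conv_state.to(self.conv_states[layer_idx].device)
        return self.conv_states[layer_idx]

    def update_ssm_state(self, layer_idx: int, new_ssm_state: torch.Tensor) -> torch.Tensor:
        # same pattern for ssm_states (#L255)
        self.ssm_states[layer_idx] = new_ssm_state.to(self.ssm_states[layer_idx].device)
        return self.ssm_states[layer_idx]
```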
I guess this HybridMambaAttentionDynamicCache class was assembled from code for different models, so there may be some inconsistencies.
Could you confirm the above issues and share if iterative generation with cache works or not? Thanks.