
Errors in HybridMambaAttentionDynamicCache

by pyf98

Hi, thanks for the great model!

I tried to use the cache in an iterative generation process but ran into errors.
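Roughly, the pattern I mean is the following (a simplified sketch; dtype, device placement, and whether the cache has to be constructed explicitly differ on my side):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-8B-Reasoning-128K"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)

# Prefill step: this produces the HybridMambaAttentionDynamicCache.
out = model(**inputs, use_cache=True)
past = out.past_key_values

# Decode step: reusing the cache for the next token is where the errors appear.
next_token = out.logits[:, -1:].argmax(dim=-1)
out = model(input_ids=next_token, past_key_values=past, use_cache=True)
```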
The first error is:

File "modules/transformers_modules/nvidia/Nemotron-H-8B-Reasoning-128K/2dcbcfd95b103843b6ad8e79690f34480ce5a5ae/modeling_nemotron_h.py", line 460, in cuda_kernels_forward
(cache_params.conv_kernel_size - hidden_states_B_C_transposed.shape[-1], 0),
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
AttributeError: 'HybridMambaAttentionDynamicCache' object has no attribute 'conv_kernel_size'

conv_kernel_size is accessed here: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L460
but it is never set on the cache object: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L176
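A possible workaround for this one would be to attach the missing attribute to the cache instance before the decode step; a minimal sketch, where conv_kernel as the config field holding the Mamba kernel width is my assumption (the actual attribute name may differ):

```python
# Hypothetical workaround: set the attribute that cuda_kernels_forward expects.
# `model.config.conv_kernel` is an assumed field name; adjust to whatever the
# NemotronH config actually calls the conv kernel width.
past.conv_kernel_size = model.config.conv_kernel
```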

The second error is:

```
self.conv_states[layer_idx] = new_conv_state.to(self.conv_states.device)
AttributeError: 'list' object has no attribute 'device'
```

self.conv_states is initialized as a plain Python list: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L177
but the code later tries to read .device from the list itself rather than from a tensor: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L248
self.ssm_states has the same issue: https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K/blob/main/modeling_nemotron_h.py#L255
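If the list layout is intended, I would guess the device has to come from the per-layer tensor rather than from the list itself; a sketch of what those lines might look like instead (my guess, not a verified fix):

```python
# Inside the cache update methods: read the device from the existing
# per-layer tensor, since self.conv_states / self.ssm_states are Python lists.
self.conv_states[layer_idx] = new_conv_state.to(self.conv_states[layer_idx].device)
self.ssm_states[layer_idx] = new_ssm_state.to(self.ssm_states[layer_idx].device)
# (`new_ssm_state` is my placeholder for the analogous variable at line 255.)
```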

I suspect this HybridMambaAttentionDynamicCache class was assembled from code for several different models, which would explain the inconsistencies.

Could you confirm these issues and let us know whether iterative generation with the cache is expected to work? Thanks.
