Using google/gemma-3-4b-it as an assistant model for speculative decoding does not work

#51
by sayambhu - opened

I get a warning:

An assistant model is provided, using a dynamic cache instead of a cache of type='hybrid'.
generation_config default values have been modified to match model-specific defaults: {'cache_implementation': 'hybrid'}. If this is not desired, please set these values explicitly.

and then an error:

for idx in range(len(past_key_values)):
TypeError: object of type 'HybridCache' has no len()

How can I fix it?
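
To narrow it down: the TypeError itself seems to come from the generation code calling len() on the cache object, which only works if the object defines __len__. Below is a minimal sketch of that failure mode that runs without any model. CacheWithoutLen, iterate_layers, and try_iterate are hypothetical names made up for illustration; they are not transformers APIs, and the stand-in class is not the real HybridCache.

```python
# Hypothetical stand-in for a cache object that, like the one in the traceback,
# does not define __len__. This is NOT the real transformers HybridCache.
class CacheWithoutLen:
    def __init__(self, num_layers):
        self.layers = [None] * num_layers


def iterate_layers(past_key_values):
    # Mirrors the failing line from the traceback: it assumes len() works.
    visited = []
    for idx in range(len(past_key_values)):
        visited.append(idx)
    return visited


def try_iterate(past_key_values):
    # Returns the layer indices, or the error message if len() is unsupported.
    try:
        return iterate_layers(past_key_values)
    except TypeError as exc:
        return str(exc)


# A plain list of per-layer (key, value) pairs supports len().
legacy_style = [("k0", "v0"), ("k1", "v1")]
print(try_iterate(legacy_style))

# An object without __len__ raises the same kind of TypeError as in the traceback.
print(try_iterate(CacheWithoutLen(num_layers=2)))
```

So the loop in the traceback appears to expect a cache that supports len(), while the object it actually receives does not. That is only my reading of the error message, not a confirmed diagnosis.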
