Using google/gemma-3-4b-it as assistant model for speculative decoding does not work
#51
by sayambhu
I get these warnings:

An assistant model is provided, using a dynamic cache instead of a cache of type='hybrid'.
generation_config default values have been modified to match model-specific defaults: {'cache_implementation': 'hybrid'}. If this is not desired, please set these values explicitly.
and then this error:
for idx in range(len(past_key_values)):
TypeError: object of type 'HybridCache' has no len()
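
For reference, this is roughly how I am calling it. This is a minimal sketch, not my exact script: the target model id and the generation arguments below are placeholders, and only the assistant model (google/gemma-3-4b-it from the title) is fixed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "google/gemma-3-27b-it"   # placeholder target model, any larger checkpoint
assistant_id = "google/gemma-3-4b-it" # assistant (draft) model from the title

tokenizer = AutoTokenizer.from_pretrained(target_id)
model = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
assistant = AutoModelForCausalLM.from_pretrained(
    assistant_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative decoding in one sentence.", return_tensors="pt"
).to(model.device)

# Speculative decoding: the assistant drafts candidate tokens,
# the target model verifies them. The error above is raised here.
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```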
How can I fix this?