max_new_tokens Issue
First of all, thank you for sharing such a great model.
I just wanted to report that in the VLM Example section of the example code in the model card, setting `max_new_tokens` does not seem to control the number of output tokens. On the other hand, setting `max_length` does apply the token limit as expected.
While `max_new_tokens` works as intended in the LLM Example section, it doesn't seem to function properly in the VLM Example.
Could you please check whether `max_new_tokens` is being applied correctly?
Hello, and thank you for bringing this to our attention. We apologize for any inconvenience caused.
As you pointed out, we've confirmed that `max_new_tokens` is not being applied correctly.
This appears to be a bug in `modeling_hyperclovax.py`, and we will correct it promptly.
Thank you for letting us know.
The model code has been updated. Could you try once more (after clearing the model from your local machine's Hugging Face cache)?
Thank you for the quick feedback and update!
I checked the updated `modeling_hyperclovax.py` and confirmed that `**kwargs` has been added to the call to `self.language_model.generate()` inside the `generate()` function, which is great.
However, when I call `generate()` and explicitly pass `max_new_tokens`, the following error occurs:
```
TypeError: generate() got multiple values for keyword argument 'max_new_tokens'
```
The issue seems to come from this line inside the same function, where `self.language_model.generate()` is called like this:

```python
pred = self.language_model.generate(
    ...
    max_new_tokens=max_length,
    ...
    **kwargs,
)
```
In this structure, when `max_new_tokens` is passed externally via `**kwargs`, it conflicts with the explicitly set `max_new_tokens=max_length`, resulting in the error above.
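For reference, the same collision can be reproduced in plain Python with a minimal, self-contained sketch (the function names below are hypothetical stand-ins, not the actual model code):

```python
def inner_generate(**kwargs):
    """Hypothetical stand-in for self.language_model.generate()."""
    return kwargs

def generate(max_length=196, **kwargs):
    # Forwards a hard-coded max_new_tokens AND the caller's kwargs.
    # If the caller also passed max_new_tokens, Python raises a
    # TypeError for the duplicate keyword argument.
    return inner_generate(max_new_tokens=max_length, **kwargs)

generate(max_new_tokens=64)
# TypeError: inner_generate() got multiple values for keyword
# argument 'max_new_tokens'
```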
Additionally, since `max_length` has a default value of 196, generation is always capped at 196 new tokens, regardless of the `max_new_tokens` value provided externally. So even when the error is avoided, the `max_new_tokens` parameter is effectively ignored.
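One straightforward way to resolve both problems (just a sketch of a possible fix, not necessarily what was actually implemented) is to treat `max_length` as a fallback that applies only when the caller hasn't supplied `max_new_tokens`:

```python
def inner_generate(**kwargs):
    """Hypothetical stand-in for self.language_model.generate()."""
    return kwargs

def generate(max_length=196, **kwargs):
    # Use max_length only as a default: if the caller passed
    # max_new_tokens explicitly, keep their value. This avoids the
    # duplicate-kwarg TypeError and stops silently capping output.
    kwargs.setdefault("max_new_tokens", max_length)
    return inner_generate(**kwargs)

print(generate(max_new_tokens=64))  # {'max_new_tokens': 64}
print(generate())                   # {'max_new_tokens': 196}
```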
Just wanted to share this in case further refinement is needed.
This is quite a serious bug. :(
The behaviors of `max_length` and `max_new_tokens` should clearly be different: `max_length` bounds the total sequence (prompt plus generated tokens), while `max_new_tokens` bounds only the newly generated tokens.
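For anyone following along, here is a minimal sketch of that difference using `gpt2` as a stand-in model (the same `generate()` semantics apply to any transformers model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# max_length caps the TOTAL sequence: prompt tokens + new tokens.
out_total = model.generate(**inputs, max_length=prompt_len + 8)

# max_new_tokens caps only the newly generated continuation.
out_new = model.generate(**inputs, max_new_tokens=8)

assert out_total.shape[1] <= prompt_len + 8
assert out_new.shape[1] - prompt_len <= 8
```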
I've updated the code and completed some tests.
Thank you for bringing this to our attention.