Custom `generate` methods discussion
Hi everyone 👋
This thread centralizes the discussion about custom `generate` methods. Questions about how the feature works, improvement suggestions, and requests for new generation methods are all welcome!
Resources:
- docs
- all custom generate methods
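To make the feature concrete, here's a minimal sketch of what calling a Hub-hosted custom generate method looks like. The model id and the `custom_generate` repo id below are placeholders, so swap in any compatible pair:

```python
# Minimal sketch of running a Hub-hosted custom generate method.
# Both the model id and the custom_generate repo id are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Tell me a joke.", return_tensors="pt").to(model.device)
# `custom_generate` points at a Hub repo whose `custom_generate/generate.py`
# defines the decoding loop; `trust_remote_code=True` is required because
# that code is downloaded and executed locally.
outputs = model.generate(
    **inputs,
    custom_generate="transformers-community/custom_generate_example",  # placeholder repo
    trust_remote_code=True,
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```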
Hi @joaogante
We want to integrate our method into transformers-community (as advised by your official team). See details in https://github.com/huggingface/transformers/pull/38824.
Could you please walk us through the procedure for opening a pull request against transformers-community?
Summary: we want to merge https://huggingface.co/Gausson/sep_cache [ICML 2025] into transformers-community/sep_cache, but we couldn't find a pull request option on the Hugging Face Hub. Could you guide us through the correct procedure?
Best Regards
Hi @Gausson , amazing contribution! We'll invite you as a Contributor to the organization so you can transfer your repo using the "Rename or transfer this model" section of your repo's settings. This way, the model URL will be automatically redirected, and downloads and like counts will be preserved.
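For reference, the same transfer can also be done programmatically. Here is a sketch using `huggingface_hub`, assuming your token has write access to the source repo and contributor rights in the target org:

```python
# Sketch of the programmatic equivalent of the "Rename or transfer this model"
# settings flow. Assumes write access to the source repo and contributor
# rights in the target organization.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` / HF_TOKEN
api.move_repo(
    from_id="Gausson/sep_cache",
    to_id="transformers-community/sep_cache",
    repo_type="model",
)
```

Either way, the old URL keeps redirecting, and downloads and likes are preserved.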
Hey @Gausson !
Veeery cool repo -- I'm impressed with the complexity of what you've built with `custom_generate`, and with how well documented it is 🔥
I'm going to invite you as a contributor to transformers-community, so you can move your repo there. In a nutshell, you would still have complete control over the Hub repo, but it gets higher visibility because of the org. After it is moved, we're going to create a new collection of community-contributed `custom_generate` repos. Let us know if you have any questions! 🤗
Meanwhile, two minor comments on the repo itself:
- On `transformers==4.54`, we've released a cache refactor: caches are now built as a composition of layers, as opposed to being isolated objects. Sadly, I think your current abstraction doesn't work with it. You might want to update references from `transformers>=4.53` to `transformers==4.53`;
- On your demo script in `README.md`, I suggest adding `torch_dtype=torch.bfloat16` when loading the model, so that the demo is immediately compatible with 24GB GPUs (minimal sketch below).
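To make the second point concrete, here's a sketch of the suggested loading code; the model id is a placeholder for whatever the demo actually uses:

```python
# Sketch of the suggested README tweak: pin transformers to 4.53 and load
# the weights in bfloat16 so the demo fits on a 24GB GPU.
# Requires: transformers==4.53.* (the 4.54 cache refactor changes the Cache internals)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; use the model from the demo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~half the memory of the default float32 load
    device_map="auto",
)
```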
Discussion centralized from this GH issue and this Hub issue.
Thank you very much for your reply.
I have transferred sep_cache to the transformers-community organization, and I will update my README.md according to your suggestions. When you have time, please kindly add sep_cache to the `custom_generate` collection. Thank you very much! 🙏🙏🙏