
Can anyone give me a hint on how google T5 model is involved in the generation process?

#3 by junyaoren - opened

Can anyone give me a hint on how the Google T5 model is involved in the generation process? I noticed that this model was downloaded during inference. Is it used for prompt upsampling?

T5-XXL provides the text conditioning: the input prompt is encoded by the T5-XXL encoder, and the resulting embeddings supply the linguistic/semantic context for generation.
In the architecture, each transformer block applies a self-attention layer (over the spatiotemporal tokens), followed by a cross-attention layer (where the semantic context from the T5-XXL embeddings is injected), followed by a feed-forward network (FFN).
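
For intuition, here is a minimal PyTorch sketch of that pattern. It is illustrative only, not the actual Cosmos code: t5-small stands in for T5-XXL so the snippet runs anywhere, and the dimensions, prompt, and class name are made-up assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

# 1) Encode the prompt with a T5 encoder (kept frozen in typical setups).
#    Cosmos uses T5-XXL; t5-small is substituted here only for the sketch.
tok = AutoTokenizer.from_pretrained("google-t5/t5-small")
enc = T5EncoderModel.from_pretrained("google-t5/t5-small")
with torch.no_grad():
    ids = tok("a robot arm stacking boxes", return_tensors="pt").input_ids
    text_emb = enc(input_ids=ids).last_hidden_state  # (1, seq_len, 512)

# 2) One transformer block of the pattern described above:
#    self-attention -> cross-attention on T5 embeddings -> FFN.
class Block(nn.Module):
    def __init__(self, dim, text_dim, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(
            dim, heads, kdim=text_dim, vdim=text_dim, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, text):
        # Self-attention over the spatiotemporal (video) tokens.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention: queries are the video tokens, keys/values are
        # the T5 text embeddings -- this is where the prompt comes in.
        h = self.norm2(x)
        x = x + self.cross_attn(h, text, text, need_weights=False)[0]
        # Position-wise feed-forward network.
        return x + self.ffn(self.norm3(x))

x = torch.randn(1, 256, 768)  # toy batch of spatiotemporal tokens
out = Block(dim=768, text_dim=enc.config.d_model)(x, text_emb)
```

The key point is the cross-attention call: the video tokens query the T5 embeddings, which is how the prompt's semantics steer the generation.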

You can refer to the "Cross-attention for text conditioning" part of the architecture section in the Cosmos paper:

https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai

Hope this helped. 😃
