Tolga Cangöz

tolgacangoz

AI & ML interests

AIGC

Recent Activity

reacted to Kseniase's post with ❤️ 1 day ago
8 types of RoPE

Because we use Transformers all the time, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings according to their position, so the model can tell which token comes first, second, and so on. Here are 8 types of RoPE suited to different cases:

1. Original RoPE -> https://huggingface.co/papers/2104.09864
Encodes token positions by rotating the query and key vectors in the complex plane via a position-based rotation matrix, which gives the self-attention mechanism relative positional information.

2. LongRoPE -> https://huggingface.co/papers/2402.13753
Extends the context window of pre-trained LLMs to 2048k tokens by exploiting non-uniformities in positional interpolation, found with an efficient search.

3. LongRoPE2 -> https://huggingface.co/papers/2502.20082
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> https://huggingface.co/papers/2502.13923
Decomposes the positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos.

5. Directional RoPE (DRoPE) -> https://huggingface.co/papers/2503.15029
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> https://huggingface.co/papers/2502.05173
Adapts RoPE for video, featuring a 3D structure, low-frequency temporal allocation, a diagonal layout, and adjustable spacing.

7. VRoPE -> https://huggingface.co/papers/2502.11664
Another RoPE variant for video, which restructures positional indices and balances the encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
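To make item 1 concrete, here is a minimal pure-Python sketch of the original RoPE rotation. It assumes the interleaved pair layout and the base of 10000 from the original RoPE paper; `rope_rotate` and `dot` are hypothetical helper names for illustration, not a library API. The example checks the property described in item 1: after rotation, the query-key dot product depends only on the relative offset between the two positions, not on their absolute values.

```python
import math
from typing import List, Sequence


def rope_rotate(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle pos * theta_i,
    where theta_i = base ** (-2i / d). Assumes the interleaved pair layout."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("embedding dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)  # lower frequencies for later pairs
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair (a, b) by `angle`
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product, standing in for an attention score q . k."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5, 1.1, -0.4]
    k = [0.9, 0.2, -0.6, 1.3, 0.1, 0.8]
    # Same relative offset (3) at two very different absolute positions:
    # the two scores agree up to floating-point error.
    s1 = dot(rope_rotate(q, 5), rope_rotate(k, 2))
    s2 = dot(rope_rotate(q, 105), rope_rotate(k, 102))
    print(s1, s2)
```

Real implementations vectorize this over heads and sequence positions, and many use a split-halves layout instead of interleaved pairs; the relative-position property holds in either layout because each pair is rotated independently.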
reacted to Kseniase's post with 👀 1 day ago
liked a Space 11 days ago
Remade-AI/remade-effects

Organizations

Spaces-explorers, Blog-explorers, open/ acc

tolgacangoz's activity

New activity in tencent/HunyuanVideo-I2V 12 days ago
New activity in modelscope/AnyText 27 days ago

Runtime error
#8 opened 28 days ago by tolgacangoz · 2 comments
New activity in pcuenq/mdm 5 months ago
New activity in madebyollin/megalith-10m 6 months ago

Update README.md
#7 opened 6 months ago by tolgacangoz · 1 comment
New activity in pcuenq/mdm-flickr-64 6 months ago

Update the license to MIT
#2 opened 6 months ago by tolgacangoz
New activity in pcuenq/mdm-flickr-256 6 months ago

Update the license to MIT
#1 opened 6 months ago by tolgacangoz
New activity in pcuenq/mdm-flickr-1024 6 months ago

Update the license to MIT
#1 opened 6 months ago by tolgacangoz

FP16 vs FP32?
#48 opened 7 months ago by tolgacangoz · 1 comment
New activity in diffusers/controlnet-zoe-depth-sdxl-1.0 7 months ago

Fix cpu offloading
#5 opened 7 months ago by tolgacangoz · 3 comments
New activity in lllyasviel/sd-controlnet-canny 7 months ago

Update the bird's url
#6 opened almost 2 years ago by lz1oceani · 1 comment
New activity in diffusers/controlnet-zoe-depth-sdxl-1.0 7 months ago

Update README.md
#6 opened 7 months ago by tolgacangoz
New activity in a-r-r-o-w/AnyText 9 months ago

Fix `eps`
#1 opened 9 months ago by tolgacangoz · 2 comments
New activity in ali-vilab/i2vgen-xl 10 months ago

The link is broken.
#14 opened 10 months ago by tolgacangoz · 1 comment
New activity in diffusers/controlnet-depth-sdxl-1.0 11 months ago

Fix higher vRAM usage
#10 opened 11 months ago by tolgacangoz · 1 comment
New activity in mfidabel/controlnet-segment-anything over 1 year ago

Runtime Error
#2 opened over 1 year ago by tolgacangoz