Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

reacted to their post with 🚀 about 20 hours ago
8 types of RoPE

Since Transformers are used everywhere, it helps to understand RoPE (Rotary Position Embedding). Token order matters, and RoPE encodes it by rotating token embeddings according to their position, so the model can tell which token comes first, second, and so on.

Here are 8 types of RoPE that can be used in different cases:

1. Original RoPE -> https://huggingface.co/papers/2104.09864
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info.

2. LongRoPE -> https://huggingface.co/papers/2402.13753
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> https://huggingface.co/papers/2502.20082
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by "needle-driven" perplexity.

4. Multimodal RoPE (MRoPE) -> https://huggingface.co/papers/2502.13923
Decomposes the positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos.

5. Directional RoPE (DRoPE) -> https://huggingface.co/papers/2503.15029
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> https://huggingface.co/papers/2502.05173
Adapts RoPE for video, featuring a 3D structure, low-frequency temporal allocation, a diagonal layout, and adjustable spacing.

7. VRoPE -> https://huggingface.co/papers/2502.11664
Another RoPE variant for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
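To make the rotation idea behind the original RoPE concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a reference implementation: the function names `rope_rotate` and `score` are hypothetical, adjacent dimensions are paired (interleaved layout), and the frequency base of 10000 follows the original paper. It checks the property the post describes: after rotation, the dot product between a query and a key depends only on their relative offset.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate each (even, odd) pair of features in x by a position-dependent angle.

    x: float array of shape (seq_len, dim), dim must be even.
    positions: integer array of shape (seq_len,).
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # One rotation frequency per feature pair: base^(-2i/dim).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)      # (dim/2,)
    angles = positions[:, None] * inv_freq[None, :]       # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # Standard 2D rotation applied to every pair.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def score(q, k, m, n):
    """Attention logit between query q at position m and key k at position n."""
    q_rot = rope_rotate(q[None, :], np.array([m]))[0]
    k_rot = rope_rotate(k[None, :], np.array([n]))[0]
    return float(q_rot @ k_rot)

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Positions (3, 1) and (10, 8) share the same offset, so the scores should match.
same_offset = np.isclose(score(q, k, 3, 1), score(q, k, 10, 8))
# Rotation leaves the vector norm unchanged.
q_at_5 = rope_rotate(q[None, :], np.array([5]))[0]
norm_preserved = np.isclose(np.linalg.norm(q_at_5), np.linalg.norm(q))
```

In practice the rotation is applied to the query and key projections inside each attention layer; the sketch applies it to raw vectors only to keep the example short.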

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity

published an article 6 days ago

What is Qwen-Agent framework? Inside the Qwen family

By Kseniase and 1 other
6
published an article 8 days ago

🌁#92: Fight for Developers and the Year of Orchestration

By Kseniase
5
published an article 9 days ago

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

By Kseniase
64
published an article 13 days ago

How to Reduce Memory Use in Reasoning Models

By Kseniase and 1 other
11
published an article 16 days ago

🌁#91: We are failing in AI literacy

By Kseniase and 1 other
3
published an article 16 days ago

🌁#90: Why AI’s Reasoning Tests Keep Failing Us

By Kseniase
9
published an article 16 days ago

🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools

By Kseniase
8
published an article 17 days ago

🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI

By Kseniase
5
published an article 20 days ago

Everything You Need to Know about Knowledge Distillation

By Kseniase and 1 other
18
published an article 29 days ago

🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025

By Kseniase
4
published an article about 1 month ago

🦸🏻#11: How Do Agents Plan and Reason?

By Kseniase
10
published an article about 1 month ago

Topic 27: What are Chain-of-Agents and Chain-of-RAG?

By Kseniase and 1 other
12
published an article about 2 months ago

What is test-time compute and how to scale it?

By Kseniase and 1 other
63