Ksenia Se
Kseniase
AI & ML interests
None yet
Recent Activity
reacted to their post 2 days ago
16 new research papers on inference-time scaling:
Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without scaling the models' parameter count.
So here are 13 new methods + 3 comprehensive studies on test-time scaling:
1. https://huggingface.co/papers/2504.02495
Probably the most popular study. It proposes boosting inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, a pointwise generative RM, and Self-Principled Critique Tuning (SPCT) (see the first sketch after this list)
2. https://huggingface.co/papers/2504.04718
Allows small models to use external tools, such as code interpreters and calculators, to enhance self-verification (see the second sketch after this list)
3. https://huggingface.co/papers/2504.00810
Proposes training LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window
4. https://huggingface.co/papers/2504.00891
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models
5. https://huggingface.co/papers/2503.24320
The SWIFT test-time scaling framework improves world models' performance without retraining, using strategies such as fast tokenization, top-K pruning, and efficient beam search
6. https://huggingface.co/papers/2504.07104
Proposes REBEL for scaling RAG systems: it uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases (see the third sketch after this list)
7. https://huggingface.co/papers/2503.13288
Proposes the φ-Decoding strategy, which uses foresight sampling, clustering, and adaptive pruning to estimate and select optimal reasoning steps (see the fourth sketch after this list)
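To make these recipes concrete, here are a few minimal sketches in Python. First, the parallel sampling + pointwise reward model loop behind #1, reduced to generic best-of-N: generate_answer and reward_score are hypothetical stand-ins for real model calls, not DeepSeek-GRM's actual API, and the scoring is random just to keep the snippet runnable.

```python
import random

def generate_answer(prompt: str, temperature: float = 1.0) -> str:
    # Stand-in for sampling one answer from an LLM.
    return random.choice(["answer A", "answer B", "answer C"])

def reward_score(prompt: str, candidate: str) -> float:
    # Stand-in for a pointwise generative RM; in DeepSeek-GRM the model
    # first writes principles and a critique, then emits a score.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Inference-time scaling knob: raising n spends more compute
    # without touching the model's parameters.
    candidates = [generate_answer(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward_score(prompt, c))

print(best_of_n("What is 17 * 24?"))
```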
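Second, tool-augmented self-verification as in #2, boiled down to its simplest form: the model proposes an answer, emits a checkable snippet, and a code interpreter runs it. solve and emit_check are hypothetical placeholders here; a real system would have the small model write both.

```python
def solve(question: str) -> str:
    # Placeholder for a small model's answer.
    return "408"

def emit_check(question: str, answer: str) -> str:
    # Placeholder: a real system asks the model to write this check itself.
    return f"assert 17 * 24 == {answer}"

def verified_answer(question: str, max_tries: int = 3) -> str | None:
    for _ in range(max_tries):
        ans = solve(question)
        try:
            exec(emit_check(question, ans), {})  # the "calculator"/interpreter tool
            return ans
        except AssertionError:
            continue  # failed verification -> resample
    return None

print(verified_answer("What is 17 * 24?"))
```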
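Third, a multi-criteria reranking loop in the spirit of #6. The criteria and weights below (relevance, recency, diversity) are illustrative assumptions, not REBEL's actual criteria; the paper derives query-specific criteria via CoT prompting, and more inference compute buys larger candidate pools and more criteria.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    relevance: float  # e.g., from a cross-encoder
    recency: float    # e.g., normalized document freshness
    diversity: float  # e.g., distance from already-selected passages

def rerank(passages: list[Passage], weights=(0.6, 0.2, 0.2), k: int = 3) -> list[Passage]:
    # Weighted multi-criteria score; tuning weights and k trades quality for speed.
    wr, wt, wd = weights
    score = lambda p: wr * p.relevance + wt * p.recency + wd * p.diversity
    return sorted(passages, key=score, reverse=True)[:k]

docs = [Passage("A", 0.9, 0.1, 0.5), Passage("B", 0.7, 0.9, 0.2), Passage("C", 0.4, 0.5, 0.9)]
print([p.text for p in rerank(docs, k=2)])
```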
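And fourth, the foresight-sampling idea from #7 at the step level: sample several candidate next steps, score each by a short lookahead, cluster duplicates so they pool evidence, prune weak clusters, keep the best. propose_steps and foresight_value are stand-ins for model calls, and the "clustering" here is exact string matching, far cruder than the paper's.

```python
import random
from collections import defaultdict

def propose_steps(partial: list[str], k: int = 6) -> list[str]:
    # Stand-in for sampling k candidate next reasoning steps from an LLM.
    return [f"step-{random.randint(0, 2)}" for _ in range(k)]

def foresight_value(partial: list[str], step: str) -> float:
    # Stand-in for a short rollout estimating how promising
    # the continuation after `step` looks.
    return random.random()

def select_step(partial: list[str]) -> str:
    candidates = propose_steps(partial)
    clusters: dict[str, list[float]] = defaultdict(list)
    for s in candidates:  # identical candidates share a cluster
        clusters[s].append(foresight_value(partial, s))
    scored = {s: sum(v) / len(v) for s, v in clusters.items()}
    # Adaptive pruning: keep only the top half of clusters (at least one).
    cutoff = sorted(scored.values(), reverse=True)[: max(1, len(scored) // 2)][-1]
    pruned = {s: v for s, v in scored.items() if v >= cutoff}
    return max(pruned, key=pruned.get)

print(select_step(["parse the question"]))
```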
Read further below 👇
Also, subscribe to the Turing Post https://www.turingpost.com/subscribe
replied to their post 3 days ago
Kseniase's activity
- Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What's Really Changing in Transformers?
- What is Qwen-Agent framework? Inside the Qwen family
- 🌁#92: Fight for Developers and the Year of Orchestration
- 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?
- 🌁#90: Why AI's Reasoning Tests Keep Failing Us (published about 1 month ago)
- 🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools (published about 1 month ago)
- 🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI (published about 1 month ago)
- Everything You Need to Know about Knowledge Distillation (published about 1 month ago)
- 🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025 (published about 2 months ago)
- 🌁#88: Can DeepSeek Inspire Global Collaboration? (published about 2 months ago)
- 🦸🏻#10: Does Present-Day GenAI Actually Reason? (published about 2 months ago)
- Topic 27: What are Chain-of-Agents and Chain-of-RAG?
- 🌁#87: Why DeepResearch Should Be Your New Hire