Reinforcement Learning (RL) is no longer stuck in the same old PPO loop - in the last two months alone, researchers have introduced a new wave of techniques reshaping how we train and fine-tune LLMs, VLMs, and agents.
Here are 9 fresh policy optimization techniques worth knowing:
1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071) Shifts optimization, clipping, and rewarding from the token level to the sequence level to capture the full picture and increase stability compared to GRPO. The GSPO-token variant also allows token-level fine-tuning (see the sketch after this list).
3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844) This one trains a model to adapt its reasoning depth to problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.
5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340) Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the per-prompt training data.
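To make GSPO's sequence-level idea concrete, here is a minimal PyTorch sketch of a sequence-level clipped loss (our own illustration with padding/masking omitted, not the authors' code):

```python
import torch

def gspo_style_loss(logp_new, logp_old, advantages, eps=0.2):
    """Sketch of a GSPO-style loss. logp_new / logp_old: (batch, seq_len)
    per-token log-probs under the current and old policies; advantages:
    (batch,) group-normalized sequence-level advantages, as in GRPO."""
    # One length-normalized importance ratio per *sequence*,
    # instead of one ratio per token.
    seq_len = logp_new.shape[1]
    ratio = torch.exp((logp_new - logp_old).sum(dim=1) / seq_len)
    # PPO-style clipping, applied to whole sequences.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

And a toy sketch of the RePO idea above: a per-prompt replay buffer that mixes stored off-policy rollouts back into each new on-policy group (class and method names are our assumptions, not the paper's implementation):

```python
import random
from collections import defaultdict

class PromptReplayBuffer:
    """Keeps past rollouts per prompt and replays a few of them
    alongside fresh on-policy samples."""

    def __init__(self, max_per_prompt=64):
        self.buffer = defaultdict(list)
        self.max_per_prompt = max_per_prompt

    def add(self, prompt_id, rollout):
        entries = self.buffer[prompt_id]
        entries.append(rollout)
        if len(entries) > self.max_per_prompt:
            entries.pop(0)  # evict the oldest rollout

    def replay(self, prompt_id, k=4):
        # Up to k diverse off-policy samples to broaden the group.
        entries = self.buffer[prompt_id]
        return random.sample(entries, min(k, len(entries)))
```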
Time to look at some free, useful resources that can help you upgrade your knowledge of AI and machine learning! Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:
1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223 Many recommend this 270-page book as a good resource focused on fundamental concepts such as pre-training, generative models, prompting, alignment, and inference.
2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning -> A Survey on Post-training of Large Language Models (2503.06072) Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques
3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037 Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks, and applications in medicine, finance, and science.
4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334) Defines Context Engineering as systematic info design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems
5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016 Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges
6. Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040 Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them
LoRA (Low-Rank Adaptation) is a popular lightweight method for fine-tuning AI models. Instead of updating the full model, it adds small trainable components (low-rank matrices) while keeping the original weights frozen. Only these adapters are trained.
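As a quick refresher, here's what that looks like in code: a minimal LoRA layer sketch in PyTorch (hyperparameters r and alpha are illustrative defaults):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA layer: y = Wx + (alpha/r) * B(Ax).
    Only A and B are trained; the base weight W stays frozen."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```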
Recently, many interesting new LoRA variations came out, so it’s a great time to take a look at these 13 clever approaches:
2. SingLoRA → SingLoRA: Low Rank Adaptation Using a Single Matrix (2507.05566) Simplifies LoRA by using only one small matrix instead of the usual two, multiplying it by its own transpose (like A × Aᵀ). It uses half the parameters of LoRA and avoids scale mismatch between different matrices
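Here's a toy sketch of that single-matrix idea (our illustration for the square-weight case; the paper also handles non-square weights and uses a warm-up schedule we omit):

```python
import torch
import torch.nn as nn

class SingLoRALinear(nn.Module):
    """SingLoRA-style layer: the update is A @ A.T from a single
    trainable matrix A, added to a frozen square base weight."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        assert base.in_features == base.out_features, "sketch assumes square W"
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(base.out_features, r) * 0.01)
        self.scale = alpha / r

    def forward(self, x):
        # Symmetric low-rank update with half of LoRA's adapter parameters.
        delta = self.A @ self.A.T
        return self.base(x) + self.scale * (x @ delta)
```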
MCP is redefining how AI assistants connect to the world of data and tools, so no wonder MCP servers are in high demand now. That’s why we’ve curated 13 cool MCP servers to upgrade your workflow:
1. Hugging Face Official MCP Server -> https://github.com/evalstate/hf-mcp-server Provides access to and interaction with Hugging Face models, datasets, and Gradio Spaces, for dynamic tool integration and configuration across environments.
2. Browser MCP -> https://browsermcp.io/ An MCP server plus a Chrome extension. It lets AI apps like VS Code, Claude, Cursor, and Windsurf automate your browser.
3. Bright Data MCP -> https://github.com/brightdata/brightdata-mcp This one is for working with data in real time: searching the web, navigating websites, taking actions, and retrieving data.
5. Octagon Deep Research MCP -> https://github.com/OctagonAI/octagon-deep-research-mcp Allows for deep research via AI agents, integrating seamlessly with MCP clients like Claude Desktop and Cursor for powerful, unlimited research capabilities.
Deep Research agents are quickly becoming our daily co-workers — built for complex investigations, not just chat. With modular architecture, advanced tool use and real web access, they go far beyond typical AI. While big-name agents get the spotlight, we want to highlight some powerful recent open-source alternatives:
1. DeerFlow -> https://github.com/bytedance/deer-flow A modular multi-agent system combining LMs and tools for automated research and code analysis. It links a coordinator, a planner, a team of specialized agents, and a reporter, and converts reports to speech via Text-to-Speech (TTS)
2. Alita -> https://github.com/CharlesQ9/Alita Uses a single problem-solving module, achieving scalable reasoning through simplicity. It self-evolves by generating and reusing Model Context Protocols (MCPs) from open-source tools to build external capabilities for diverse tasks
3. WebThinker -> https://github.com/RUC-NLPIR/WebThinker Lets reasoning models autonomously search the web and navigate pages. Deep Web Explorer allows interaction with links and follow-up searches. Through a Think-Search-and-Draft process, models generate and refine reports in real time. RL training with preference pairs improves the workflow
4. SimpleDeepSearcher -> https://github.com/RUCAIBox/SimpleDeepSearcher A lightweight framework showing that supervised fine-tuning is a real alternative to complex RL, using simulated web interactions and multi-criteria curation to generate high-quality training data
5. AgenticSeek -> https://github.com/Fosowl/agenticSeek A private, on-device assistant that picks the best expert agent for browsing, coding, or planning - no cloud needed. Includes voice input via speech-to-text
6. Suna -> https://github.com/kortix-ai/suna Offers web browsing, file and doc handling, CLI execution, site deployment, and API/service integration—all in one assistant
Everyone’s chasing top reasoning performance, but reasoning is still the bottleneck for many real-world tasks. This week, let's spotlight some powerful techniques that have shown promise in helping LLMs achieve more consistent logic, planning, and depth:
3. Visual Scratchpads, or multimodal reasoning support -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542) Using structured visual inputs or sketchable intermediate steps (diagrams, grids, trees) boosts performance in tasks like planning, geometry, and multi-agent simulation. In practice, GPT-4o, Claude, and Gemini show marked improvements with this approach
4. System 1 vs System 2 Prompt switching -> Adaptive Deep Reasoning: Triggering Deep Thinking When Needed (2505.20101) Switching between a fast, intuitive response mode and a slow, deliberate reasoning mode is among the most popular AI trends (see the sketch after this list). E.g., models tend to respond more reliably when explicitly instructed to “think like a researcher.” This can also reduce hallucinations in open-ended generation and debate tasks
5. Adversarial Self-Chat Fine-Tuning -> Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642) Generate debates between model variants or model vs human, then fine-tune on the winner’s response. It helps models learn to better defend their reasoning. Used in Claude’s Constitutional AI and SPPO-style tuning
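For technique 4, here's a toy sketch of System 1 / System 2 prompt switching (the difficulty heuristic and prompts are hypothetical, not from the cited paper):

```python
# Toy System 1 / System 2 prompt router (hypothetical heuristic).
FAST_PROMPT = "Answer directly and concisely."
SLOW_PROMPT = ("Think like a researcher: reason step by step, "
               "check your work, then give the final answer.")

def pick_system_prompt(question: str) -> str:
    # Long or proof-flavored questions get the deliberate System 2 prompt.
    hard_markers = ("prove", "derive", "step by step", "why")
    looks_hard = len(question.split()) > 30 or any(
        m in question.lower() for m in hard_markers)
    return SLOW_PROMPT if looks_hard else FAST_PROMPT

print(pick_system_prompt("What's the capital of France?"))      # System 1
print(pick_system_prompt("Prove that sqrt(2) is irrational."))  # System 2
```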
Since Meta released the newest V-JEPA 2 this week, we thought it's a good time to revisit a few other interesting JEPA variants. JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework that predicts the latent representation of a missing part of the input.
Here are 11 JEPA types that you should know about:
3. Denoising JEPA (D-JEPA) -> Denoising with a Joint-Embedding Predictive Architecture (2410.03755) Combines JEPA with diffusion techniques. By treating JEPA as masked image modeling and next-token prediction, D-JEPA generates data auto-regressively, incorporating diffusion and flow-matching losses
Let’s refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):
1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196) + history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs Trained on massive text datasets to understand and generate human language. LLMs are mostly built on the Transformer architecture, predicting the next token, and scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.)
2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) -> A Survey of Small Language Models (2410.20011) A lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work on the same principles as LLMs
3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247) Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both
4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549) A large-scale model that can understand and process multiple types of data (modalities) — usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.
5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047) Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.
Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇
After writing the most read explanation of MCP on Hugging Face (https://huggingface.co/blog/Kseniase/mcp), we chose these 13 awesome MCP servers that you can work with:
1. Agentset MCP -> https://github.com/agentset-ai/mcp-server For efficient and quick building of intelligent, doc-based apps using the open-source Agentset platform for RAG
2. GitHub MCP Server -> https://github.com/github/github-mcp-server Integrates GitHub APIs into your workflow, allowing you to build AI tools and apps that interact with GitHub's ecosystem
5. Safe Local Python Executor -> https://github.com/maxim-saplin/mcp_safe_local_python_executor A lightweight tool for running LLM-generated Python code locally, using Hugging Face’s LocalPythonExecutor (from smolagents framework) and exposing it via MCP for AI assistant integration
7. Basic Memory -> https://memory.basicmachines.co/docs/introduction This knowledge management system connects to LLMs and lets you build a persistent semantic graph from your conversations with AI agents
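If you're curious what's inside such a server, here's a minimal sketch using the FastMCP helper from the official MCP Python SDK (the tool itself is a toy example of ours):

```python
# Minimal MCP server exposing one tool over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # MCP clients like Claude Desktop can now call word_count
```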
JEPA, or Joint Embedding Predictive Architecture, is an approach to building AI models introduced by Yann LeCun. It differs from transformers by predicting the representation of a missing or future part of the input, rather than the next token or pixel. This encourages conceptual understanding, not just low-level pattern matching, and helps teach AI to reason abstractly.
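Here's a conceptual sketch of that latent-prediction setup (illustrative modules and shapes, not any specific paper's architecture; real JEPAs typically use an EMA copy of the encoder as the target network):

```python
import torch
import torch.nn as nn

dim = 256
context_encoder = nn.Linear(dim, dim)  # encodes the visible part of the input
target_encoder = nn.Linear(dim, dim)   # encodes the masked part (no gradients)
predictor = nn.Linear(dim, dim)        # predicts target latents from context latents

visible, masked = torch.randn(8, dim), torch.randn(8, dim)

with torch.no_grad():
    target_latent = target_encoder(masked)       # what the model must predict
pred_latent = predictor(context_encoder(visible))

# The loss lives in representation space, not pixel/token space.
loss = nn.functional.mse_loss(pred_latent, target_latent)
loss.backward()
```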
7 Free resources to master Multi-Agent Systems (MAS)
Collective intelligence is the future of AI. Sometimes, a single agent isn't enough — a team of simpler, specialized agents working together to solve a task can be a much better option. Building Multi-Agent Systems (MAS) isn't easy, which is why today we're offering a list of sources that may help you master MAS:
1. CrewAI tutorials -> https://docs.crewai.com/introduction#ready-to-start-building%3F At the end of the page you'll find a guide on how to build a crew of agents that research and analyze a topic, and create a report. Also, there are useful guides on how to build a single CrewAI agent and a workflow
2. Building with CAMEL multi-agent framework -> https://github.com/camel-ai/camel Offers guides, cookbooks, and other useful information for building even million-agent societies and for exploring and working with MAS
4. "Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations" by Yoav Shoham and Kevin Leyton-Brown -> https://www.masfoundations.org/download.html This book explains learning between agents, how multiple agents solve shared problems and communicate with focus on theory, practical examples and algorithms, diving into the game theory and logical approaches
Also, check out The Turing Post article about MAS -> https://www.turingpost.com/p/mas Our article can be a good starting guide for you to explore what MAS is, its components, architectures, types, top recent developments and current trends
When we need to align a model's behavior with desired objectives, we rely on specialized algorithms that support helpfulness, accuracy, reasoning, safety, and alignment with user preferences. Much of a model’s usefulness comes from post-training optimization methods.
Here are the main optimization algorithms (both classic and new) in one place:
1. PPO (Proximal Policy Optimization) -> Proximal Policy Optimization Algorithms (1707.06347) Clips the probability ratio to prevent the new policy from diverging too far from the old one. It helps keep everything stable
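As a reminder of what that clipping looks like, here's a minimal sketch of the standard PPO clipped surrogate loss (variable names are ours):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """L = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```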
CoT has long been one of the hottest techniques in AI thanks to its effectiveness and compelling core idea: encouraging models to solve complex problems through explicit intermediate reasoning steps. But researchers keep modifying the original CoT approach, finding tweaks that further improve LLMs' reasoning. That's what we're going to talk about today.
Here's a list of 10 latest enhanced CoT approaches:
RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what’s happening in RL, we offer some fresh materials on it:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/ It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/ Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265 Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, multi-agent RL methods, RL+LLMs, and RL+inference and other topics
RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it is becoming more agentic and smarter at navigating complex structures like hypergraphs.
Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale their parameter count.
So here are 13 new methods + 3 comprehensive studies on test-time scaling:
3. Z1: Efficient Test-time Scaling with Code (2504.00810) Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window
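As a generic illustration of this kind of budget control (our simplification, not Z1's actual Shifted Thinking Window; model.generate_token is a hypothetical one-token-at-a-time API):

```python
def generate_with_budget(model, prompt, max_think_tokens=512):
    """Cap the number of reasoning tokens, then force the final answer."""
    tokens = []
    for _ in range(max_think_tokens):
        tok = model.generate_token(prompt + "".join(tokens))
        tokens.append(tok)
        if tok == "</think>":  # model finished reasoning early
            break
    # Budget exhausted or reasoning done: trigger the answer.
    return prompt + "".join(tokens) + "\nFinal answer:"
```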
AI inference is the process by which a trained AI model generates predictions, classifications, or decisions from input data. It encompasses a wide range of approaches with different computational methods and deployment strategies.
Firstly, here are 5 inference types, based on how the model reasons:
1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244 Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in (see the worked example after this list).
3. Logical inference -> https://arxiv.org/abs/2009.03393 Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks needing correctness, like software verification.
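For the probabilistic-inference item above, here's a tiny worked Bayes-rule update (the numbers are made up for illustration):

```python
# Belief in a hypothesis updated as evidence arrives.
prior = 0.01                   # P(disease)
p_pos_given_h = 0.95           # P(positive test | disease)
p_pos_given_not_h = 0.05       # P(positive test | no disease)

p_pos = prior * p_pos_given_h + (1 - prior) * p_pos_given_not_h
posterior = prior * p_pos_given_h / p_pos
print(f"P(disease | positive test) = {posterior:.3f}")  # ~0.161
```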
How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio, and more? Specialized multimodal CoT techniques are the answer.
Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source: