
Ksenia Se

Kseniase

https://www.turingpost.com/

AI & ML interests

None yet

Recent Activity


replied to their post 4 days ago

CISPO: Clipped Importance Sampling Policy Optimization →
https://huggingface.co/papers/2506.13585
This RL algorithm from the MiniMax-M1 project clips the importance-sampling weights rather than the per-token updates, so all tokens (even rare but crucial ones) keep contributing to learning instead of being dropped by token-level clipping. CISPO also avoids KL penalties and, like GRPO, uses group-relative advantages.
PAPO: Perception-Aware Policy Optimization → https://huggingface.co/papers/2507.06448
Enhances RL in vision-language tasks by adding a KL-based perception loss to the GRPO objective for better visual alignment during training. It boosts accuracy by 4–8% and reduces perception errors by ~30%.
OPO: On-Policy RL with Optimal Baseline → https://huggingface.co/papers/2505.23585
A simplified RL algorithm from Microsoft that enforces strict on-policy training by using freshly sampled outputs from the current policy for every update, which limits off-policy drift. Its optimal baseline minimizes gradient variance, and the method needs no auxiliary models or regularization terms.
EXPO: Expressive Policy Optimization → https://huggingface.co/papers/2507.07986
Trains complex policies by pairing a large base model with a lightweight edit policy that suggests better actions, selecting the best of both without backpropagating through the base.
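
To make the CISPO idea above more concrete, here is a minimal sketch (not the MiniMax-M1 implementation) that compares the per-token gradient weight under PPO/GRPO-style clipping with a CISPO-style clipped importance weight. The function name, the `eps_low`/`eps_high` values and the toy probabilities are assumptions for illustration only.

```python
import math

def token_grad_weights(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """Return (ppo_weights, cispo_weights): the coefficient that multiplies
    the gradient of log pi_theta for each token under each objective."""
    ppo, cispo = [], []
    for ln, lo in zip(logp_new, logp_old):
        ratio = math.exp(ln - lo)  # importance-sampling ratio for this token
        clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        # PPO/GRPO-style clipping: once the clipped branch of min(...) is
        # selected, the token's gradient is exactly zero.
        unclipped_active = ratio * advantage <= clipped * advantage
        ppo.append(ratio * advantage if unclipped_active else 0.0)
        # CISPO-style: the clipped ratio acts as a constant (stop-gradient)
        # weight, so the token still contributes a bounded gradient.
        cispo.append(clipped * advantage)
    return ppo, cispo

# Hypothetical numbers: the second token is rare under the old policy
# and its probability jumps from 0.01 to 0.05 (ratio = 5).
logp_old = [math.log(0.50), math.log(0.01)]
logp_new = [math.log(0.55), math.log(0.05)]
ppo_w, cispo_w = token_grad_weights(logp_new, logp_old, advantage=1.0)
print(ppo_w, cispo_w)
```

In this toy case the rare token gets zero weight under the PPO-style rule but keeps a bounded weight under the CISPO-style rule, which is the behaviour the description refers to.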

posted an update 4 days ago

Post

4743

9 new policy optimization techniques

Reinforcement Learning (RL) won't stay stuck in the same old PPO loop - in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents.

Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071)
Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. GSPO-token variation also allows token-level fine-tuning.

2. LAPO: Length-Adaptive Policy Optimization → LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization (2507.15758)
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844)
This one trains models to adapt reasoning depth based on problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning (2507.16814)
Combines on-policy visual understanding from a Vision-Language Model (VLM) with off-policy reasoning from an LM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340)
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt.
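
As a rough illustration of the sequence-level idea in GSPO (item 1), the sketch below computes a length-normalized sequence ratio, clips it once per response, and uses group-normalized rewards as advantages. It is a simplified reading of the description, not the authors' code; the function names, the `eps` value and the toy log-probabilities are assumptions.

```python
import math

def sequence_ratio(logp_new, logp_old):
    """Length-normalized ratio of a whole response: exp(mean token log-ratio)."""
    return math.exp(sum(n - o for n, o in zip(logp_new, logp_old)) / len(logp_new))

def gspo_objective(logp_new, logp_old, rewards, eps=0.2):
    """logp_new / logp_old: one list of token log-probs per sampled response.
    rewards: one scalar reward per response in the group."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g) or 1.0
    total = 0.0
    for new, old, r in zip(logp_new, logp_old, rewards):
        adv = (r - mean) / std  # group-relative advantage
        s = sequence_ratio(new, old)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        # Clipping is applied once per sequence, not once per token.
        total += min(s * adv, s_clipped * adv)
    return total / g

# Hypothetical group of two responses with two tokens each.
old = [[math.log(0.25), math.log(0.25)], [math.log(0.25), math.log(0.25)]]
new = [[math.log(0.50), math.log(0.50)], [math.log(0.25), math.log(0.25)]]
print(gspo_objective(new, old, rewards=[1.0, 0.0]))
```

The `eps` here is only a placeholder; suitable clipping ranges depend on the training setup.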

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

posted an update 11 days ago

Post

6092

6 Essential Reads on core AI/ML topics:

Time to look at some free useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:

1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource to focus on fundamental concepts, such as pre-training, generative models, prompting, alignment, and inference

2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning -> A Survey on Post-training of Large Language Models (2503.06072)
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques

3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks and applications in medicine, finance, science.

4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334)
Defines Context Engineering as systematic info design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems

5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges

6. Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post 18 days ago

FreeLoRA → https://huggingface.co/papers/2507.01792
Enables training-free image generation with multiple subjects by fine-tuning each LoRA module on one subject. During inference, subject-aware activation applies modules only to their target tokens, ensuring clean, interference-free fusion.
LoRA-Augmented Generation (LAG) → https://huggingface.co/papers/2507.05346
Uses large collections of task-specific LoRA adapters without needing extra training or data. It selects and applies the most relevant adapters at each layer and token, excelling in knowledge-intensive tasks.
ARD-LoRA (Adaptive Rank Dynamic LoRA) → https://huggingface.co/papers/2506.18267
Adjusts the rank of LoRA adapters dynamically across transformer layers and heads by learning per-head scaling factors through a meta-objective. It balances performance and efficiency, using fewer parameters and reducing memory use.
WaRA → https://huggingface.co/papers/2506.24092
Designed for vision tasks, it uses wavelet transforms and decomposes weight updates into multiple resolutions, capturing both coarse and detailed patterns.
BayesLoRA → https://huggingface.co/papers/2506.22809
Adds uncertainty estimation to LoRA adapters using MC-Dropout, helping models gauge confidence in unfamiliar situations. It detects variance outside fine-tuned distributions, supporting more cautious and adaptive behavior of models.
Dual LoRA Learning (DLoRAL) → https://huggingface.co/papers/2506.15591
Trains two LoRA branches: C-LoRA captures temporal coherence from degraded input, while D-LoRA improves visual detail. It's used for video super-resolution that enhances both spatial detail and temporal consistency.
Safe Pruning LoRA (SPLoRA) → https://huggingface.co/papers/2506.18931
Improves the safety of LoRA-tuned LMs by selectively removing LoRA layers that reduce alignment, using a new E-DIEM metric to detect safety-related shifts without relying on data labels.
PLoP (Precise LoRA Placement) → https://huggingface.co/papers/2506.20629
A lightweight method that automatically selects optimal LoRA adapter placement during fine-tuning based on the model and task
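
The MC-Dropout idea behind BayesLoRA can be sketched on a toy scalar adapter: keep dropout active at inference, run several stochastic passes, and read the spread of the outputs as an uncertainty signal. This is a generic MC-Dropout illustration under assumed names and values, not the BayesLoRA implementation.

```python
import random
import statistics

def lora_forward(x, w0, a, b, drop_p, rng):
    """Frozen weight w0 plus a low-rank adapter, with dropout on each rank component."""
    keep = 1.0 - drop_p
    delta = 0.0
    for a_k, b_k in zip(a, b):
        if rng.random() < keep:            # keep this rank component
            delta += b_k * a_k * x / keep  # inverted-dropout scaling
    return w0 * x + delta

def mc_dropout_predict(x, w0, a, b, drop_p=0.2, n_samples=200, seed=0):
    """Run n_samples stochastic passes; return (mean, variance) of the outputs."""
    rng = random.Random(seed)
    samples = [lora_forward(x, w0, a, b, drop_p, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)

# Hypothetical adapter of rank 2 on a scalar "layer".
mean, var = mc_dropout_predict(2.0, w0=1.0, a=[0.5, -0.3], b=[0.4, 0.2], drop_p=0.5)
print(mean, var)
```

A larger variance across passes suggests the adapter is less certain about that input; with `drop_p=0.0` all passes agree and the variance is zero.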

posted an update 18 days ago

Post

5076

13 New types of LoRA

LoRA (Low-Rank Adaptation) is a popular lightweight method for fine-tuning AI models. Instead of updating the full model, it adds small trainable components (low-rank matrices) while keeping the original weights frozen. Only these adapters are trained.
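
A minimal pure-Python sketch of that idea, assuming the common formulation W + (alpha / r) · B · A with a frozen W and trainable low-rank factors A and B (the helper names and toy matrices are illustrative, not taken from any library):

```python
def matmul(x, y):
    """Plain matrix product for small nested-list matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def lora_effective_weight(w, a, b, alpha):
    """w: frozen d_out x d_in weight; a: r x d_in; b: d_out x r (a, b trainable)."""
    r = len(a)
    delta = matmul(b, a)  # low-rank update of rank at most r
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Number of adapter parameters versus the full weight matrix."""
    return r * (d_in + d_out), d_out * d_in

w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
a = [[1.0, 2.0]]               # rank-1 factor, 1 x 2
b = [[0.5], [0.0]]             # rank-1 factor, 2 x 1
print(lora_effective_weight(w, a, b, alpha=1.0))
print(trainable_params(d_out=4096, d_in=4096, r=8))
```

The parameter count shows why this is lightweight: for a 4096 × 4096 layer, a rank-8 adapter trains 65,536 values instead of about 16.8 million.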

Recently, many interesting new LoRA variations came out, so it’s a great time to take a look at these 13 clever approaches:

1. T-LoRA → T-LoRA: Single Image Diffusion Model Customization Without Overfitting (2507.05964)
A timestep-dependent LoRA method for adapting diffusion models with a single image. It dynamically adjusts updates and uses orthogonal initialization to reduce overlap, achieving better fidelity–alignment balance than standard LoRA

2. SingLoRA → SingLoRA: Low Rank Adaptation Using a Single Matrix (2507.05566)
Simplifies LoRA by using only one small matrix instead of the usual two, and multiplying it by its own transpose (like A × Aᵀ). It uses half the parameters of LoRA and avoids scale mismatch between different matrices

3. LiON-LoRA → LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion (2507.05678)
Improves control and precision in video diffusion models when training data is limited. It builds on LoRA, adding 3 key principles: linear scalability, orthogonality, and norm consistency. A controllable token and modified self-attention enable smooth adjustment of motion

4. LoRA-Mixer → LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (2507.00029)
Combines LoRA and mixture-of-experts (MoE) to adapt LLMs for multiple tasks. It dynamically routes task-specific LoRA experts into linear projections of attention modules, supporting both joint training and frozen expert reuse

5. QR-LoRA → QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation (2507.04599)
Separates content and style when combining multiple LoRA adapters. It implements QR decomposition to structure parameter updates, where the orthogonal Q matrix reduces interference between features, and the R matrix captures specific transformations
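
The single-matrix trick from SingLoRA (item 2) is easy to check by hand. The sketch below builds the update A × Aᵀ for a square weight and compares parameter counts with a standard two-matrix adapter of the same rank. It leaves out any scaling or scheduling the paper may apply; the names and the toy matrix are assumptions.

```python
def singlora_update(a):
    """a: d x r matrix. Returns the d x d update A @ A^T, which is symmetric."""
    d, r = len(a), len(a[0])
    return [[sum(a[i][k] * a[j][k] for k in range(r)) for j in range(d)]
            for i in range(d)]

def param_counts(d, r):
    """(single-matrix adapter, two-matrix LoRA) parameter counts for a d x d weight."""
    return d * r, 2 * d * r

a = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]  # hypothetical 3 x 2 adapter matrix
delta = singlora_update(a)
print(delta)
print(param_counts(d=3, r=2))
```

Because the same matrix appears on both sides of the product, there is only one set of weights to scale, which is where the "no scale mismatch" claim comes from.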

Read further in the comments 👇

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post 25 days ago

AllVoiceLab MCP Server -> https://github.com/allvoicelab/AllVoiceLab-MCP
Enables AI agents to access advanced text-to-speech, voice conversion, and video translation APIs, powering use cases like global content localization, AI audiobooks, and voice-driven media production.
MCP Email Server -> https://github.com/Shy2593666979/mcp-server-email
For email functionality: write and send emails with multiple recipients, add and search files within specified directories.
Google Admin MCP Server -> https://github.com/securityfortech/google-admin-mcp
Manage Google Workspace users through the Admin Directory API (list, create, get info about users, etc.)
Android MCP Server -> https://github.com/minhalvp/android-mcp-server
Provides programmatic control over Android devices through ADB (Android Debug Bridge).
DeepView MCP -> https://github.com/ai-1st/deepview-mcp
Enables IDEs (Cursor, Windsurf, etc.) to analyze large codebases using Gemini's extensive context window.
Calculator MCP Server -> https://github.com/githejie/mcp-server-calculator
May sound easy, but it's essential for precise numerical calculations within LLMs
MCP Aggregator -> https://github.com/nazar256/combine-mcp
Combines multiple MCP servers into a single interface for more convenient use

posted an update 25 days ago

Post

6445

13 Outstanding MCP Servers

MCP is redefining how AI assistants connect to the world of data and tools, so no wonder MCP servers are in high demand now. That’s why we’ve curated 13 cool MCP servers to upgrade your workflow:

1. Hugging Face Official MCP Server -> https://github.com/evalstate/hf-mcp-server
Provides access to and interaction with Hugging Face models, datasets, and Gradio Spaces for dynamic tool integration and configuration across environments.

2. Browser MCP -> https://browsermcp.io/
An MCP server plus a Chrome extension. It lets you automate your browser with AI apps like VS Code, Claude, Cursor, and Windsurf.

3. Bright Data MCP -> https://github.com/brightdata/brightdata-mcp
This one is for working with data in real-time: searching the web, navigating websites, taking action and retrieving data.

4. JSON MCP -> https://github.com/VadimNastoyashchy/json-mcp
Interact with JSON files: split, merge, find specific data, and validate content within them.

5. Octagon Deep Research MCP -> https://github.com/OctagonAI/octagon-deep-research-mcp
Allows for deep research via AI agents, integrating seamlessly with MCP clients like Claude Desktop and Cursor for powerful, unlimited research capabilities.

6. VLM Run MCP Server -> https://docs.vlm.run/mcp/introduction
Provides an agent the ability to see, understand and process visual content.

Read further in the comments 👇

P.S.:
Our most read explanation of MCP on Hugging Face https://huggingface.co/blog/Kseniase/mcp

Our first list of 13 awesome MCP servers: https://huggingface.co/posts/Kseniase/204958200717570

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post about 1 month ago

DeepResearcher -> https://github.com/GAIR-NLP/DeepResearcher
An RL framework for training deep research agents end-to-end in real-world environments with web search, exhibiting emergent behaviour like planning, multi-source validation, self-reflection, and honestly admitting when the agent doesn't know the answer
Search-R1 -> https://github.com/PeterGriffinJin/Search-R1
Features interleaved search access and an open-source RL training pipeline supporting various algorithms (PPO, GRPO, etc.), LLMs (LLaMA3, Qwen2.5, etc.), and search engines (online, local, retrievers)
ReCall -> https://github.com/Agent-RL/ReCall
Trains LLMs to reason with tools via RL, no supervised tool-use data needed. It enables agentic use of tools like OpenAI o3 and supports synthetic data generation across diverse environments and multi-step tasks
OWL -> https://github.com/camel-ai/owl
A framework built on the CAMEL-AI framework, enabling dynamic multi-agent collaboration for task automation across diverse domains

Here's an awesome study exploring the entire roadmap of Deep Research assistants. Don't forget to check it out -> https://huggingface.co/papers/2506.18096

posted an update about 1 month ago

Post

3553

10 Open-source Deep Research assistants

Deep Research agents are quickly becoming our daily co-workers — built for complex investigations, not just chat. With modular architecture, advanced tool use and real web access, they go far beyond typical AI. While big-name agents get the spotlight, we want to highlight some powerful recent open-source alternatives:

1. DeerFlow -> https://github.com/bytedance/deer-flow
A modular multi-agent system combining LMs and tools for automated research and code analysis. It links a coordinator, a planner, a team of specialized agents, and a reporter, and converts reports to speech via Text-to-Speech (TTS)

2. Alita -> https://github.com/CharlesQ9/Alita
Uses a single problem-solving module for scalable reasoning through simplicity. It self-evolves by generating and reusing Model Context Protocols (MCPs) from open-source tools to build external capabilities for diverse tasks

3. WebThinker -> https://github.com/RUC-NLPIR/WebThinker
Lets reasoning models autonomously search the web and navigate pages. Deep Web Explorer allows interaction with links and follow-up searches. Through a Think-Search-and-Draft process models generate and refine reports in real time. RL training with preference pairs improves the workflow

4. SimpleDeepSearcher -> https://github.com/RUCAIBox/SimpleDeepSearcher
A lightweight framework showing that supervised fine-tuning is a real alternative to complex RL, using simulated web interactions and multi-criteria curation to generate high-quality training data

5. AgenticSeek -> https://github.com/Fosowl/agenticSeek
A private, on-device assistant that picks the best agent expert for browsing, coding, or planning—no cloud needed. Includes voice input via speech-to-text

6. Suna -> https://github.com/kortix-ai/suna
Offers web browsing, file and doc handling, CLI execution, site deployment, and API/service integration—all in one assistant

Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further ⬇️

2 replies

upvoted 2 articles about 1 month ago

Article

Accidentally Building an AI Reasoning Research Ecosystem (Or: Can AI Stop Thinking?)

•

Jun 26

• 3

Article

What Coding Agent Wins?

and 1 other •

Jun 26

• 7

published an article about 1 month ago

Article

What Coding Agent Wins?

and 1 other •

Jun 26

• 7

replied to their post about 1 month ago

Constraint-Based Decoding -> https://huggingface.co/papers/2502.05111
Guide generation using hard constraints, like context-free grammar (CFG) rules. This keeps outputs aligned with task goals, especially in structured prediction or planning. Can be combined with symbolic solvers or logic-checking agents
Exploration Prompts (Explore-then-Pick) -> https://huggingface.co/papers/2506.09014
Generate multiple diverse responses via sampling, then use a learned Sample Set Aggregator (SSA), trained with reinforcement learning, to pick the best answer. Similar to “draft → verify” strategies, but the final selection is done via a trained model, not heuristics.
Prompt Perturbation Sampling for Inference -> https://huggingface.co/papers/2502.11027
From a pool of diverse model responses sampled with prompt perturbation, distill only the most elegant, logically consistent outputs to improve metrics like Pass@10. This is a post‑generation inference technique.
Prompt Ordering via Embedding Clustering -> https://openreview.net/pdf?id=1Iu2Yte5N6
Uncovers that few-shot prompt permutations form clusters in the model’s embedding space — especially by first demonstration — and uses this to design a cluster-based ordering method for generating strong in-context example sequences.
Controlled Prompting Variations -> https://huggingface.co/papers/2504.02111
Controlled “bad” prompts (like irrelevant info or misleading framing) expose fragilities in model reasoning, so use light adversarial prompting in evaluations to find breaking points. Also remove irrelevant info to reduce confusion and improve focus, standardize the format to minimize inconsistency and hallucination, and explicitly prompt for reasoning to boost accuracy and transparency
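
To show what "hard constraints" can look like in practice for constraint-based decoding, here is a toy sketch: a greedy decoder that may only pick tokens a tiny grammar (balanced parentheses) allows at each step. The scoring functions stand in for a real model and are purely hypothetical; real systems mask logits over a full vocabulary in the same spirit.

```python
VOCAB = ["(", ")", "<eos>"]

def allowed_tokens(prefix, max_len):
    """Tokens that keep the output a prefix of a balanced-parentheses string."""
    depth = prefix.count("(") - prefix.count(")")
    remaining = max_len - len(prefix)
    allowed = []
    if remaining >= depth + 2:   # room to open one more and still close everything
        allowed.append("(")
    if depth > 0:
        allowed.append(")")
    if depth == 0 and prefix:    # only stop on a complete, non-empty string
        allowed.append("<eos>")
    return allowed

def constrained_greedy_decode(score_fn, max_len=6):
    """Greedy decoding where disallowed tokens are masked out before the argmax."""
    prefix = []
    while True:
        scores = score_fn(prefix)
        token = max(allowed_tokens(prefix, max_len), key=lambda t: scores[t])
        if token == "<eos>":
            return "".join(prefix)
        prefix.append(token)

# Hypothetical "models": one always prefers ")", the other always prefers "(".
prefers_close = lambda prefix: {"(": 0.1, ")": 2.0, "<eos>": 1.0}
prefers_open = lambda prefix: {"(": 2.0, ")": 1.0, "<eos>": 0.0}
print(constrained_greedy_decode(prefers_close))
print(constrained_greedy_decode(prefers_open))
```

Even though the first scorer would start with an invalid ")" if left unconstrained, the mask forces every output to stay well-formed.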

posted an update about 1 month ago

Post

5400

10 Techniques for Boosting LLM Reasoning in 2025

Everyone’s chasing top reasoning, but sometimes it's still the bottleneck for many real-world tasks. This week, let's spotlight some powerful techniques that have shown promise in helping LLMs achieve more consistent logic, planning, and depth:

1. Retrieval-Augmented CoT Chaining (RAG+CoT) -> CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models (2504.13534)
Combines Chain-of-Thought prompting with retrieval augmentation at intermediate steps. Relevant documents are fetched after each reasoning subgoal, updating context dynamically. Great for open-domain QA, math, logic and multi-hop fact-checking

2. Tool-use by example injection -> Self-Training Large Language Models for Tool-Use Without Demonstrations (2502.05867)
Injects few-shot tool interaction examples during training to implicitly teach calling patterns. Helps in plug-and-play tool use without training new architectures

3. Visual Scratchpads, or multimodal reasoning support -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Using structured visual inputs or sketchable intermediate steps (diagrams, grids, trees) boosts performance in tasks like planning, geometry, and multi-agent simulation. In practice, GPT-4o, Claude, and Gemini show marked improvement with this approach

4. System 1 vs System 2 Prompt switching -> Adaptive Deep Reasoning: Triggering Deep Thinking When Needed (2505.20101)
Switching between a fast, intuitive response mode and a slow, deliberate reasoning mode is among the most popular AI trends. E.g., models tend to respond more reliably when explicitly instructed to “think like a researcher.” This can also reduce hallucinations in open-ended generation and debate tasks

5. Adversarial Self-Chat Fine-Tuning -> Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642)
Generate debates between model variants or model vs human, then fine-tune on the winner’s response. It helps models learn to better defend their reasoning. Used in Claude’s Constitutional AI and SPPO-style tuning
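
The retrieval-at-each-step pattern from item 1 (RAG+CoT) can be sketched as a simple loop: for every reasoning subgoal, fetch the best-matching document and add it to the running context. The word-overlap retriever, the corpus and the subgoals below are toy assumptions; a real pipeline would use an LLM to produce the subgoals and a proper retriever.

```python
def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (toy stand-in for a retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def cot_with_retrieval(subgoals, corpus, k=1):
    """Fetch documents after each subgoal and grow the context step by step."""
    context, trace = [], []
    for step, subgoal in enumerate(subgoals, 1):
        docs = retrieve(subgoal, corpus, k)
        context.extend(d for d in docs if d not in context)
        trace.append((step, subgoal, docs))
    return context, trace

corpus = [
    "Marie Curie was born in Warsaw",
    "Warsaw is the capital of Poland",
    "Paris is the capital of France",
]
subgoals = [
    "where was marie curie born",
    "warsaw is the capital of which country",
]
context, trace = cot_with_retrieval(subgoals, corpus)
print(context)
```

Each hop pulls in the fact the next step needs, which is why this pattern suits multi-hop questions.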

Read further below👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

reacted to their post with 👍 about 1 month ago

Post

3531

11 Types of JEPA

Since Meta released the newest V-JEPA 2 this week, we thought it's a good time to revisit a few other interesting JEPA variants. JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework that predicts the latent representation of a missing part of the input.
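
A toy sketch of that training signal, assuming tiny linear "encoders" over averaged patches (real JEPA models use deep networks and a separately updated target encoder; all names and numbers here are illustrative):

```python
def encode(patches, weight):
    """Toy encoder: average the patches, then apply a linear map."""
    dim = len(patches[0])
    mean = [sum(p[i] for p in patches) / len(patches) for i in range(dim)]
    return [sum(row[i] * mean[i] for i in range(dim)) for row in weight]

def jepa_loss(patches, mask, ctx_w, tgt_w, pred_w):
    """Predict the latent of the masked patches from the visible ones.
    The loss is measured in embedding space, not in input (pixel) space."""
    context = [p for p, m in zip(patches, mask) if not m]
    target = [p for p, m in zip(patches, mask) if m]
    z_ctx = encode(context, ctx_w)
    z_tgt = encode(target, tgt_w)  # treated as a fixed target during training
    z_pred = [sum(row[i] * z_ctx[i] for i in range(len(z_ctx))) for row in pred_w]
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_tgt)) / len(z_tgt)

identity = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
mask = [False, False, True]  # the last patch is hidden from the context encoder
print(jepa_loss(patches, mask, identity, identity, identity))
```

Training would adjust the context encoder and predictor to drive this latent-space error down.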

Here are 11 JEPA types that you should know about:

1. V-JEPA 2 -> V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (2506.09985)
Trained on 1M+ hours of internet videos and a little bit of robot interaction data, V-JEPA 2 can watch, understand, answer questions, and help robots plan and act in the physical world

2. Time-Series-JEPA (TS-JEPA) -> Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks (2406.04853)
It's a time-series predictive model that learns compact, meaningful representations. A self-supervised semantic actor then uses them to generate control commands without raw data

3. Denoising JEPA (D-JEPA) -> Denoising with a Joint-Embedding Predictive Architecture (2410.03755)
Combines JEPA with diffusion techniques. By treating JEPA as masked image modeling and next-token prediction, D-JEPA generates data auto-regressively, incorporating diffusion and flow-matching losses

4. CNN-JEPA -> CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture (2408.07514)
This SSL approach applies the JEPA idea to CNNs using a sparse encoder, depthwise separable convolutions, and improved masking. On ImageNet-100, CNN-JEPA outperforms I-JEPA with 73.3% accuracy

5. Stem-JEPA -> Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation (2408.02514)
Identifies instrument stems by mapping mixes and stems into a shared space using an encoder and predictor. It captures timbre, harmony, and rhythm for tasks like stem retrieval, alignment, and genre or key estimation

6. DMT-JEPA (Discriminative Masked Targets JEPA) -> DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture (2405.17995)
Improves discriminative power by generating masked targets from semantically similar neighboring patches and uses lightweight cross-attention for aggregation

Read further below👇

Also, subscribe to the Turing Post -> https://www.turingpost.com/subscribe

1 reply

replied to their post about 2 months ago

seq-JEPA -> https://huggingface.co/papers/2505.03176
A world modeling framework that learns invariant and equivariant representations from view sequences and transformations, using a transformer to predict future states. Excels in sequence-based tasks
AD-L-JEPA -> https://huggingface.co/papers/2501.04969
Learns spatial world models via Bird’s Eye View (BEV) embeddings without explicit generation or manual pair creation, simplifying training and boosting representation quality. Excels in LiDAR 3D object detection and transfer learning
SAR-JEPA -> https://huggingface.co/papers/2311.15153
Predicts multi-scale Synthetic Aperture Radar (SAR) gradient features from locally masked patches. SAR-JEPA handles small targets and speckle noise and integrates domain-specific features to improve SSL signals
HEP-JEPA -> https://huggingface.co/papers/2502.03933
A transformer-based foundation model for high-energy collider tasks. Using the JetClass dataset of 100M jets, it predicts embeddings of unseen jet constituents from partial context
ECG-JEPA -> https://huggingface.co/papers/2410.13867
JEPA for self-supervised ECG representation learning designed to excel at ECG-based heart arrhythmia diagnosis

Check out more types of JEPA here -> https://huggingface.co/posts/Kseniase/646284586461230

posted an update about 2 months ago

Post

3531

1 reply

upvoted a paper about 2 months ago

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Paper • 2402.07043 • Published Feb 10, 2024 • 16

replied to their post about 2 months ago

LRM - Large Reasoning Model (DeepSeek-R1, OpenAI's o3) -> https://huggingface.co/papers/2501.09686
Advanced AI systems specifically optimized for multi-step logical reasoning, complex problem-solving, and structured thinking. LRMs incorporate test-time scaling, Chain-of-Thought reasoning, tool use, external memory, strong math and code capabilities, and more modular design for reliable decision-making
MoE - Mixture of Experts (e.g. Mixtral) -> https://www.turingpost.com/p/moe
Uses many sub-networks called experts, but activates only a few per input, enabling massive scaling with sparse computation
SSM - State Space Model (Mamba, RetNet) -> https://huggingface.co/papers/2111.00396
+ our overview of SSMs and Mamba: https://www.turingpost.com/p/mamba
A neural network that defines the sequence as a continuous dynamical system, modeling how hidden state vectors change in response to inputs over time. SSMs are parallelizable and efficient for long contexts
RNN - Recurrent Neural Network (advanced variants: LSTM, GRU) -> https://huggingface.co/papers/1912.05911
+ detailed article about LSTM: https://www.turingpost.com/p/xlstm
Processes sequences one step at a time, passing information through a hidden state that acts as memory. RNNs were widely used in early NLP and time-series tasks but struggle with long-range dependencies compared to newer architectures
CNN - Convolutional Neural Network (MobileNet, EfficientNet) -> https://huggingface.co/papers/1511.08458
Automatically learns patterns from visual data. It uses convolutional layers to detect features like edges, textures, or shapes. Not so popular now, but still used in edge applications and visual processing
SAM - Segment Anything Model (developed by Meta AI) -> https://huggingface.co/papers/2304.02643
A foundation model trained on over 1 billion segmentation masks. Given a prompt (like a point or box), it segments the relevant object
LNN - Liquid Neural Network (LFMs - Liquid Foundation Models by Liquid AI) -> https://arxiv.org/pdf/2006.04439
+ more about LFMs: https://www.turingpost.com/p/liquidhyena
LNNs use differential equations to model neuronal dynamics to adapt their behavior in real-time. They continuously update their internal state, which is great for time-series data, robotics, and real-world decision making
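
For the MoE entry above, "activates only a few per input" usually means top-k routing: a gate scores every expert, only the k best are run, and their outputs are mixed with renormalized weights. A minimal sketch with made-up experts and gate scores (not Mixtral's implementation):

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax over just those."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i],
                 reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts; the others cost no compute."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical experts: simple scalar functions standing in for sub-networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 4 * x, lambda x: 100 * x]
gate_logits = [0.0, 2.0, 2.0, -1.0]  # would normally come from a learned gate
print(top_k_route(gate_logits))
print(moe_forward(1.0, experts, gate_logits))
```

With four experts and k=2, half of the expert parameters stay idle for this input, which is how sparse models grow in size without a matching growth in compute per token.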

posted an update about 2 months ago

Post

6242

12 Foundational AI Model Types

Let’s refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in the brackets):

1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196)
+ history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
Trained on massive text datasets to understand and generate human language. LLMs are mostly built on the Transformer architecture and predict the next token. They scale by increasing overall parameter count across all components (layers, attention heads, MLPs, etc.)

2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) -> A Survey of Small Language Models (2410.20011)
Lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work using the same principles as LLMs

3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247)
Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both

4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549)
A large-scale model that can understand and process multiple types of data (modalities) — usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.

5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047)
Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.

Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies
