Hugging Face
rotem israeli
irotem98
7 followers · 10 following
https://rotem154154.github.io
rotem154154
AI & ML interests
None yet
Recent Activity
liked a model about 2 hours ago
trillionlabs/Tri-7B
reacted to Kseniase's post with 👍 about 5 hours ago
9 new policy optimization techniques

Reinforcement Learning (RL) won't stay stuck in the same old PPO loop: in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents.

Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → https://huggingface.co/papers/2507.18071
Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. The GSPO-token variant also allows token-level fine-tuning.

2. LAPO: Length-Adaptive Policy Optimization → https://huggingface.co/papers/2507.15758
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths, for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → https://huggingface.co/papers/2507.15844
Trains a model to adapt reasoning depth to problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → https://huggingface.co/papers/2507.16814
Combines on-policy visual understanding from Vision Language Models (VLMs) with off-policy reasoning from an LM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → https://huggingface.co/papers/2506.09340
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt.

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
upvoted a paper about 10 hours ago
Group Sequence Policy Optimization
Organizations
None yet
irotem98's activity
liked a model about 2 hours ago
trillionlabs/Tri-7B • Text Generation • 8B • Updated 3 days ago • 20.7k • 12
liked 6 models 1 day ago
Qwen/Qwen3-4B • Text Generation • 4B • Updated 2 days ago • 1.05M • 324
Qwen/Qwen3-8B-FP8 • Text Generation • 8B • Updated 2 days ago • 22.3k • 37
trillionlabs/Tri-21B • Text Generation • 21B • Updated 3 days ago • 7.89k • 22
mistralai/Magistral-Small-2507-GGUF • Text Generation • 24B • Updated 2 days ago • 757 • 5
mistralai/Devstral-Small-2507 • 24B • Updated 3 days ago • 31.8k • 292
mistralai/Magistral-Small-2507 • 24B • Updated 2 days ago • 654 • 63
liked a model 2 days ago
Qwen/Qwen3-30B-A3B-GPTQ-Int4 • Text Generation • 5B • Updated May 21 • 27.6k • 19
liked 2 models 4 days ago
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 • Text Generation • 480B • Updated 4 days ago • 12.7k • 69
Qwen/Qwen3-Coder-480B-A35B-Instruct • Text Generation • 480B • Updated 4 days ago • 7.73k • 775
liked a model 5 days ago
LGAI-EXAONE/EXAONE-Deep-2.4B • Text Generation • 2B • Updated Mar 22 • 16.1k • 97
liked 4 models 6 days ago
timm/PE-Core-T-16-384 • Zero-Shot Image Classification • Updated 3 days ago • 34 • 1
Tesslate/UIGEN-T3-4B-Preview-MAX • Text Generation • 4B • Updated Jun 10 • 125 • 7
Tesslate/UIGEN-X-8B • Text Generation • 8B • Updated 10 days ago • 262 • 53
Tesslate/Tessa-T1-14B • Text Generation • 15B • Updated Mar 24 • 22 • 13
liked a model 7 days ago
VortexHunter23/LeoPARD-Coder-0.8.5-pt1.1 • 15B • Updated about 16 hours ago • 28 • 1
liked 3 models 8 days ago
LGAI-EXAONE/EXAONE-4.0-1.2B • Text Generation • 1B • Updated about 14 hours ago • 11.7k • 79
merve/smol-vision • Image-Text-to-Text • Updated 5 days ago • 91
ByteDance-Seed/Seed-X-Instruct-7B • Translation • Updated 4 days ago • 8.31k • 108
liked a model 10 days ago
jinaai/jina-embeddings-v2-base-code • Feature Extraction • 0.2B • Updated Jan 6 • 78k • 114