rotem israeli

irotem98

https://rotem154154.github.io

rotem154154

AI & ML interests

None yet

Recent Activity

liked a model about 2 hours ago

trillionlabs/Tri-7B

reacted to Kseniase's post with 👍 about 5 hours ago

9 new policy optimization techniques

Reinforcement Learning (RL) won't stay stuck in the same old PPO loop: in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents. Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → https://huggingface.co/papers/2507.18071 Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. The GSPO-token variant also allows token-level fine-tuning.

2. LAPO: Length-Adaptive Policy Optimization → https://huggingface.co/papers/2507.15758 A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths, for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → https://huggingface.co/papers/2507.15844 Trains the model to adapt reasoning depth to problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → https://huggingface.co/papers/2507.16814 Combines on-policy visual understanding from Vision Language Models (VLMs) with off-policy reasoning from an LM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → https://huggingface.co/papers/2506.09340 Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt.

Read further below ⬇️ If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

upvoted a paper about 10 hours ago

Group Sequence Policy Optimization

Organizations

None yet

liked a model about 2 hours ago

trillionlabs/Tri-7B

Text Generation • 8B • Updated 3 days ago • 20.7k • 12

liked 6 models 1 day ago

Qwen/Qwen3-4B

Text Generation • 4B • Updated 2 days ago • 1.05M • 324

Qwen/Qwen3-8B-FP8

Text Generation • 8B • Updated 2 days ago • 22.3k • 37

trillionlabs/Tri-21B

Text Generation • 21B • Updated 3 days ago • 7.89k • 22

mistralai/Magistral-Small-2507-GGUF

Text Generation • 24B • Updated 2 days ago • 757 • 5

mistralai/Devstral-Small-2507

24B • Updated 3 days ago • 31.8k • 292

mistralai/Magistral-Small-2507

24B • Updated 2 days ago • 654 • 63

liked a model 2 days ago

Qwen/Qwen3-30B-A3B-GPTQ-Int4

Text Generation • 5B • Updated May 21 • 27.6k • 19

liked 2 models 4 days ago

Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8

Text Generation • 480B • Updated 4 days ago • 12.7k • 69

Qwen/Qwen3-Coder-480B-A35B-Instruct

Text Generation • 480B • Updated 4 days ago • 7.73k • 775

liked a model 5 days ago

LGAI-EXAONE/EXAONE-Deep-2.4B

Text Generation • 2B • Updated Mar 22 • 16.1k • 97

liked 4 models 6 days ago

timm/PE-Core-T-16-384

Zero-Shot Image Classification • Updated 3 days ago • 34 • 1

Tesslate/UIGEN-T3-4B-Preview-MAX

Text Generation • 4B • Updated Jun 10 • 125 • 7

Tesslate/UIGEN-X-8B

Text Generation • 8B • Updated 10 days ago • 262 • 53

Tesslate/Tessa-T1-14B

Text Generation • 15B • Updated Mar 24 • 22 • 13

liked a model 7 days ago

VortexHunter23/LeoPARD-Coder-0.8.5-pt1.1

15B • Updated about 16 hours ago • 28 • 1

liked 3 models 8 days ago

LGAI-EXAONE/EXAONE-4.0-1.2B

Text Generation • 1B • Updated about 14 hours ago • 11.7k • 79

merve/smol-vision

Image-Text-to-Text • Updated 5 days ago • 91

ByteDance-Seed/Seed-X-Instruct-7B

Translation • Updated 4 days ago • 8.31k • 108

liked a model 10 days ago

jinaai/jina-embeddings-v2-base-code

Feature Extraction • 0.2B • Updated Jan 6 • 78k • 114