Maziyar Panahi (MaziyarPanahi)
3785 followers · 67 following
maziyarpanahi.bsky.social
AI & ML interests
RLHF, RL, Model Merging, Quantizations, Synthetic datasets, Health x AI
Recent Activity
Reacted to Kseniase's post with 🔥 (about 19 hours ago):
10 Latest Preference Optimization Techniques

Models need feedback on what makes outputs “good” or “bad.” Policy optimization (PO) turns preferences and rewards into actual training signals. This field is evolving quickly, moving far beyond classics like PPO and GRPO. So here is our overview of the 10 newest PO methods:

1. Pref-GRPO → https://huggingface.co/papers/2508.20751
   Stabilizes text-to-image reinforcement learning (RL) with pairwise preference rewards and a unified UNIGENBENCH benchmark.
2. PVPO (Policy with Value Preference Optimization) → https://huggingface.co/papers/2508.21104
   This critic-free RL method uses a pre-trained model as a reference anchor to reduce bias and guide learning, selecting high-value examples through data pre-sampling.
3. DCPO (Dynamic Clipping Policy Optimization) → https://huggingface.co/papers/2509.02333
   Uses dynamic clipping, which adjusts probability limits per token for better token exploration, and smooth reward standardization to balance rewards over training steps and prevent wasted updates.
4. ARPO (Agentic Reinforced Policy Optimization) → https://huggingface.co/papers/2507.19849
   Optimizes multi-turn LLM agents that use external tools. It uses an entropy-based adaptive rollout to explore after tool use and an advantage attribution method to better assign credit across steps, leading to more efficient tool use with fewer resources.
5. GRPO-RoC (Group Relative Policy Optimization with Resampling-on-Correct) → https://huggingface.co/papers/2508.20722
   Oversamples rollouts, then resamples them to keep diverse mistakes and only the highest-quality correct answers. It reduces noise and yields stronger reasoning in a code environment.

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Liked a Space: huggingface/ai-deadlines (1 day ago)
MaziyarPanahi's datasets (49), sorted by recently updated
MaziyarPanahi/magpie-ultra-v0.1-sharegpt • Updated Aug 7, 2024 • 50k • 16 • 5
MaziyarPanahi/persian-qa-translated-sharegpt • Updated Jul 4, 2024 • 153k • 5 • 1
MaziyarPanahi/persian-conversational-sharegpt • Updated Jul 4, 2024 • 266k • 4 • 2
MaziyarPanahi/legalkit_with_input_sharegpt • Updated Jul 3, 2024 • 53k • 4 • 1
MaziyarPanahi/legalkit_sharegpt • Updated Jul 2, 2024 • 53k • 4 • 2
MaziyarPanahi/aya_collection_persian_instruct • Updated Jul 1, 2024 • 164k • 42 • 4
MaziyarPanahi/ultrafeedback_binarized_sft_sharegpt • Updated Jun 29, 2024 • 61.1k • 7 • 1
MaziyarPanahi/arxflix-dataset-dup-12290-alpaca • Updated Jun 14, 2024 • 12.3k • 4 • 1
MaziyarPanahi/arxflix-dataset-01062024-1229-alpaca • Updated Jun 7, 2024 • 1.23k • 1 • 2
MaziyarPanahi/arxflix-dataset-01062024-1229 • Updated Jun 1, 2024 • 1.23k • 1 • 2
MaziyarPanahi/arxiv_mixtral_markdown-v3-2678 • Updated May 31, 2024 • 2.68k • 23 • 1
MaziyarPanahi/arxiv_mixtral_markdown-1000 • Updated May 30, 2024 • 1.01k • 13
MaziyarPanahi/arxiv_mixtral_markdown-v2-1255 • Updated May 29, 2024 • 1.26k • 1
MaziyarPanahi/arxiv_mixtral_markdown-552 • Updated May 29, 2024 • 552 • 4
MaziyarPanahi/arxiv_mixtral_markdown-v2-1100 • Updated May 29, 2024 • 1.11k • 8
MaziyarPanahi/arxiv_mixed_markdown-v0-1188 • Updated May 26, 2024 • 1.19k • 14
MaziyarPanahi/arxiv_mixtral_markdown-v3-2821 • Updated May 26, 2024 • 2.82k • 2
MaziyarPanahi/truthy-dpo-v0.1-axolotl • Updated May 2, 2024 • 1.02k • 17 • 8
MaziyarPanahi/WizardLM_evol_instruct_V2_196k • Updated Apr 23, 2024 • 286k • 40 • 53