simpx · 0 followers · 11 following
AI & ML interests: None yet
Recent Activity
- liked a model about 1 month ago: Qwen/Qwen3-0.6B
- reacted to Kseniase's post with ❤️ about 1 month ago:
11 Alignment and Optimization Algorithms for LLMs

When we need to align a model's behavior with desired objectives, we rely on specialized algorithms that support helpfulness, accuracy, reasoning, safety, and alignment with user preferences. Much of a model's usefulness comes from post-training optimization. Here are the main optimization algorithms (both classic and new) in one place:

1. PPO (Proximal Policy Optimization) -> https://huggingface.co/papers/1707.06347
Clips the probability ratio to prevent the new policy from diverging too far from the old one, which keeps training stable.

2. DPO (Direct Preference Optimization) -> https://huggingface.co/papers/2305.18290
A non-RL method in which the language model acts as an implicit reward model. It uses a simple loss to boost the preferred answer's probability over the less preferred one.

3. GRPO (Group Relative Policy Optimization) -> https://huggingface.co/papers/2402.03300
An RL method that compares a group of model outputs for the same input and updates the policy based on their relative rankings, so no separate critic model is needed. Its latest application is Flow-GRPO, which adds online RL to flow-matching models -> https://huggingface.co/papers/2505.05470

4. DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) -> https://huggingface.co/papers/2503.14476
Decouples the clipping bounds for flexibility and introduces four key techniques: clip-higher (to maintain exploration), dynamic sampling (to ensure useful gradient updates), token-level loss (to balance learning across long outputs), and overlong reward shaping (to handle long, truncated answers).

5. SFT (Supervised Fine-Tuning) -> https://huggingface.co/papers/2203.02155
Often the first post-pretraining step: the model is fine-tuned on a dataset of high-quality human-written input-output pairs to directly teach desired behaviors.

More in the comments 👇
If you liked it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
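The PPO clipping from item 1 can be sketched in a few lines. This is a minimal pure-Python illustration over per-sample lists; the function name, the list-based interface, and the fixed `eps=0.2` are illustrative assumptions, not any library's API:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective, negated into a loss to minimize.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and old policies; advantages: advantage estimates per sample.
    """
    losses = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new / pi_old
        unclipped = ratio * adv
        # Clamp the ratio into [1 - eps, 1 + eps] before weighting
        clipped = max(1 - eps, min(ratio, 1 + eps)) * adv
        # Pessimistic (minimum) objective, negated for gradient descent
        losses.append(-min(unclipped, clipped))
    return sum(losses) / len(losses)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage; when the ratio drifts outside the clip band, the gradient through it is cut off, which is what keeps updates stable.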
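The "simple loss" behind DPO (item 2) is the negative log-sigmoid of a scaled margin between the policy's and a frozen reference model's log-probability ratios for the chosen vs. rejected response. A minimal sketch for a single preference pair, with illustrative names and the common default `beta=0.1` as assumptions:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the preferred (winner) and
    less-preferred (loser) responses; ref_*: the same quantities under
    the frozen reference model; beta scales the implicit reward.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the preferred answer over the rejected one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin): small when the preferred answer wins clearly
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At a margin of zero the loss is log 2; pushing the preferred answer's probability up (or the rejected one's down) relative to the reference drives the loss toward zero, with no explicit reward model or RL loop.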
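GRPO's critic-free trick (item 3) is to score each completion relative to the other completions sampled for the same prompt: rewards are standardized within the group, and the standardized score is used as the advantage. A minimal sketch; the function name and the `1e-8` stabilizer are illustrative:

```python
def grpo_advantages(rewards):
    """Group-relative advantages: standardize rewards within a group of
    completions for the same prompt, replacing a learned critic."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # Small constant avoids division by zero when all rewards are equal
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Completions better than the group average get positive advantages and are reinforced; worse-than-average ones get negative advantages, all without a separate value network.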
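DAPO's "clip-higher" (item 4) is a small but consequential change to the PPO clip: the lower and upper bounds are decoupled, and the upper bound is raised so that low-probability tokens can still gain probability mass, preserving exploration. A sketch of just that piece, with illustrative epsilon values:

```python
import math

def dapo_clip_loss(logp_new, logp_old, advantages,
                   eps_low=0.2, eps_high=0.28):
    """PPO-style surrogate loss with decoupled clip bounds
    (DAPO's clip-higher). The epsilon defaults are illustrative."""
    losses = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        # Asymmetric clamp: a looser ceiling (1 + eps_high) lets
        # low-probability tokens grow more than symmetric PPO would allow
        clipped = max(1 - eps_low, min(ratio, 1 + eps_high)) * adv
        losses.append(-min(ratio * adv, clipped))
    return sum(losses) / len(losses)
```

With `eps_low == eps_high` this reduces exactly to the symmetric PPO clip; the other three DAPO techniques (dynamic sampling, token-level loss, overlong reward shaping) act on batching and reward shaping rather than on this objective.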
- liked a model about 1 month ago: ACE-Step/ACE-Step-v1-3.5B
Organizations: None yet
simpx's activity
liked 2 models about 1 month ago:
- Qwen/Qwen3-0.6B · Text Generation · Updated 29 days ago · 1,000k · 370
- ACE-Step/ACE-Step-v1-3.5B · Text-to-Audio · Updated 28 days ago · 516
liked 2 datasets about 1 month ago:
- bigcode/starcoderdata · Viewer · Updated May 16, 2023 · 207M · 2.21k · 442
- cerebras/SlimPajama-627B · Preview · Updated Jul 7, 2023 · 51.7k · 476
liked 2 models about 1 month ago:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Text Generation · Updated Mar 17, 2024 · 1.33M · 1.3k
- Qwen/Qwen3-235B-A22B · Text Generation · Updated 29 days ago · 138k · 946
liked 9 datasets about 1 month ago:
- transformersbook/codeparrot · Viewer · Updated Feb 5, 2022 · 18.7M · 396 · 58
- Pain-Killer/nier2b-dataset · Viewer · Updated Mar 7 · 10 · 22 · 1
- openai/gsm8k · Viewer · Updated Jan 4, 2024 · 17.6k · 503k · 770
- wikimedia/wikipedia · Viewer · Updated Jan 9, 2024 · 61.6M · 81.5k · 847
- databricks/databricks-dolly-15k · Viewer · Updated Jun 30, 2023 · 15k · 13.4k · 825
- tiiuae/falcon-refinedweb · Viewer · Updated Jun 20, 2023 · 968M · 13k · 857
- OpenAssistant/oasst1 · Viewer · Updated May 2, 2023 · 88.8k · 6.65k · 1.4k
- Open-Orca/OpenOrca · Viewer · Updated Feb 19 · 2.94M · 9.55k · 1.41k
- HuggingFaceFW/fineweb · Viewer · Updated Jan 31 · 25B · 278k · 2.2k
liked a model about 1 month ago:
- open-r1/OpenR1-Qwen-7B · Text Generation · Updated 22 days ago · 2.63k · 52
liked 4 datasets about 1 month ago:
- open-r1/codeforces-cots · Viewer · Updated Mar 28 · 254k · 3.71k · 176
- open-thoughts/OpenThoughts-114k · Viewer · Updated 14 days ago · 228k · 39.8k · 714
- Anthropic/hh-rlhf · Viewer · Updated May 26, 2023 · 169k · 10.8k · 1.36k
- fka/awesome-chatgpt-prompts · Viewer · Updated Jan 6 · 203 · 22.8k · 7.97k