BounharAbdelaziz's Collections
RLHF
Updated 24 days ago
Some RLHF experiments using GRPO and DPO.
BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K
Text Generation • 3B • Updated 24 days ago • 9
BounharAbdelaziz/Qwen2.5-0.5B-DPO-English-Orca
Text Generation • 0.5B • Updated 24 days ago • 11
BounharAbdelaziz/Qwen2.5-0.5B-DPO-French-Orca
Text Generation • 0.5B • Updated 24 days ago • 12