Elie Bakouch

eliebak

PrimeIntellect

AI & ML interests

Training LLMs @ 🤗

Recent Activity

liked a dataset 4 days ago

HuggingFaceCode/stack-v3-train

liked a model 7 days ago

Motif-Technologies/Motif-3-Beta

updated a model 10 days ago

PrimeIntellect/Laguna-XS.2-jlens

commented a paper 6 months ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 278

New activity in skt/A.X-K1 7 months ago

looking forward for the release!!

#1 opened 7 months ago by

commented a paper 9 months ago

Motif 2 12.7B technical report

Paper • 2511.07464 • Published Nov 7, 2025 • 41

New activity in marin-community/marin-32b-base 9 months ago

fix link to retrospective

#1 opened 9 months ago by

commented a paper 9 months ago

DeepSeek-OCR: Contexts Optical Compression

Paper • 2510.18234 • Published Oct 21, 2025 • 95

commented a paper 10 months ago

Paris: A Decentralized Trained Open-Weight Diffusion Model

Paper • 2510.03434 • Published Oct 3, 2025 • 4

New activity in community-spotlight/README 11 months ago

Nominate a community champion

#4 opened 11 months ago by

New activity in Kwai-Klear/Klear-46B-A2.5B-Base 11 months ago

tech report link broken

#1 opened 11 months ago by

commented a paper 11 months ago

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published Sep 2, 2025 • 14

New activity in xai-org/grok-2 11 months ago

add rope_type:yarn

#7 opened 11 months ago by

New activity in huggingface/InferenceSupport 11 months ago

deepseek-ai/DeepSeek-V3.1

#4282 opened 11 months ago by

ByteDance-Seed/Seed-OSS-36B-Instruct

#4275 opened 11 months ago by

commented a paper 11 months ago

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published Aug 14, 2025 • 60

commented a paper 12 months ago

$μ$-Parametrization for Mixture of Experts

Paper • 2508.09752 • Published Aug 13, 2025 • 10

New activity in HuggingFaceTB/SmolLM3-3B 12 months ago

SmolLM3 RL results

#33 opened 12 months ago by

commented a paper 12 months ago

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Paper • 2507.19427 • Published Jul 25, 2025 • 22

commented a paper about 1 year ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 320

New activity in HuggingFaceTB/SmolLM3-3B-Base about 1 year ago

Release Intermediate Checkpoints?

#2 opened about 1 year ago by xuanxiang-chatting

New activity in HuggingFaceTB/SmolLM3-3B about 1 year ago

Multi-head latent attention (MLA) instead of Grouped query attention (GQA)

#18 opened about 1 year ago by

Add This Model To the French Understanding Leaderboard By the French Government

#28 opened about 1 year ago by