Elie Bakouch PRO

eliebak

AI & ML interests

Training LLMs @ 🤗

Recent Activity

commented on a paper 3 days ago

DeepSeek-OCR: Contexts Optical Compression

upvoted a collection 1 day ago

Reproducing-TRM

3 items • Updated 2 days ago • 4

upvoted an article 1 day ago

Building the Open Agent Ecosystem Together: Introducing OpenEnv

Article • 2 days ago • 71

upvoted a paper 16 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published 19 days ago • 439

upvoted a paper about 1 month ago

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15

upvoted a collection about 1 month ago

Tiny Language Model Datasets

Collection of synthetic datasets that can be used for pretraining any tiny language model • 14 items • Updated Sep 21 • 29

upvoted 4 papers about 2 months ago

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Paper • 2508.18672 • Published Aug 26 • 10

Fantastic Pretraining Optimizers and Where to Find Them

Paper • 2509.02046 • Published Sep 2 • 12

AWorld: Orchestrating the Training Recipe for Agentic AI

Paper • 2508.20404 • Published Aug 28 • 38

Motif 2.6B Technical Report

Paper • 2508.09148 • Published Aug 2 • 4

upvoted a paper 2 months ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 91

upvoted 2 articles 2 months ago

Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨

Article • Jul 25 • 83

NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset

Article • By … and 4 others • Aug 20 • 18

upvoted 2 collections 2 months ago

Seed-OSS

Seed-OSS Open-Source Models • 3 items • Updated Aug 20 • 58

DeepSeek-V3.1

4 items • Updated Sep 22 • 240

upvoted an article 2 months ago

MCP for Research: How to Connect AI to Research Tools

Article • Aug 18 • 61

upvoted 3 papers 2 months ago

BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

Paper • 2508.10975 • Published Aug 14 • 59

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Paper • 2507.11407 • Published Jul 15 • 57

μ-Parametrization for Mixture of Experts

Paper • 2508.09752 • Published Aug 13 • 10

upvoted 2 articles 2 months ago

How to train a Language Model with Megatron-LM

Article • Sep 7, 2022 • 19

NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks

Article • By … and 4 others • Aug 11 • 74