arshadshk (Arshad S)

upvoted an article 4 months ago

Jupyter Agents: training LLMs to reason with notebooks

Article • Sep 10, 2025 • 59

upvoted 2 papers 4 months ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 228

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

Paper • 2508.16745 • Published Aug 22, 2025 • 29

upvoted 3 papers 5 months ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 122

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Paper • 2312.10003 • Published Dec 15, 2023 • 44

upvoted a collection 5 months ago

🤖 Agents

Collection

21 items • Updated Dec 31, 2024 • 172

upvoted 2 articles 7 months ago

Halo: Open Source Health Tracking with Wearables

Article • Nov 19, 2024 • 117

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • Jun 3, 2025 • 96

upvoted an article 9 months ago

LeRobot goes to driving school: World’s largest open-source self-driving dataset

Article • Mar 11, 2025 • 103

upvoted a collection over 1 year ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 877

upvoted 2 articles over 1 year ago

Mergoo: Efficiently Build Your Own MoE LLM

Article • Jun 3, 2024 • 48

Orchestration of Experts: The First-Principle Multi-Model System

Article • May 30, 2024 • 15

upvoted 6 papers almost 2 years ago

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper • 2403.07816 • Published Mar 12, 2024 • 44

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Paper • 2403.06504 • Published Mar 11, 2024 • 56

SaulLM-7B: A pioneering Large Language Model for Law

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Self-Discover: Large Language Models Self-Compose Reasoning Structures