
Abdel-Dayane Marcos

admarcosai

AI & ML interests

Natural Language Processing, Graph Neural Networks, Reinforcement Learning

Organizations

None yet

admarcosai's collections (59)

Function Calling Datasets

driaforall/pythonic-function-calling

Viewer • Updated Feb 6, 2025 • 81.8k • 91 • 31
AymanTarig/function-calling-v0.2-with-r1-cot

Viewer • Updated Feb 3, 2025 • 58k • 68 • 44

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 9
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 48
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Paper • 2412.06531 • Published Dec 9, 2024 • 72
APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 37

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published Dec 17, 2024 • 41

Position Papers

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published Dec 18, 2024 • 25
The Open Source Advantage in Large Language Models (LLMs)

Paper • 2412.12004 • Published Dec 16, 2024 • 10

Reasoning | Planning

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Paper • 2310.18628 • Published Oct 28, 2023 • 8
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

Paper • 2310.19019 • Published Oct 29, 2023 • 9
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Paper • 2311.02262 • Published Nov 3, 2023 • 14
Thread of Thought Unraveling Chaotic Contexts

Paper • 2311.08734 • Published Nov 15, 2023 • 7

Data Efficiency

Ziya2: Data-centric Learning is All LLMs Need

Paper • 2311.03301 • Published Nov 6, 2023 • 20
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19
TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 39
Time is Encoded in the Weights of Finetuned Language Models

Paper • 2312.13401 • Published Dec 20, 2023 • 20

Efficient Inference

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

Paper • 2311.04934 • Published Nov 7, 2023 • 32
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

Paper • 2311.08692 • Published Nov 15, 2023 • 13
Exponentially Faster Language Modelling

Paper • 2311.10770 • Published Nov 15, 2023 • 119
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19

GRIM: GRaph-based Interactive narrative visualization for gaMes

Paper • 2311.09213 • Published Nov 15, 2023 • 13
LARP: Language-Agent Role Play for Open-World Games

Paper • 2312.17653 • Published Dec 24, 2023 • 33
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 72

Libraries and Frameworks

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

Paper • 2311.11077 • Published Nov 18, 2023 • 29
Multi-line AI-assisted Code Authoring

Paper • 2402.04141 • Published Feb 6, 2024 • 10
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

Paper • 2402.10524 • Published Feb 16, 2024 • 23
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16, 2024 • 31

Preference Dataset

lvwerra/stack-exchange-paired

Viewer • Updated Mar 13, 2023 • 31.3M • 1.82k • 149
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

Paper • 2312.00763 • Published Dec 1, 2023 • 23
TOFU: A Task of Fictitious Unlearning for LLMs

Paper • 2401.06121 • Published Jan 11, 2024 • 20

FudanSELab/ClassEval

Viewer • Updated Aug 26, 2024 • 100 • 3.9k • 12
code-search-net/code_search_net

Viewer • Updated Feb 23 • 4.14M • 19.6k • 326
open-phi/programming_books_llama

Viewer • Updated Oct 4, 2023 • 111k • 143 • 36
FudanSELab/CodeGen4Libs

Updated Oct 5, 2023 • 66 • 5

Function Calling Dataset

gorilla-llm/gorilla-openfunctions-v1

Text Generation • Updated Nov 21, 2023 • 18 • 92
rizerphe/glaive-function-calling-v2-zephyr

Viewer • Updated Oct 17, 2023 • 101k • 18 • 12
glaiveai/glaive-function-calling-v2

Viewer • Updated Sep 27, 2023 • 113k • 15.5k • 498
Trelis/function_calling_extended

Viewer • Updated Dec 4, 2023 • 76 • 155 • 51

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Paper • 2312.01552 • Published Dec 4, 2023 • 31
Alignment faking in large language models

Paper • 2412.14093 • Published Dec 18, 2024 • 10

Pearl: A Production-ready Reinforcement Learning Agent

Paper • 2312.03814 • Published Dec 6, 2023 • 15
Secrets of RLHF in Large Language Models Part II: Reward Modeling

Paper • 2401.06080 • Published Jan 11, 2024 • 27
Contrastive Preference Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 25
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Paper • 2402.01391 • Published Feb 2, 2024 • 43

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 29
TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 39
SciPhi/AgentSearch-V1

Viewer • Updated Jan 14, 2024 • 70k • 1.8k • 92
Data Filtering Networks

Paper • 2309.17425 • Published Sep 29, 2023 • 6

OneLLM: One Framework to Align All Modalities with Language

Paper • 2312.03700 • Published Dec 6, 2023 • 24
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

Paper • 2402.03162 • Published Feb 5, 2024 • 19
Rolling Diffusion Models

Paper • 2402.09470 • Published Feb 12, 2024 • 13
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Paper • 2402.12226 • Published Feb 19, 2024 • 45

openchat/openchat-3.5-1210

Text Generation • 7B • Updated May 18, 2024 • 2.07k • 278
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 74
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 142
Babelscape/rebel-large

0.4B • Updated Jun 20, 2023 • 160k • 235

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10, 2024 • 31

Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

Paper • 2401.05811 • Published Jan 11, 2024 • 9

ParadigmShift-Inquiry

Transformers are Multi-State RNNs

Paper • 2401.06104 • Published Jan 11, 2024 • 39
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24

abacusai/MetaMathFewshot

Viewer • Updated Jan 17, 2024 • 395k • 71 • 28
math-ai/StackMathQA

Viewer • Updated Nov 20, 2025 • 6.2M • 1.91k • 102
meta-math/MetaMathQA

Viewer • Updated Dec 21, 2023 • 395k • 26.8k • 451
argilla/distilabel-math-preference-dpo

Viewer • Updated Jul 16, 2024 • 2.42k • 353 • 88

Zero Bubble Pipeline Parallelism

Paper • 2401.10241 • Published Nov 30, 2023 • 25

Efficient Training

Rethinking Optimization and Architecture for Tiny Language Models

Paper • 2402.02791 • Published Feb 5, 2024 • 13
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2, 2024 • 47
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31, 2024 • 17
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 65

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 26
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2, 2024 • 27
Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13, 2024 • 16
LongAlign: A Recipe for Long Context Alignment of Large Language Models

Paper • 2401.18058 • Published Jan 31, 2024 • 24

Quantization | Compression

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6, 2024 • 50
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Paper • 2401.18079 • Published Jan 31, 2024 • 8
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Paper • 2402.08958 • Published Feb 14, 2024 • 5
OneBit: Towards Extremely Low-bit Large Language Models

Paper • 2402.11295 • Published Feb 17, 2024 • 24

GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency

Paper • 2402.08855 • Published Feb 13, 2024 • 14

LLM x Animation

Keyframer: Empowering Animation Design using Large Language Models

Paper • 2402.06071 • Published Feb 8, 2024 • 13
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Paper • 2311.07446 • Published Nov 13, 2023 • 29
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Paper • 2311.17117 • Published Nov 28, 2023 • 6

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15, 2024 • 38

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Paper • 2308.04079 • Published Aug 8, 2023 • 199
DreamGaussian4D: Generative 4D Gaussian Splatting

Paper • 2312.17142 • Published Dec 28, 2023 • 19
LangSplat: 3D Language Gaussian Splatting

Paper • 2312.16084 • Published Dec 26, 2023 • 16

Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 33
Word Sense Linking: Disambiguating Outside the Sandbox

Paper • 2412.09370 • Published Dec 12, 2024 • 10

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Paper • 2412.18495 • Published Dec 24, 2024 • 9
Ultra-Sparse Memory Network

Paper • 2411.12364 • Published Nov 19, 2024 • 23
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries

Paper • 2402.13043 • Published Feb 20, 2024 • 2
Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11, 2024 • 32

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Large Action Models: From Inception to Implementation

Paper • 2412.10047 • Published Dec 13, 2024 • 36

GUI Agents: A Survey

Paper • 2412.13501 • Published Dec 18, 2024 • 30

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Paper • 2310.18628 • Published Oct 28, 2023 • 8
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation

Paper • 2311.00272 • Published Nov 1, 2023 • 11
Magicoder: Source Code Is All You Need

Paper • 2312.02120 • Published Dec 4, 2023 • 82
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Paper • 2312.04474 • Published Dec 7, 2023 • 34

Alignment: FineTuning-Preference

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Paper • 2311.03285 • Published Nov 6, 2023 • 30
Tailoring Self-Rationalizers with Multi-Reward Distillation

Paper • 2311.02805 • Published Nov 6, 2023 • 6
Ultra-Long Sequence Distributed Transformer

Paper • 2311.02382 • Published Nov 4, 2023 • 6
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Paper • 2309.11235 • Published Sep 20, 2023 • 15

Levels of AGI for Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 36
Ultra-Long Sequence Distributed Transformer

Paper • 2311.02382 • Published Nov 4, 2023 • 6
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 26
GRIM: GRaph-based Interactive narrative visualization for gaMes

Paper • 2311.09213 • Published Nov 15, 2023 • 13

GRIM: GRaph-based Interactive narrative visualization for gaMes

Paper • 2311.09213 • Published Nov 15, 2023 • 13
Invariant Graph Transformer

Paper • 2312.07859 • Published Dec 13, 2023 • 9

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Paper • 2311.12022 • Published Nov 20, 2023 • 36
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 246
gorilla-llm/APIBench

Updated May 29, 2023 • 1.33k • 74
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Paper • 2312.04724 • Published Dec 7, 2023 • 21

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 246
ToolTalk: Evaluating Tool-Usage in a Conversational Setting

Paper • 2311.10775 • Published Nov 15, 2023 • 9
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

Paper • 2311.11315 • Published Nov 19, 2023 • 7
An Embodied Generalist Agent in 3D World

Paper • 2311.12871 • Published Nov 18, 2023 • 8

LDJnr/Verified-Camel

Viewer • Updated Jun 3, 2024 • 127 • 49 • 43
LDJnr/LessWrong-Amplify-Instruct

Viewer • Updated Jun 3, 2024 • 663 • 65 • 46
allenai/qasper

Viewer • Updated Oct 7, 2022 • 1.59k • 5.89k • 97
Digital Socrates: Evaluating LLMs through explanation critiques

Paper • 2311.09613 • Published Nov 16, 2023 • 1

Digital Socrates: Evaluating LLMs through explanation critiques

Paper • 2311.09613 • Published Nov 16, 2023 • 1
gorilla-llm/APIBench

Updated May 29, 2023 • 1.33k • 74
PromptBench: A Unified Library for Evaluation of Large Language Models

Paper • 2312.07910 • Published Dec 13, 2023 • 16
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Paper • 2412.09645 • Published Dec 10, 2024 • 36

allenai/soda

Viewer • Updated Jan 4, 2023 • 1.49M • 1.06k • 153
allenai/prosocial-dialog

Viewer • Updated Feb 3, 2023 • 166k • 787 • 118
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk

Paper • 2401.05033 • Published Jan 10, 2024 • 18

Model Architectures

togethercomputer/StripedHyena-Hessian-7B

Text Generation • 8B • Updated Mar 27, 2024 • 51 • 66
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention

Paper • 2312.08618 • Published Dec 14, 2023 • 13
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Paper • 2312.07987 • Published Dec 13, 2023 • 41
LLM360: Towards Fully Transparent Open-Source LLMs

Paper • 2312.06550 • Published Dec 11, 2023 • 57

Distributed Inference and Fine-tuning of Large Language Models Over The Internet

Paper • 2312.08361 • Published Dec 13, 2023 • 27
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Paper • 2312.06353 • Published Dec 11, 2023 • 7
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Paper • 2401.02669 • Published Jan 5, 2024 • 17
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 263

Context Tuning for Retrieval Augmented Generation

Paper • 2312.05708 • Published Dec 9, 2023 • 16
Dense X Retrieval: What Retrieval Granularity Should We Use?

Paper • 2312.06648 • Published Dec 11, 2023 • 1
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Paper • 2401.18059 • Published Jan 31, 2024 • 48
Retrieval-Augmented Generation for Large Language Models: A Survey

Paper • 2312.10997 • Published Dec 18, 2023 • 12

LLM Pretraining

Scaling Data-Constrained Language Models

Paper • 2305.16264 • Published May 25, 2023 • 16
Ziya2: Data-centric Learning is All LLMs Need

Paper • 2311.03301 • Published Nov 6, 2023 • 20
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 53

Self-Learning AI

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Paper • 2401.01335 • Published Jan 2, 2024 • 69

Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models

Paper • 2401.06102 • Published Jan 11, 2024 • 21
Rethinking Interpretability in the Era of Large Language Models

Paper • 2402.01761 • Published Jan 30, 2024 • 23

Efficient-Continuous Training

LLaMA Pro: Progressive LLaMA with Block Expansion

Paper • 2401.02415 • Published Jan 4, 2024 • 54
Scaling Laws for Downstream Task Performance of Large Language Models

Paper • 2402.04177 • Published Feb 6, 2024 • 20

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Paper • 2006.16668 • Published Jun 30, 2020 • 4
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29, 2024 • 28
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Paper • 2401.10838 • Published Jan 19, 2024 • 9

InContext Learning

In-Context Language Learning: Architectures and Algorithms

Paper • 2401.12973 • Published Jan 23, 2024 • 4
Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14, 2024 • 14
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Paper • 2412.12276 • Published Dec 16, 2024 • 15

LLM x Symbolics

SymbolicAI: A framework for logic-based approaches combining generative models and solvers

Paper • 2402.00854 • Published Feb 1, 2024 • 21
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion

Paper • 2401.12947 • Published Jan 23, 2024 • 4

Tool Use | Function Calling

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 21

Computing Power and the Governance of Artificial Intelligence

Paper • 2402.08797 • Published Feb 13, 2024 • 15

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Paper • 2402.06332 • Published Feb 9, 2024 • 19
Augmenting Math Word Problems via Iterative Question Composing

Paper • 2401.09003 • Published Jan 17, 2024 • 2
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

Paper • 2401.08190 • Published Jan 16, 2024
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

Paper • 2312.08926 • Published Dec 14, 2023 • 9

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15, 2024 • 17

Modality: Video

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15, 2024 • 27
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 72
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 98

Mambas and LLM-AltArch

Graph Mamba: Towards Learning on Graphs with State Space Models

Paper • 2402.08678 • Published Feb 13, 2024 • 17
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 32
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 59
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 62

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Paper • 2006.16668 • Published Jun 30, 2020 • 4
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29, 2024 • 28
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

abacusai/MetaMathFewshot

Viewer • Updated Jan 17, 2024 • 395k • 71 • 28
math-ai/StackMathQA

Viewer • Updated Nov 20, 2025 • 6.2M • 1.91k • 102
meta-math/MetaMathQA

Viewer • Updated Dec 21, 2023 • 395k • 26.8k • 451
argilla/distilabel-math-preference-dpo

Viewer • Updated Jul 16, 2024 • 2.42k • 353 • 88

Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Paper • 2401.10838 • Published Jan 19, 2024 • 9

Zero Bubble Pipeline Parallelism

Paper • 2401.10241 • Published Nov 30, 2023 • 25

InContext Learning

In-Context Language Learning: Architectures and Algorithms

Paper • 2401.12973 • Published Jan 23, 2024 • 4
Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14, 2024 • 14
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Paper • 2412.12276 • Published Dec 16, 2024 • 15

Efficient Training

Rethinking Optimization and Architecture for Tiny Language Models

Paper • 2402.02791 • Published Feb 5, 2024 • 13
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2, 2024 • 47
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31, 2024 • 17
Understanding LLMs: A Comprehensive Overview from Training to Inference

Paper • 2401.02038 • Published Jan 4, 2024 • 65

LLM x Symbolics

SymbolicAI: A framework for logic-based approaches combining generative models and solvers

Paper • 2402.00854 • Published Feb 1, 2024 • 21
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion

Paper • 2401.12947 • Published Jan 23, 2024 • 4

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 26
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2, 2024 • 27
Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13, 2024 • 16
LongAlign: A Recipe for Long Context Alignment of Large Language Models

Paper • 2401.18058 • Published Jan 31, 2024 • 24

Tool Use | Function Calling

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 21

Quantization | Compression

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6, 2024 • 50
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Paper • 2401.18079 • Published Jan 31, 2024 • 8
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Paper • 2402.08958 • Published Feb 14, 2024 • 5
OneBit: Towards Extremely Low-bit Large Language Models

Paper • 2402.11295 • Published Feb 17, 2024 • 24

Computing Power and the Governance of Artificial Intelligence

Paper • 2402.08797 • Published Feb 13, 2024 • 15

GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency

Paper • 2402.08855 • Published Feb 13, 2024 • 14

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Paper • 2402.06332 • Published Feb 9, 2024 • 19
Augmenting Math Word Problems via Iterative Question Composing

Paper • 2401.09003 • Published Jan 17, 2024 • 2
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline

Paper • 2401.08190 • Published Jan 16, 2024
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

Paper • 2312.08926 • Published Dec 14, 2023 • 9

LLM x Animation

Keyframer: Empowering Animation Design using Large Language Models

Paper • 2402.06071 • Published Feb 8, 2024 • 13
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Paper • 2311.07446 • Published Nov 13, 2023 • 29
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

Paper • 2311.17117 • Published Nov 28, 2023 • 6

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15, 2024 • 17

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15, 2024 • 38

Modality: Video

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15, 2024 • 27
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 72
Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 98

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Paper • 2308.04079 • Published Aug 8, 2023 • 199
DreamGaussian4D: Generative 4D Gaussian Splatting

Paper • 2312.17142 • Published Dec 28, 2023 • 19
LangSplat: 3D Language Gaussian Splatting

Paper • 2312.16084 • Published Dec 26, 2023 • 16

Mambas and LLM-AltArch

Graph Mamba: Towards Learning on Graphs with State Space Models

Paper • 2402.08678 • Published Feb 13, 2024 • 17
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 32
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 59
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 62

Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 33
Word Sense Linking: Disambiguating Outside the Sandbox

Paper • 2412.09370 • Published Dec 12, 2024 • 10
