Abdel-Dayane Marcos
admarcosai
AI & ML interests
Natural Language Processing, Graph Neural Networks, Reinforcement Learning
Organizations
None yet
Pending Papers
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 49 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 73 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 39
LLM x Finance
Position Papers
-
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 52 -
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper • 2412.14171 • Published • 24 -
The Open Source Advantage in Large Language Models (LLMs)
Paper • 2412.12004 • Published • 10
Reasoning | Planning
-
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper • 2310.18628 • Published • 8 -
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Paper • 2310.19019 • Published • 9 -
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Paper • 2311.02262 • Published • 15 -
Thread of Thought Unraveling Chaotic Contexts
Paper • 2311.08734 • Published • 7
Data Efficiency
-
Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 20 -
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18 -
TinyGSM: achieving >80% on GSM8k with small language models
Paper • 2312.09241 • Published • 39 -
Time is Encoded in the Weights of Finetuned Language Models
Paper • 2312.13401 • Published • 21
Efficient Inference
-
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper • 2311.04934 • Published • 34 -
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Paper • 2311.08692 • Published • 13 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 119 -
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18
AI x GAMES
Libraries and Frameworks
-
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 28 -
Multi-line AI-assisted Code Authoring
Paper • 2402.04141 • Published • 10 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 24 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 32
Preference Dataset
Coding Dataset
Function Calling Dataset
Alignment
LLM x RL
-
Pearl: A Production-ready Reinforcement Learning Agent
Paper • 2312.03814 • Published • 16 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 29 -
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 25 -
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 44
Datasets
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper • 2312.06585 • Published • 29 -
TinyGSM: achieving >80% on GSM8k with small language models
Paper • 2312.09241 • Published • 39 -
SciPhi/AgentSearch-V1
Viewer • Updated • 70k • 8.49k • 86 -
Data Filtering Networks
Paper • 2309.17425 • Published • 6
LMMM
-
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 24 -
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper • 2402.03162 • Published • 19 -
Rolling Diffusion Models
Paper • 2402.09470 • Published • 14 -
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 45
Models
-
openchat/openchat-3.5-1210
Text Generation • Updated • 8.56k • 277 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 72 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 123 -
Babelscape/rebel-large
Text2Text Generation • Updated • 24.2k • 223
LLM-Security
MultiLingual
ParadigmShift-Inquiry
Math Datasets
Parallelism
Efficient Training
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 13 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 48 -
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 17 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 66
Long Context
-
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 27 -
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 28 -
Extending LLMs' Context Window with 100 Samples
Paper • 2401.07004 • Published • 16 -
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 23
Quantization | Compression
-
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper • 2402.04291 • Published • 51 -
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Paper • 2401.18079 • Published • 7 -
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Paper • 2402.08958 • Published • 6 -
OneBit: Towards Extremely Low-bit Large Language Models
Paper • 2402.11295 • Published • 25
LLM | Writing
LLM x Animation
-
Keyframer: Empowering Animation Design using Large Language Models
Paper • 2402.06071 • Published • 13 -
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
Paper • 2311.07446 • Published • 29 -
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Paper • 2311.17117 • Published • 6
Memory
3D - AI
World Models
Pending 2
-
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
Paper • 2412.18495 • Published • 9 -
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 24 -
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries
Paper • 2402.13043 • Published • 2 -
Agent Workflow Memory
Paper • 2409.07429 • Published • 32
Architectures
HCI
Coding
-
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper • 2310.18628 • Published • 8 -
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper • 2311.00272 • Published • 11 -
Magicoder: Source Code Is All You Need
Paper • 2312.02120 • Published • 82 -
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 33
Alignment: FineTuning-Preference
-
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • Published • 7 -
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 6 -
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 15
Survey
-
Levels of AGI for Operationalizing Progress on the Path to AGI
Paper • 2311.02462 • Published • 37 -
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 6 -
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
Paper • 2311.07989 • Published • 25 -
GRIM: GRaph-based Interactive narrative visualization for gaMes
Paper • 2311.09213 • Published • 13
LLM x GRAPHS
Benchmarks
-
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Paper • 2311.12022 • Published • 31 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 220 -
gorilla-llm/APIBench
Updated • 116 • 69 -
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Paper • 2312.04724 • Published • 21
Agentics
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 220 -
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Paper • 2311.10775 • Published • 10 -
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 8 -
An Embodied Generalist Agent in 3D World
Paper • 2311.12871 • Published • 8
QA Dataset
LLM Evaluation
-
Digital Socrates: Evaluating LLMs through explanation critiques
Paper • 2311.09613 • Published • 1 -
gorilla-llm/APIBench
Updated • 116 • 69 -
PromptBench: A Unified Library for Evaluation of Large Language Models
Paper • 2312.07910 • Published • 19 -
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
Paper • 2412.09645 • Published • 37
Conversation
Model Architectures
-
togethercomputer/StripedHyena-Hessian-7B
Text Generation • Updated • 66 • 66 -
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • 2312.08618 • Published • 15 -
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41 -
LLM360: Towards Fully Transparent Open-Source LLMs
Paper • 2312.06550 • Published • 57
Serving
-
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Paper • 2312.08361 • Published • 28 -
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Paper • 2312.06353 • Published • 7 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 16 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257
LLM x RAG
-
Context Tuning for Retrieval Augmented Generation
Paper • 2312.05708 • Published • 16 -
Dense X Retrieval: What Retrieval Granularity Should We Use?
Paper • 2312.06648 • Published • 1 -
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Paper • 2401.18059 • Published • 46 -
Retrieval-Augmented Generation for Large Language Models: A Survey
Paper • 2312.10997 • Published • 11
LLM Pretraining
Self-Learning AI
XAI
Efficient-Continuous Training
Sparsity
-
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 55 -
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Paper • 2006.16668 • Published • 3 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 29 -
BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 26
AI UX
InContext Learning
-
In-Context Language Learning: Architectures and Algorithms
Paper • 2401.12973 • Published • 4 -
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15 -
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers
Paper • 2412.12276 • Published • 15
LLM x Symbolics
Tool Use | Function Calling
Regulation
Math
-
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Paper • 2402.06332 • Published • 20 -
Augmenting Math Word Problems via Iterative Question Composing
Paper • 2401.09003 • Published • 2 -
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
Paper • 2401.08190 • Published -
Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent
Paper • 2312.08926 • Published • 10
3D Generation
Modality: Video
-
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper • 2402.10294 • Published • 27 -
Genie: Generative Interactive Environments
Paper • 2402.15391 • Published • 73 -
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 146 -
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 97
Mambas and LLM-AltArch
-
Graph Mamba: Towards Learning on Graphs with State Space Models
Paper • 2402.08678 • Published • 17 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 33 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 62
Function Calling Datasets