)))?!?(((
stereoplegic
Hallucination
- Woodpecker: Hallucination Correction for Multimodal Large Language Models (Paper • 2310.16045 • Published • 17)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models (Paper • 2310.14566 • Published • 27)
- MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models (Paper • 2310.12426 • Published • 1)
- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration (Paper • 2310.00280 • Published • 3)

Multilingual
- Dissecting In-Context Learning of Translations in GPTs (Paper • 2310.15987 • Published • 6)
- Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca (Paper • 2309.08958 • Published • 2)
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages (Paper • 2305.04160 • Published • 2)
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning (Paper • 2310.08166 • Published • 1)

RAG
- KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval (Paper • 2310.15511 • Published • 5)
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search (Paper • 2310.13227 • Published • 13)
- Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning (Paper • 2310.04474 • Published • 2)
- AgentTuning: Enabling Generalized Agent Abilities for LLMs (Paper • 2310.12823 • Published • 36)

Long context
- TRAMS: Training-free Memory Selection for Long-range Language Modeling (Paper • 2310.15494 • Published • 2)
- A Long Way to Go: Investigating Length Correlations in RLHF (Paper • 2310.03716 • Published • 10)
- YaRN: Efficient Context Window Extension of Large Language Models (Paper • 2309.00071 • Published • 71)
- Giraffe: Adventures in Expanding Context Lengths in LLMs (Paper • 2308.10882 • Published • 1)
Shared params
- Matryoshka Diffusion Models (Paper • 2310.15111 • Published • 43)
- SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks (Paper • 2309.00255 • Published • 1)
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT) (Paper • 2309.08968 • Published • 23)
- Matryoshka Representation Learning (Paper • 2205.13147 • Published • 16)

Instruct
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 5)
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (Paper • 2309.09582 • Published • 4)
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models (Paper • 2310.13127 • Published • 12)
- Evaluating the Robustness to Instructions of Large Language Models (Paper • 2308.14306 • Published • 1)

CoT
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation (Paper • 2310.15123 • Published • 8)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 19)
- Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection (Paper • 2310.05035 • Published • 1)
- Chain-of-Thought Reasoning is a Policy Improvement Operator (Paper • 2309.08589 • Published • 2)

Agent
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation (Paper • 2310.15123 • Published • 8)
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search (Paper • 2310.13227 • Published • 13)
- LASER: LLM Agent with State-Space Exploration for Web Navigation (Paper • 2309.08172 • Published • 13)
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models (Paper • 2310.04406 • Published • 10)
Reasoning
- Ada-Instruct: Adapting Instruction Generators for Complex Reasoning (Paper • 2310.04484 • Published • 5)
- Diversity of Thought Improves Reasoning Abilities of Large Language Models (Paper • 2310.07088 • Published • 5)
- Adapting Large Language Models via Reading Comprehension (Paper • 2309.09530 • Published • 79)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model (Paper • 2310.13332 • Published • 16)

Datasets
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (Paper • 2204.07705 • Published • 1)
- Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering (Paper • 2308.13259 • Published • 2)
- MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning (Paper • 2309.05653 • Published • 10)
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (Paper • 2309.12284 • Published • 18)

Coding
- Creative Robot Tool Use with Large Language Models (Paper • 2310.13065 • Published • 9)
- CodeCoT and Beyond: Learning to Program and Test like a Developer (Paper • 2308.08784 • Published • 5)
- Lemur: Harmonizing Natural Language and Code for Language Agents (Paper • 2310.06830 • Published • 34)
- CodePlan: Repository-level Coding using LLMs and Planning (Paper • 2309.12499 • Published • 78)

Speculative
- AutoMix: Automatically Mixing Language Models (Paper • 2310.12963 • Published • 14)
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning (Paper • 2310.03094 • Published • 13)
- MatFormer: Nested Transformer for Elastic Inference (Paper • 2310.07707 • Published • 2)
- DistillSpec: Improving Speculative Decoding via Knowledge Distillation (Paper • 2310.08461 • Published • 1)
AutoML/NAS
- AutoML-GPT: Large Language Model for AutoML (Paper • 2309.01125 • Published • 1)
- SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network (Paper • 2310.09049 • Published • 1)
- Prompt2Model: Generating Deployable Models from Natural Language Instructions (Paper • 2308.12261 • Published • 1)
- LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization (Paper • 2306.01102 • Published • 1)

Tabular
- Effective Distillation of Table-based Reasoning Ability from LLMs (Paper • 2309.13182 • Published • 1)
- Table-GPT: Table-tuned GPT for Diverse Table Tasks (Paper • 2310.09263 • Published • 41)
- Tab-CoT: Zero-shot Tabular Chain of Thought (Paper • 2305.17812 • Published • 2)
- GitTables: A Large-Scale Corpus of Relational Tables (Paper • 2106.07258 • Published • 1)

Writing
- EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation (Paper • 2310.08185 • Published • 8)
- GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence (Paper • 2310.05388 • Published • 4)
- PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers (Paper • 2311.09180 • Published • 8)
- Weaver: Foundation Models for Creative Writing (Paper • 2401.17268 • Published • 45)

Quantization
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (Paper • 2310.08659 • Published • 28)
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Paper • 2309.14717 • Published • 44)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (Paper • 2309.02784 • Published • 2)
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (Paper • 2309.16119 • Published • 1)
Continual learning
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization (Paper • 2310.10134 • Published • 1)
- TiC-CLIP: Continual Training of CLIP Models (Paper • 2310.16226 • Published • 9)
- In-Context Pretraining: Language Modeling Beyond Document Boundaries (Paper • 2310.10638 • Published • 30)
- Controlled Decoding from Language Models (Paper • 2310.17022 • Published • 15)

Tokenizer
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation (Paper • 2310.05737 • Published • 4)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- Towards General Text Embeddings with Multi-stage Contrastive Learning (Paper • 2308.03281 • Published • 2)
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings (Paper • 2305.11554 • Published • 2)

Speech
- Large-Scale Automatic Audiobook Creation (Paper • 2309.03926 • Published • 54)
- Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts (Paper • 2309.11977 • Published • 2)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models (Paper • 2308.16692 • Published • 1)
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining (Paper • 2308.05734 • Published • 37)

Emergent
- Are Emergent Abilities in Large Language Models just In-Context Learning? (Paper • 2309.01809 • Published • 3)
- Commonsense Knowledge Transfer for Pre-trained Language Models (Paper • 2306.02388 • Published • 1)
- Finding Neurons in a Haystack: Case Studies with Sparse Probing (Paper • 2305.01610 • Published • 2)
- Schema-learning and rebinding as mechanisms of in-context learning and emergence (Paper • 2307.01201 • Published • 2)
Layout
- UI Layout Generation with LLMs Guided by UI Grammar (Paper • 2310.15455 • Published • 3)
- You Only Look at Screens: Multimodal Chain-of-Action Agents (Paper • 2309.11436 • Published • 1)
- Never-ending Learning of User Interfaces (Paper • 2308.08726 • Published • 2)
- LMDX: Language Model-based Document Information Extraction and Localization (Paper • 2309.10952 • Published • 66)

MoE
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (Paper • 2310.16795 • Published • 27)
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference (Paper • 2308.12066 • Published • 4)
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference (Paper • 2303.06182 • Published • 1)
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate (Paper • 2112.14397 • Published • 1)

Pruning
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time (Paper • 2310.17157 • Published • 14)
- Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers (Paper • 2305.15805 • Published • 1)
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt (Paper • 2305.11186 • Published • 1)
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer (Paper • 2110.07560 • Published • 2)

KV Cache
- S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput (Paper • 2306.06000 • Published • 1)
- PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference (Paper • 2405.12532 • Published)
- SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget (Paper • 2404.04793 • Published • 1)
- MiniCache: KV Cache Compression in Depth Dimension for Large Language Models (Paper • 2405.14366 • Published • 2)
Merging
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers (Paper • 2308.06093 • Published • 2)
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs (Paper • 2308.07317 • Published • 24)
- Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers (Paper • 2211.11315 • Published • 1)
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition (Paper • 2307.13269 • Published • 32)

Reranking
- Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification (Paper • 2212.05276 • Published • 1)
- Hybrid and Collaborative Passage Reranking (Paper • 2305.09313 • Published • 1)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval (Paper • 2204.02292 • Published • 1)
- Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker (Paper • 2305.13729 • Published • 1)

Memory
- Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering (Paper • 2204.04581 • Published • 1)
- Retrieval-Augmented Multimodal Language Modeling (Paper • 2211.12561 • Published • 1)
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (Paper • 2212.10511 • Published • 1)
- Memorizing Transformers (Paper • 2203.08913 • Published • 2)

Summarization
- Zero-Shot Cross-Lingual Summarization via Large Language Models (Paper • 2302.14229 • Published • 1)
- Towards Unifying Multi-Lingual and Cross-Lingual Summarization (Paper • 2305.09220 • Published • 1)
- GPT Self-Supervision for a Better Data Annotator (Paper • 2306.04349 • Published • 1)
- Few-shot training LLMs for project-specific code-summarization (Paper • 2207.04237 • Published • 1)
Text classification
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification (Paper • 2308.07282 • Published • 1)
- PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation (Paper • 2310.14192 • Published • 2)
- Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers (Paper • 2103.14453 • Published • 1)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification (Paper • 2111.09064 • Published • 1)

Adversarial
- LTD: Low Temperature Distillation for Robust Adversarial Training (Paper • 2111.02331 • Published • 1)
- Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy (Paper • 1906.06784 • Published • 1)
- Pruning Adversarially Robust Neural Networks without Adversarial Examples (Paper • 2210.04311 • Published • 1)
- Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation (Paper • 2306.16170 • Published • 1)

Distributed
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies (Paper • 2302.06218 • Published • 1)
- ZeRO++: Extremely Efficient Collective Communication for Giant Model Training (Paper • 2306.10209 • Published • 2)
- SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System (Paper • 2205.10034 • Published • 1)
- A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training (Paper • 2303.06318 • Published • 1)
Blockwise
Concept
- Concept-Oriented Deep Learning with Large Language Models (Paper • 2306.17089 • Published • 1)
- Extracting Mathematical Concepts with Large Language Models (Paper • 2309.00642 • Published • 1)
- An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning (Paper • 2310.12274 • Published • 13)
- COPEN: Probing Conceptual Knowledge in Pre-trained Language Models (Paper • 2211.04079 • Published • 1)
Modular
Positional embeddings
- Cure the headache of Transformers via Collinear Constrained Attention (Paper • 2309.08646 • Published • 13)
- YaRN: Efficient Context Window Extension of Large Language Models (Paper • 2309.00071 • Published • 71)
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training (Paper • 2309.10400 • Published • 26)
- Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit (Paper • 2205.13522 • Published • 1)

Embodied
- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions (Paper • 2309.10150 • Published • 25)
- Code as Policies: Language Model Programs for Embodied Control (Paper • 2209.07753 • Published • 1)
- Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling (Paper • 2402.10211 • Published • 14)
Hyperparameters
Sampling
Batched decoding
Sentiment analysis
- Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models (Paper • 2306.14096 • Published • 1)
- Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models (Paper • 2306.12659 • Published • 1)
- Transforming Sentiment Analysis in the Financial Domain with ChatGPT (Paper • 2308.07935 • Published • 1)
- Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks (Paper • 2305.05862 • Published • 4)

Bias
- Soft-prompt Tuning for Large Language Models to Evaluate Bias (Paper • 2306.04735 • Published • 1)
- Mitigating Popularity Bias in Recommendation with Unbalanced Interactions: A Gradient Perspective (Paper • 2211.01154 • Published • 1)
- On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning (Paper • 2212.08061 • Published • 1)
- Bias Assessment and Mitigation in LLM-based Code Generation (Paper • 2309.14345 • Published • 1)

Paraphrase
- LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning (Paper • 2305.18169 • Published • 1)
- Quick Starting Dialog Systems with Paraphrase Generation (Paper • 2204.02546 • Published • 1)
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling (Paper • 2401.16380 • Published • 51)
Validation
Document parsing
- DSG: An End-to-End Document Structure Generator (Paper • 2310.09118 • Published • 2)
- OCR-free Document Understanding Transformer (Paper • 2111.15664 • Published • 3)
- DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents (Paper • 2304.12484 • Published • 1)
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration (Paper • 2309.01131 • Published • 1)

Reflection
- A Zero-Shot Language Agent for Computer Control with Structured Reflection (Paper • 2310.08740 • Published • 16)
- ExpeL: LLM Agents Are Experiential Learners (Paper • 2308.10144 • Published • 3)
- Demystifying GPT Self-Repair for Code Generation (Paper • 2306.09896 • Published • 19)
- Large Language Models are Better Reasoners with Self-Verification (Paper • 2212.09561 • Published • 1)
Clarify
Evolutionary Algorithms
Grammar
- Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task (Paper • 2307.03972 • Published • 1)
- GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning (Paper • 2307.13923 • Published • 1)
- Evaluating GPT-3.5 and GPT-4 on Grammatical Error Correction for Brazilian Portuguese (Paper • 2306.15788 • Published • 2)
- Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction? (Paper • 2305.15183 • Published • 1)
Mental health
Recommendation
- A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems (Paper • 2308.08434 • Published • 1)
- Large Language Models for Generative Recommendation: A Survey and Visionary Discussions (Paper • 2309.01157 • Published • 1)
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models (Paper • 2307.15780 • Published • 27)
- Leveraging Large Language Models for Pre-trained Recommender Systems (Paper • 2308.10837 • Published • 1)

ASR
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models (Paper • 2309.15701 • Published • 2)
- CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders (Paper • 2309.07707 • Published • 1)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (Paper • 2311.00430 • Published • 58)
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data (Paper • 2309.13876 • Published • 1)

Multi task
- Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey (Paper • 2308.08234 • Published • 1)
- Understanding and Improving Information Transfer in Multi-Task Learning (Paper • 2005.00944 • Published • 1)
- Improving Multi-task Learning via Seeking Task-based Flat Regions (Paper • 2211.13723 • Published • 2)
- Improvable Gap Balancing for Multi-Task Learning (Paper • 2307.15429 • Published • 1)

Time series
- LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs (Paper • 2308.08469 • Published • 2)
- TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series (Paper • 2308.08241 • Published • 2)
- Are Large Language Models Temporally Grounded? (Paper • 2311.08398 • Published • 1)
- NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models (Paper • 2305.07766 • Published • 1)
Hypernetwork
- Magnitude Invariant Parametrizations Improve Hypernetwork Learning (Paper • 2304.07645 • Published • 1)
- HyperShot: Few-Shot Learning by Kernel HyperNetworks (Paper • 2203.11378 • Published • 1)
- Hypernetworks for Zero-shot Transfer in Reinforcement Learning (Paper • 2211.15457 • Published • 1)
- Continual Learning with Dependency Preserving Hypernetworks (Paper • 2209.07712 • Published • 1)

No backprop
- Fine-Tuning Language Models with Just Forward Passes (Paper • 2305.17333 • Published • 3)
- HyperTuning: Toward Adapting Large Language Models without Back-propagation (Paper • 2211.12485 • Published • 1)
- Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning (Paper • 2209.14624 • Published • 1)
- Backpropagation-free Training of Deep Physical Neural Networks (Paper • 2304.11042 • Published • 1)

Factuality
- Fine-tuning Language Models for Factuality (Paper • 2311.08401 • Published • 30)
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies (Paper • 2308.03188 • Published • 2)
- Trusted Source Alignment in Large Language Models (Paper • 2311.06697 • Published • 12)
- Long-form factuality in large language models (Paper • 2403.18802 • Published • 26)
Ontology
Convolution
- Trellis Networks for Sequence Modeling (Paper • 1810.06682 • Published • 1)
- Pruning Very Deep Neural Network Channels for Efficient Inference (Paper • 2211.08339 • Published • 1)
- LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch (Paper • 2309.14157 • Published • 1)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper • 2312.00752 • Published • 143)

NoPE
- The Impact of Positional Encoding on Length Generalization in Transformers (Paper • 2305.19466 • Published • 2)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings (Paper • 2305.13571 • Published • 2)
- Position Prediction as an Effective Pretraining Strategy (Paper • 2207.07611 • Published • 1)
- Transformer Language Models without Positional Encodings Still Learn Positional Information (Paper • 2203.16634 • Published • 5)

Activation
- Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark (Paper • 2109.14545 • Published • 1)
- Adaptive Activation-based Structured Pruning (Paper • 2201.10520 • Published • 1)
- Learning Activation Functions for Sparse Neural Networks (Paper • 2305.10964 • Published • 1)
- Exploiting Transformer Activation Sparsity with Dynamic Inference (Paper • 2310.04361 • Published • 1)
Relative PE
Diffusion
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models (Paper • 2312.04410 • Published • 15)
- Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling (Paper • 2310.06389 • Published • 1)
- Diffusion Model Alignment Using Direct Preference Optimization (Paper • 2311.12908 • Published • 50)
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (Paper • 2305.13655 • Published • 7)

Recursive
- Modeling Hierarchical Structures with Continuous Recursive Neural Networks (Paper • 2106.06038 • Published • 1)
- RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks (Paper • 2106.08928 • Published • 1)
- Sliced Recursive Transformer (Paper • 2111.05297 • Published • 1)
- Byte-Level Recursive Convolutional Auto-Encoder for Text (Paper • 1802.01817 • Published)
Token dropping
SVG
- Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding (Paper • 2306.06094 • Published • 1)
- IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers (Paper • 2304.14400 • Published • 4)
- VecFusion: Vector Font Generation with Diffusion (Paper • 2312.10540 • Published • 22)
- StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis (Paper • 2401.17093 • Published • 21)

Document
- DocLLM: A layout-aware generative language model for multimodal document understanding (Paper • 2401.00908 • Published • 189)
- DocGraphLM: Documental Graph Language Model for Information Extraction (Paper • 2401.02823 • Published • 37)
- TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents (Paper • 2312.01279 • Published • 6)
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models (Paper • 2311.07575 • Published • 15)

Evaluation
- Fusion-Eval: Integrating Evaluators with LLMs (Paper • 2311.09204 • Published • 6)
- Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer (Paper • 2311.06720 • Published • 9)
- Safurai 001: New Qualitative Approach for Code LLM Evaluation (Paper • 2309.11385 • Published • 2)
- Assessment of Pre-Trained Models Across Languages and Grammars (Paper • 2309.11165 • Published • 1)
Fashion
Vocoder
Analogy
SSM
- StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization (Paper • 2311.14495 • Published • 1)
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (Paper • 2401.09417 • Published • 62)
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation (Paper • 2401.13560 • Published • 1)
- Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces (Paper • 2402.00789 • Published • 2)
Reparameterization
Hypercomplex
- Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters (Paper • 2102.08597 • Published • 1)
- PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions (Paper • 2110.04176 • Published • 1)
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers (Paper • 2106.04647 • Published • 1)

Random
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey (Paper • 2205.08099 • Published • 1)
- Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs (Paper • 2003.00152 • Published • 1)
- OpenRAND: A Performance Portable, Reproducible Random Number Generation Library for Parallel Computations (Paper • 2310.19925 • Published • 1)
- Squares: A Fast Counter-Based RNG (Paper • 2004.06278 • Published • 1)

Byte-level
- ByT5: Towards a token-free future with pre-trained byte-to-byte models (Paper • 2105.13626 • Published • 3)
- Beyond Language Models: Byte Models are Digital World Simulators (Paper • 2402.19155 • Published • 54)
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers (Paper • 2305.07185 • Published • 9)
- Byte-Level Recursive Convolutional Auto-Encoder for Text (Paper • 1802.01817 • Published)
Similarity search
Education
Multimodal
- Woodpecker: Hallucination Correction for Multimodal Large Language Models (Paper • 2310.16045 • Published • 17)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models (Paper • 2310.14566 • Published • 27)
- SILC: Improving Vision Language Pretraining with Self-Distillation (Paper • 2310.13355 • Published • 9)
- Conditional Diffusion Distillation (Paper • 2310.01407 • Published • 20)

ICL
- Dissecting In-Context Learning of Translations in GPTs (Paper • 2310.15987 • Published • 6)
- In-Context Learning Creates Task Vectors (Paper • 2310.15916 • Published • 43)
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation (Paper • 2202.07922 • Published • 1)
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques (Paper • 2310.08101 • Published • 2)

Context compression
- In-Context Learning Creates Task Vectors (Paper • 2310.15916 • Published • 43)
- When can transformers reason with abstract symbols? (Paper • 2310.09753 • Published • 4)
- Improving Length-Generalization in Transformers via Task Hinting (Paper • 2310.00726 • Published • 1)
- In-context Autoencoder for Context Compression in a Large Language Model (Paper • 2307.06945 • Published • 28)

Benchmark
- KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval (Paper • 2310.15511 • Published • 5)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models (Paper • 2310.14566 • Published • 27)
- SmartPlay: A Benchmark for LLMs as Intelligent Agents (Paper • 2310.01557 • Published • 13)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation (Paper • 2310.03214 • Published • 20)
RL/Alignment
- Moral Foundations of Large Language Models (Paper • 2310.15337 • Published • 1)
- Specific versus General Principles for Constitutional AI (Paper • 2310.13798 • Published • 3)
- Contrastive Prefence Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 25)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Paper • 2309.00267 • Published • 50)

Dataset generation
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 5)
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation (Paper • 2202.07922 • Published • 1)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 19)
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs (Paper • 2309.09582 • Published • 4)

Ensemble
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs (Paper • 2310.13961 • Published • 5)
- Diversity of Thought Improves Reasoning Abilities of Large Language Models (Paper • 2310.07088 • Published • 5)
- AutoMix: Automatically Mixing Language Models (Paper • 2310.12963 • Published • 14)
- SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network (Paper • 2310.09049 • Published • 1)

Planning
- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation (Paper • 2310.15123 • Published • 8)
- Diversity of Thought Improves Reasoning Abilities of Large Language Models (Paper • 2310.07088 • Published • 5)
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search (Paper • 2310.13227 • Published • 13)
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models (Paper • 2310.04406 • Published • 10)
Contrastive
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models (Paper • 2310.13671 • Published • 19)
- Contrastive Prefence Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 25)
- SILC: Improving Vision Language Pretraining with Self-Distillation (Paper • 2310.13355 • Published • 9)
- Ranking LLM-Generated Loop Invariants for Program Verification (Paper • 2310.09342 • Published • 4)

Prompt
- Diversity of Thought Improves Reasoning Abilities of Large Language Models (Paper • 2310.07088 • Published • 5)
- Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning (Paper • 2310.04474 • Published • 2)
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques (Paper • 2310.08101 • Published • 2)
- Instance Needs More Care: Rewriting Prompts for Instances Yields Better Zero-Shot Performance (Paper • 2310.02107 • Published • 3)

Knowledge distillation
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model (Paper • 2310.13332 • Published • 16)
- Teaching Language Models to Self-Improve through Interactive Demonstrations (Paper • 2310.13522 • Published • 12)
- Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection (Paper • 2310.05035 • Published • 1)
- Tuna: Instruction Tuning using Feedback from Large Language Models (Paper • 2310.13385 • Published • 10)

Dataset pruning/cleaning/dedup
- AlpaGasus: Training A Better Alpaca with Fewer Data (Paper • 2307.08701 • Published • 23)
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset (Paper • 2303.03915 • Published • 7)
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset (Paper • 2309.04662 • Published • 24)
- SlimPajama-DC: Understanding Data Combinations for LLM Training (Paper • 2309.10818 • Published • 11)
Music
-
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Paper • 2310.12404 • Published • 15 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 25 -
A Survey of AI Music Generation Tools and Models
Paper • 2308.12982 • Published • 1 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21
Math
-
KwaiYiiMath: Technical Report
Paper • 2310.07488 • Published • 2 -
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
Paper • 2308.07758 • Published • 4 -
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
Paper • 2309.10814 • Published • 3 -
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Paper • 2310.03731 • Published • 29
Autoencoder
-
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper • 2309.08600 • Published • 15 -
In-context Autoencoder for Context Compression in a Large Language Model
Paper • 2307.06945 • Published • 28 -
Self-slimmed Vision Transformer
Paper • 2111.12624 • Published • 1 -
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
Paper • 2308.14903 • Published • 1
Science
-
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Paper • 2308.04679 • Published • 1 -
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Paper • 2310.10134 • Published • 1 -
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 17 -
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 21
PEFT
-
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Paper • 2310.08659 • Published • 28 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44 -
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Paper • 2309.16119 • Published • 1 -
LoRA ensembles for large language model fine-tuning
Paper • 2310.00035 • Published • 2
Early stopping
-
DARTS+: Improved Differentiable Architecture Search with Early Stopping
Paper • 1909.06035 • Published • 1 -
Confident Adaptive Language Modeling
Paper • 2207.07061 • Published • 1 -
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Paper • 2210.15523 • Published • 1 -
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Paper • 2310.05424 • Published • 1
Audio
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 54 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1
Attention
-
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25 -
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Paper • 2308.16137 • Published • 40 -
Scaling Transformer to 1M tokens and beyond with RMT
Paper • 2304.11062 • Published • 3 -
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 20
Optimal transport
Softmax
-
Replacing softmax with ReLU in Vision Transformers
Paper • 2309.08586 • Published • 17 -
Softmax Bias Correction for Quantized Generative Models
Paper • 2309.01729 • Published • 1 -
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Paper • 2304.13276 • Published • 1 -
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Paper • 2306.12929 • Published • 12
Hyena
-
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Paper • 2310.12109 • Published • 1 -
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Paper • 2310.18780 • Published • 3 -
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Paper • 2311.05908 • Published • 16 -
Multi-Dimensional Hyena for Spatial Inductive Bias
Paper • 2309.13600 • Published • 1
Inference
-
S³: Increasing GPU Utilization during Generative Inference for Higher Throughput
Paper • 2306.06000 • Published • 1 -
Fast Distributed Inference Serving for Large Language Models
Paper • 2305.05920 • Published • 1 -
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Paper • 2305.13144 • Published • 1 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 1
Weight averaging
-
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 -
Weight Averaging Improves Knowledge Distillation under Domain Shift
Paper • 2309.11446 • Published • 1 -
SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Paper • 2305.14852 • Published • 1 -
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Paper • 2306.16788 • Published • 1
Knowledge graph
-
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs
Paper • 2309.03118 • Published • 2 -
Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?
Paper • 2308.10168 • Published • 2 -
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
Paper • 2308.09729 • Published • 5 -
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Paper • 2110.07178 • Published • 1
Question answering
-
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Paper • 2204.04581 • Published • 1 -
Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code
Paper • 2303.08033 • Published • 1 -
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering
Paper • 2305.14869 • Published • 1 -
Multi-hop Commonsense Knowledge Injection Framework for Zero-Shot Commonsense Question Answering
Paper • 2305.05936 • Published • 1
Multiple choice Q&A
Relationship extraction
-
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
Paper • 2305.11159 • Published • 1 -
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
Paper • 2305.05711 • Published • 1 -
Improving Continual Relation Extraction through Prototypical Contrastive Learning
Paper • 2210.04513 • Published • 1
Reversible
Semantic segmentation
-
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Paper • 2202.13393 • Published • 1 -
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation
Paper • 2306.08075 • Published • 1 -
Semantic Consistency for Assuring Reliability of Large Language Models
Paper • 2308.09138 • Published • 2
Backpropagation
-
Sparse Backpropagation for MoE Training
Paper • 2310.00811 • Published • 2 -
The Forward-Forward Algorithm: Some Preliminary Investigations
Paper • 2212.13345 • Published • 2 -
Fine-Tuning Language Models with Just Forward Passes
Paper • 2305.17333 • Published • 3 -
Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation
Paper • 2309.13192 • Published • 1
Text diffusion
-
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Paper • 2210.17432 • Published • 1 -
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Paper • 2305.08379 • Published • 3 -
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Paper • 2308.12219 • Published • 1 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 73
Image editing
FFN/MLP
-
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 15 -
Trap of Feature Diversity in the Learning of MLPs
Paper • 2112.00980 • Published • 2 -
Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics
Paper • 2301.05816 • Published • 1 -
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Paper • 2108.04384 • Published • 1
Federated learning
-
Decentralized Policy Optimization
Paper • 2211.03032 • Published • 1 -
MPCFormer: fast, performant and private Transformer inference with MPC
Paper • 2211.01452 • Published • 1 -
Distributed Pruning Towards Tiny Neural Networks in Federated Learning
Paper • 2212.01977 • Published • 1 -
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Paper • 2309.06805 • Published • 1
Embeddings
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
Paper • 2308.03281 • Published • 2 -
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Paper • 2310.05914 • Published • 14 -
EELBERT: Tiny Models through Dynamic Embeddings
Paper • 2310.20144 • Published • 3 -
Dynamic Word Embeddings for Evolving Semantic Discovery
Paper • 1703.00607 • Published • 1
Structured data
-
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Paper • 2309.08963 • Published • 11 -
DSG: An End-to-End Document Structure Generator
Paper • 2310.09118 • Published • 2 -
Integrating Graphs with Large Language Models: Methods and Prospects
Paper • 2310.05499 • Published • 2 -
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Paper • 2307.01201 • Published • 2
Constrained decoding
-
Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting
Paper • 2310.05824 • Published • 1 -
Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation
Paper • 1804.06609 • Published • 1 -
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Paper • 1704.07138 • Published • 1 -
Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
Paper • 2305.13971 • Published • 4
Finance
-
Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models
Paper • 2306.14096 • Published • 1 -
Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models
Paper • 2306.12659 • Published • 1 -
Transforming Sentiment Analysis in the Financial Domain with ChatGPT
Paper • 2308.07935 • Published • 1 -
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4
Named Entity Recognition (NER)
-
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4 -
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
Paper • 2305.05711 • Published • 1 -
ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition
Paper • 2301.08855 • Published • 1 -
Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models
Paper • 2303.06233 • Published • 1
Meta-learning
-
Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization
Paper • 2303.12314 • Published • 1 -
Augmented Large Language Models with Parametric Knowledge Guiding
Paper • 2305.04757 • Published • 2 -
Learning to Retrieve In-Context Examples for Large Language Models
Paper • 2307.07164 • Published • 22 -
LiST: Lite Prompted Self-training Makes Parameter-Efficient Few-shot Learners
Paper • 2110.06274 • Published • 1
Annotation
-
Automated Annotation with Generative AI Requires Validation
Paper • 2306.00176 • Published • 1 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4 -
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Paper • 2310.14192 • Published • 2 -
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Paper • 2309.08583 • Published • 1
Privacy
-
Privacy-Preserving Prompt Tuning for Large Language Model Services
Paper • 2305.06212 • Published • 1 -
Privately Fine-Tuning Large Language Models with Differential Privacy
Paper • 2210.15042 • Published • 1 -
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Paper • 2207.09078 • Published • 1
LLM architecture
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10 -
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 171 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 19 -
Attention Is All You Need
Paper • 1706.03762 • Published • 69
Text editing/revision
-
CoEdIT: Text Editing by Task-Specific Instruction Tuning
Paper • 2305.09857 • Published • 7 -
Reducing Sequence Length by Predicting Edit Operations with Large Language Models
Paper • 2305.11862 • Published • 1 -
Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction
Paper • 2309.11439 • Published • 1 -
Program Merge Conflict Resolution via Neural Transformers
Paper • 2109.00084 • Published • 1
RoPE
-
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 26 -
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Paper • 2309.05270 • Published • 1 -
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 117
Survey
-
Towards an Understanding of Large Language Models in Software Engineering Tasks
Paper • 2308.11396 • Published • 1 -
Several categories of Large Language Models (LLMs): A Short Survey
Paper • 2307.10188 • Published • 1 -
Large Language Models for Generative Recommendation: A Survey and Visionary Discussions
Paper • 2309.01157 • Published • 1 -
A Survey on Large Language Models for Recommendation
Paper • 2305.19860 • Published • 1
Dataset curation
-
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Paper • 2308.12032 • Published • 1 -
Know thy corpus! Robust methods for digital curation of Web corpora
Paper • 2003.06389 • Published • 1 -
Self-Alignment with Instruction Backtranslation
Paper • 2308.06259 • Published • 42 -
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Paper • 2305.06156 • Published • 2
Grounding
-
A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems
Paper • 2308.08434 • Published • 1 -
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Paper • 2302.02662 • Published • 1 -
Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning
Paper • 2309.01352 • Published • 1 -
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Paper • 2308.03188 • Published • 2
Data processing
Interpretability
-
A technical note on bilinear layers for interpretability
Paper • 2305.03452 • Published • 1 -
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT
Paper • 2305.13417 • Published • 1 -
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?
Paper • 2211.12821 • Published • 2 -
The Linear Representation Hypothesis and the Geometry of Large Language Models
Paper • 2311.03658 • Published • 1
Optimizer
-
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Paper • 2305.15265 • Published • 1 -
Mesa: A Memory-saving Training Framework for Transformers
Paper • 2111.11124 • Published • 1 -
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Paper • 2306.09782 • Published • 30 -
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
Paper • 2106.02679 • Published • 1
Tree search
-
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Paper • 2310.08582 • Published • 2 -
Autonomous Tree-search Ability of Large Language Models
Paper • 2310.10686 • Published • 2 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 10 -
PathFinder: Guided Search over Multi-Step Reasoning Paths
Paper • 2312.05180 • Published • 10
Data augmentation
-
DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning
Paper • 2303.07864 • Published • 1 -
Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks
Paper • 2305.13547 • Published • 1 -
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
Paper • 2304.09402 • Published • 2 -
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Paper • 2305.18169 • Published • 1
Regularization
-
HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization
Paper • 2308.07163 • Published • 1 -
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Paper • 2301.09554 • Published • 1 -
Weight Compander: A Simple Weight Reparameterization for Regularization
Paper • 2306.16993 • Published • 1 -
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Paper • 2309.06805 • Published • 1
Uncertainty
-
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
Paper • 2311.09677 • Published • 3 -
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
Paper • 2307.10236 • Published • 1 -
Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
Paper • 2307.01379 • Published • 1 -
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Paper • 2305.19187 • Published • 1
RNN
-
Trellis Networks for Sequence Modeling
Paper • 1810.06682 • Published • 1 -
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
Paper • 2311.01981 • Published • 1 -
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 10 -
Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
Paper • 2305.19190 • Published • 1
VAE
-
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
Paper • 2310.12004 • Published • 2 -
Concept-free Causal Disentanglement with Variational Graph Auto-Encoder
Paper • 2311.10638 • Published • 2 -
Tokenization with Factorized Subword Encoding
Paper • 2306.07764 • Published • 1 -
Mixture-of-experts VAEs can disregard variation in surjective multimodal data
Paper • 2204.05229 • Published • 1
Hebbian
Initialization
-
Pruning at Initialization -- A Sketching Perspective
Paper • 2305.17559 • Published • 1 -
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Paper • 2202.02643 • Published • 1 -
Why Random Pruning Is All We Need to Start Sparse
Paper • 2210.02412 • Published • 1 -
MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters
Paper • 2311.04251 • Published • 1
Approximation
-
Linear Self-Attention Approximation via Trainable Feedforward Kernel
Paper • 2211.04076 • Published • 1 -
Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
Paper • 2109.06762 • Published • 1 -
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Paper • 2305.17235 • Published • 2 -
Exploring Low Rank Training of Deep Neural Networks
Paper • 2209.13569 • Published • 1
Normalization
-
Unified Normalization for Accelerating and Stabilizing Transformers
Paper • 2208.01313 • Published • 1 -
Interpret Vision Transformers as ConvNets with Dynamic Convolutions
Paper • 2309.10713 • Published • 1 -
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Paper • 2003.00152 • Published • 1 -
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper • 2405.11582 • Published • 18
Conditional
Legal
Special Tokens
Confidence
-
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
Paper • 2311.08877 • Published • 7 -
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Paper • 2305.14975 • Published • 2 -
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
Paper • 2305.13712 • Published • 2
Emotion
Mamba
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 62 -
VMamba: Visual State Space Model
Paper • 2401.10166 • Published • 40 -
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1 -
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Paper • 2402.00789 • Published • 2
Phrase
Explanation
Medical
-
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1 -
Vivim: a Video Vision Mamba for Medical Video Object Segmentation
Paper • 2401.14168 • Published • 2 -
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Paper • 2312.01040 • Published • 1
Grokking
Literature review
GNN
Vector DB
Neuromorphic
Perceiver
Multimodal
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 17 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 27 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 9 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20
ICL
-
Dissecting In-Context Learning of Translations in GPTs
Paper • 2310.15987 • Published • 6 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Paper • 2202.07922 • Published • 1 -
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
Paper • 2310.08101 • Published • 2
Context compression
-
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 4 -
Improving Length-Generalization in Transformers via Task Hinting
Paper • 2310.00726 • Published • 1 -
In-context Autoencoder for Context Compression in a Large Language Model
Paper • 2307.06945 • Published • 28
Benchmark
-
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
Paper • 2310.15511 • Published • 5 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 27 -
SmartPlay : A Benchmark for LLMs as Intelligent Agents
Paper • 2310.01557 • Published • 13 -
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Paper • 2310.03214 • Published • 20
RL/Alignment
-
Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1 -
Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 3 -
Contrastive Prefence Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 25 -
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 50
Shared params
-
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 43 -
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks
Paper • 2309.00255 • Published • 1 -
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 23 -
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 16
Dataset generation
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 5 -
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Paper • 2202.07922 • Published • 1 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4
Instruct
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 5 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4 -
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127 • Published • 12 -
Evaluating the Robustness to Instructions of Large Language Models
Paper • 2308.14306 • Published • 1
CoT
-
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Paper • 2310.15123 • Published • 8 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Paper • 2310.05035 • Published • 1 -
Chain-of-Thought Reasoning is a Policy Improvement Operator
Paper • 2309.08589 • Published • 2
Agent
-
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Paper • 2310.15123 • Published • 8 -
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Paper • 2310.13227 • Published • 13 -
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper • 2309.08172 • Published • 13 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 10
Reasoning
-
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
Paper • 2310.04484 • Published • 5 -
Diversity of Thought Improves Reasoning Abilities of Large Language Models
Paper • 2310.07088 • Published • 5 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 79 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 16
Datasets
-
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1 -
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering
Paper • 2308.13259 • Published • 2 -
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Paper • 2309.05653 • Published • 10 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18
Knowledge distillation
-
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 16 -
Teaching Language Models to Self-Improve through Interactive Demonstrations
Paper • 2310.13522 • Published • 12 -
Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection
Paper • 2310.05035 • Published • 1 -
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper • 2310.13385 • Published • 10
Coding
-
Creative Robot Tool Use with Large Language Models
Paper • 2310.13065 • Published • 9 -
CodeCoT and Beyond: Learning to Program and Test like a Developer
Paper • 2308.08784 • Published • 5 -
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper • 2310.06830 • Published • 34 -
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 78
Dataset pruning/cleaning/dedup
-
AlpaGasus: Training A Better Alpaca with Fewer Data
Paper • 2307.08701 • Published • 23 -
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Paper • 2303.03915 • Published • 7 -
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 24 -
SlimPajama-DC: Understanding Data Combinations for LLM Training
Paper • 2309.10818 • Published • 11
Speculative
-
AutoMix: Automatically Mixing Language Models
Paper • 2310.12963 • Published • 14 -
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
Paper • 2310.03094 • Published • 13 -
MatFormer: Nested Transformer for Elastic Inference
Paper • 2310.07707 • Published • 2 -
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Paper • 2310.08461 • Published • 1
Music
-
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Paper • 2310.12404 • Published • 15 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 25 -
A Survey of AI Music Generation Tools and Models
Paper • 2308.12982 • Published • 1 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21
AutoML/NAS
-
AutoML-GPT: Large Language Model for AutoML
Paper • 2309.01125 • Published • 1 -
SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network
Paper • 2310.09049 • Published • 1 -
Prompt2Model: Generating Deployable Models from Natural Language Instructions
Paper • 2308.12261 • Published • 1 -
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization
Paper • 2306.01102 • Published • 1
Math
-
KwaiYiiMath: Technical Report
Paper • 2310.07488 • Published • 2 -
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
Paper • 2308.07758 • Published • 4 -
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
Paper • 2309.10814 • Published • 3 -
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Paper • 2310.03731 • Published • 29
Tabular
-
Effective Distillation of Table-based Reasoning Ability from LLMs
Paper • 2309.13182 • Published • 1 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 41 -
Tab-CoT: Zero-shot Tabular Chain of Thought
Paper • 2305.17812 • Published • 2 -
GitTables: A Large-Scale Corpus of Relational Tables
Paper • 2106.07258 • Published • 1
Autoencoder
-
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Paper • 2309.08600 • Published • 15 -
In-context Autoencoder for Context Compression in a Large Language Model
Paper • 2307.06945 • Published • 28 -
Self-slimmed Vision Transformer
Paper • 2111.12624 • Published • 1 -
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
Paper • 2308.14903 • Published • 1
Writing
-
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
Paper • 2310.08185 • Published • 8 -
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence
Paper • 2310.05388 • Published • 4 -
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
Paper • 2311.09180 • Published • 8 -
Weaver: Foundation Models for Creative Writing
Paper • 2401.17268 • Published • 45
Science
-
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Paper • 2308.04679 • Published • 1 -
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Paper • 2310.10134 • Published • 1 -
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 17 -
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 21
Quantization
-
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Paper • 2310.08659 • Published • 28 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44 -
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Paper • 2309.02784 • Published • 2 -
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Paper • 2309.16119 • Published • 1
PEFT
-
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Paper • 2310.08659 • Published • 28 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 44 -
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Paper • 2309.16119 • Published • 1 -
LoRA ensembles for large language model fine-tuning
Paper • 2310.00035 • Published • 2
Continual learning
-
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Paper • 2310.10134 • Published • 1 -
TiC-CLIP: Continual Training of CLIP Models
Paper • 2310.16226 • Published • 9 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30 -
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 15
Early stopping
-
DARTS+: Improved Differentiable Architecture Search with Early Stopping
Paper • 1909.06035 • Published • 1 -
Confident Adaptive Language Modeling
Paper • 2207.07061 • Published • 1 -
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Paper • 2210.15523 • Published • 1 -
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Paper • 2310.05424 • Published • 1
Tokenizer
-
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Paper • 2310.05737 • Published • 4 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1 -
Towards General Text Embeddings with Multi-stage Contrastive Learning
Paper • 2308.03281 • Published • 2 -
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Paper • 2305.11554 • Published • 2
Audio
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 54 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1
Speech
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 54 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1 -
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Paper • 2308.05734 • Published • 37
Attention
-
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25 -
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Paper • 2308.16137 • Published • 40 -
Scaling Transformer to 1M tokens and beyond with RMT
Paper • 2304.11062 • Published • 3 -
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 20
Emergent
-
Are Emergent Abilities in Large Language Models just In-Context Learning?
Paper • 2309.01809 • Published • 3 -
Commonsense Knowledge Transfer for Pre-trained Language Models
Paper • 2306.02388 • Published • 1 -
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Paper • 2305.01610 • Published • 2 -
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Paper • 2307.01201 • Published • 2
Optimal transport
Layout
-
UI Layout Generation with LLMs Guided by UI Grammar
Paper • 2310.15455 • Published • 3 -
You Only Look at Screens: Multimodal Chain-of-Action Agents
Paper • 2309.11436 • Published • 1 -
Never-ending Learning of User Interfaces
Paper • 2308.08726 • Published • 2 -
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 66
Softmax
-
Replacing softmax with ReLU in Vision Transformers
Paper • 2309.08586 • Published • 17 -
Softmax Bias Correction for Quantized Generative Models
Paper • 2309.01729 • Published • 1 -
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Paper • 2304.13276 • Published • 1 -
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Paper • 2306.12929 • Published • 12
MoE
-
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27 -
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Paper • 2308.12066 • Published • 4 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 1 -
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Paper • 2112.14397 • Published • 1
Hyena
-
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Paper • 2310.12109 • Published • 1 -
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Paper • 2310.18780 • Published • 3 -
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Paper • 2311.05908 • Published • 16 -
Multi-Dimensional Hyena for Spatial Inductive Bias
Paper • 2309.13600 • Published • 1
Pruning
-
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper • 2310.17157 • Published • 14 -
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Paper • 2305.15805 • Published • 1 -
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Paper • 2305.11186 • Published • 1 -
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Paper • 2110.07560 • Published • 2
Inference
-
S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
Paper • 2306.06000 • Published • 1 -
Fast Distributed Inference Serving for Large Language Models
Paper • 2305.05920 • Published • 1 -
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Paper • 2305.13144 • Published • 1 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 1
KV Cache
-
S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
Paper • 2306.06000 • Published • 1 -
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Paper • 2405.12532 • Published -
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Paper • 2404.04793 • Published • 1 -
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Paper • 2405.14366 • Published • 2
Weight averaging
-
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 -
Weight Averaging Improves Knowledge Distillation under Domain Shift
Paper • 2309.11446 • Published • 1 -
SWAMP: Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Paper • 2305.14852 • Published • 1 -
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Paper • 2306.16788 • Published • 1
Merging
-
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 -
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Paper • 2308.07317 • Published • 24 -
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Paper • 2211.11315 • Published • 1 -
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Paper • 2307.13269 • Published • 32
Knowledge graph
-
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs
Paper • 2309.03118 • Published • 2 -
Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?
Paper • 2308.10168 • Published • 2 -
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
Paper • 2308.09729 • Published • 5 -
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
Paper • 2110.07178 • Published • 1
Reranking
-
Natural Logic-guided Autoregressive Multi-hop Document Retrieval for Fact Verification
Paper • 2212.05276 • Published • 1 -
Hybrid and Collaborative Passage Reranking
Paper • 2305.09313 • Published • 1 -
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval
Paper • 2204.02292 • Published • 1 -
Discrete Prompt Optimization via Constrained Generation for Zero-shot Re-ranker
Paper • 2305.13729 • Published • 1
Question answering
-
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Paper • 2204.04581 • Published • 1 -
Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code
Paper • 2303.08033 • Published • 1 -
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering
Paper • 2305.14869 • Published • 1 -
Multi-hop Commonsense Knowledge Injection Framework for Zero-Shot Commonsense Question Answering
Paper • 2305.05936 • Published • 1
Memory
-
Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering
Paper • 2204.04581 • Published • 1 -
Retrieval-Augmented Multimodal Language Modeling
Paper • 2211.12561 • Published • 1 -
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Paper • 2212.10511 • Published • 1 -
Memorizing Transformers
Paper • 2203.08913 • Published • 2
Multiple choice Q&A
Summarization
-
Zero-Shot Cross-Lingual Summarization via Large Language Models
Paper • 2302.14229 • Published • 1 -
Towards Unifying Multi-Lingual and Cross-Lingual Summarization
Paper • 2305.09220 • Published • 1 -
GPT Self-Supervision for a Better Data Annotator
Paper • 2306.04349 • Published • 1 -
Few-shot training LLMs for project-specific code-summarization
Paper • 2207.04237 • Published • 1
Relationship extraction
-
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
Paper • 2305.11159 • Published • 1 -
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
Paper • 2305.05711 • Published • 1 -
Improving Continual Relation Extraction through Prototypical Contrastive Learning
Paper • 2210.04513 • Published • 1
Text classification
-
Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification
Paper • 2308.07282 • Published • 1 -
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Paper • 2310.14192 • Published • 2 -
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers
Paper • 2103.14453 • Published • 1 -
Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification
Paper • 2111.09064 • Published • 1
Reversible
Adversarial
-
LTD: Low Temperature Distillation for Robust Adversarial Training
Paper • 2111.02331 • Published • 1 -
Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy
Paper • 1906.06784 • Published • 1 -
Pruning Adversarially Robust Neural Networks without Adversarial Examples
Paper • 2210.04311 • Published • 1 -
Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation
Paper • 2306.16170 • Published • 1
Semantic segmentation
-
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Paper • 2202.13393 • Published • 1 -
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation
Paper • 2306.08075 • Published • 1 -
Semantic Consistency for Assuring Reliability of Large Language Models
Paper • 2308.09138 • Published • 2
Distributed
-
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Paper • 2302.06218 • Published • 1 -
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Paper • 2306.10209 • Published • 2 -
SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System
Paper • 2205.10034 • Published • 1 -
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Paper • 2303.06318 • Published • 1
Backpropagation
-
Sparse Backpropagation for MoE Training
Paper • 2310.00811 • Published • 2 -
The Forward-Forward Algorithm: Some Preliminary Investigations
Paper • 2212.13345 • Published • 2 -
Fine-Tuning Language Models with Just Forward Passes
Paper • 2305.17333 • Published • 3 -
Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation
Paper • 2309.13192 • Published • 1
Blockwise
Text diffusion
-
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Paper • 2210.17432 • Published • 1 -
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Paper • 2305.08379 • Published • 3 -
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Paper • 2308.12219 • Published • 1 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 73
Concept
-
Concept-Oriented Deep Learning with Large Language Models
Paper • 2306.17089 • Published • 1 -
Extracting Mathematical Concepts with Large Language Models
Paper • 2309.00642 • Published • 1 -
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Paper • 2310.12274 • Published • 13 -
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models
Paper • 2211.04079 • Published • 1
Image editing
Modular
FFN/MLP
-
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 15 -
Trap of Feature Diversity in the Learning of MLPs
Paper • 2112.00980 • Published • 2 -
Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics
Paper • 2301.05816 • Published • 1 -
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Paper • 2108.04384 • Published • 1
Positional embeddings
-
Cure the headache of Transformers via Collinear Constrained Attention
Paper • 2309.08646 • Published • 13 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 71 -
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 26 -
Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
Paper • 2205.13522 • Published • 1
Federated learning
-
Decentralized Policy Optimization
Paper • 2211.03032 • Published • 1 -
MPCFormer: fast, performant and private Transformer inference with MPC
Paper • 2211.01452 • Published • 1 -
Distributed Pruning Towards Tiny Neural Networks in Federated Learning
Paper • 2212.01977 • Published • 1 -
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Paper • 2309.06805 • Published • 1
Embodied
-
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 25 -
Code as Policies: Language Model Programs for Embodied Control
Paper • 2209.07753 • Published • 1 -
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Paper • 2402.10211 • Published • 14
Embeddings
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
Paper • 2308.03281 • Published • 2 -
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Paper • 2310.05914 • Published • 14 -
EELBERT: Tiny Models through Dynamic Embeddings
Paper • 2310.20144 • Published • 3 -
Dynamic Word Embeddings for Evolving Semantic Discovery
Paper • 1703.00607 • Published • 1
Hyperparameters
Structured data
-
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
Paper • 2309.08963 • Published • 11 -
DSG: An End-to-End Document Structure Generator
Paper • 2310.09118 • Published • 2 -
Integrating Graphs with Large Language Models: Methods and Prospects
Paper • 2310.05499 • Published • 2 -
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Paper • 2307.01201 • Published • 2
Sampling
Constrained decoding
-
Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting
Paper • 2310.05824 • Published • 1 -
Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation
Paper • 1804.06609 • Published • 1 -
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Paper • 1704.07138 • Published • 1 -
Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning
Paper • 2305.13971 • Published • 4
Batched decoding
Finance
-
Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models
Paper • 2306.14096 • Published • 1 -
Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models
Paper • 2306.12659 • Published • 1 -
Transforming Sentiment Analysis in the Financial Domain with ChatGPT
Paper • 2308.07935 • Published • 1 -
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4
Sentiment analysis
-
Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models
Paper • 2306.14096 • Published • 1 -
Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models
Paper • 2306.12659 • Published • 1 -
Transforming Sentiment Analysis in the Financial Domain with ChatGPT
Paper • 2308.07935 • Published • 1 -
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4
Named Entity Recognition (NER)
-
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks
Paper • 2305.05862 • Published • 4 -
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
Paper • 2305.05711 • Published • 1 -
ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition
Paper • 2301.08855 • Published • 1 -
Model-Agnostic Syntactical Information for Pre-Trained Programming Language Models
Paper • 2303.06233 • Published • 1
Bias
-
Soft-prompt Tuning for Large Language Models to Evaluate Bias
Paper • 2306.04735 • Published • 1 -
Mitigating Popularity Bias in Recommendation with Unbalanced Interactions: A Gradient Perspective
Paper • 2211.01154 • Published • 1 -
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Paper • 2212.08061 • Published • 1 -
Bias Assessment and Mitigation in LLM-based Code Generation
Paper • 2309.14345 • Published • 1
Meta-learning
-
Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization
Paper • 2303.12314 • Published • 1 -
Augmented Large Language Models with Parametric Knowledge Guiding
Paper • 2305.04757 • Published • 2 -
Learning to Retrieve In-Context Examples for Large Language Models
Paper • 2307.07164 • Published • 22 -
LiST: Lite Prompted Self-training Makes Parameter-Efficient Few-shot Learners
Paper • 2110.06274 • Published • 1
Paraphrase
-
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Paper • 2305.18169 • Published • 1 -
Quick Starting Dialog Systems with Paraphrase Generation
Paper • 2204.02546 • Published • 1 -
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 51
Annotation
-
Automated Annotation with Generative AI Requires Validation
Paper • 2306.00176 • Published • 1 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4 -
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Paper • 2310.14192 • Published • 2 -
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Paper • 2309.08583 • Published • 1
Validation
Privacy
-
Privacy-Preserving Prompt Tuning for Large Language Model Services
Paper • 2305.06212 • Published • 1 -
Privately Fine-Tuning Large Language Models with Differential Privacy
Paper • 2210.15042 • Published • 1 -
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Paper • 2207.09078 • Published • 1
Document parsing
-
DSG: An End-to-End Document Structure Generator
Paper • 2310.09118 • Published • 2 -
OCR-free Document Understanding Transformer
Paper • 2111.15664 • Published • 3 -
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Paper • 2304.12484 • Published • 1 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
LLM architecture
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10 -
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 171 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 19 -
Attention Is All You Need
Paper • 1706.03762 • Published • 69
Reflection
-
A Zero-Shot Language Agent for Computer Control with Structured Reflection
Paper • 2310.08740 • Published • 16 -
ExpeL: LLM Agents Are Experiential Learners
Paper • 2308.10144 • Published • 3 -
Demystifying GPT Self-Repair for Code Generation
Paper • 2306.09896 • Published • 19 -
Large Language Models are Better Reasoners with Self-Verification
Paper • 2212.09561 • Published • 1
Text editing/revision
-
CoEdIT: Text Editing by Task-Specific Instruction Tuning
Paper • 2305.09857 • Published • 7 -
Reducing Sequence Length by Predicting Edit Operations with Large Language Models
Paper • 2305.11862 • Published • 1 -
Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction
Paper • 2309.11439 • Published • 1 -
Program Merge Conflict Resolution via Neural Transformers
Paper • 2109.00084 • Published • 1
Clarify
RoPE
-
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 26 -
CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling
Paper • 2309.05270 • Published • 1 -
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 117
Evolutionary Algorithms
Survey
-
Towards an Understanding of Large Language Models in Software Engineering Tasks
Paper • 2308.11396 • Published • 1 -
Several categories of Large Language Models (LLMs): A Short Survey
Paper • 2307.10188 • Published • 1 -
Large Language Models for Generative Recommendation: A Survey and Visionary Discussions
Paper • 2309.01157 • Published • 1 -
A Survey on Large Language Models for Recommendation
Paper • 2305.19860 • Published • 1
Grammar
-
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task
Paper • 2307.03972 • Published • 1 -
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
Paper • 2307.13923 • Published • 1 -
Evaluating GPT-3.5 and GPT-4 on Grammatical Error Correction for Brazilian Portuguese
Paper • 2306.15788 • Published • 2 -
Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction?
Paper • 2305.15183 • Published • 1
Dataset curation
-
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Paper • 2308.12032 • Published • 1 -
Know thy corpus! Robust methods for digital curation of Web corpora
Paper • 2003.06389 • Published • 1 -
Self-Alignment with Instruction Backtranslation
Paper • 2308.06259 • Published • 42 -
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Paper • 2305.06156 • Published • 2
Mental health
Grounding
-
A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems
Paper • 2308.08434 • Published • 1 -
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Paper • 2302.02662 • Published • 1 -
Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning
Paper • 2309.01352 • Published • 1 -
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Paper • 2308.03188 • Published • 2
Recommendation
-
A Bi-Step Grounding Paradigm for Large Language Models in Recommendation Systems
Paper • 2308.08434 • Published • 1 -
Large Language Models for Generative Recommendation: A Survey and Visionary Discussions
Paper • 2309.01157 • Published • 1 -
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
Paper • 2307.15780 • Published • 27 -
Leveraging Large Language Models for Pre-trained Recommender Systems
Paper • 2308.10837 • Published • 1
Data processing
ASR
-
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Paper • 2309.15701 • Published • 2 -
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Paper • 2309.07707 • Published • 1 -
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 58 -
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Paper • 2309.13876 • Published • 1
Interpretability
-
A technical note on bilinear layers for interpretability
Paper • 2305.03452 • Published • 1 -
Interpreting Transformer's Attention Dynamic Memory and Visualizing the Semantic Information Flow of GPT
Paper • 2305.13417 • Published • 1 -
Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work?
Paper • 2211.12821 • Published • 2 -
The Linear Representation Hypothesis and the Geometry of Large Language Models
Paper • 2311.03658 • Published • 1
Multi task
-
Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey
Paper • 2308.08234 • Published • 1 -
Understanding and Improving Information Transfer in Multi-Task Learning
Paper • 2005.00944 • Published • 1 -
Improving Multi-task Learning via Seeking Task-based Flat Regions
Paper • 2211.13723 • Published • 2 -
Improvable Gap Balancing for Multi-Task Learning
Paper • 2307.15429 • Published • 1
Optimizer
-
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Paper • 2305.15265 • Published • 1 -
Mesa: A Memory-saving Training Framework for Transformers
Paper • 2111.11124 • Published • 1 -
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Paper • 2306.09782 • Published • 30 -
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
Paper • 2106.02679 • Published • 1
Time series
-
LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs
Paper • 2308.08469 • Published • 2 -
TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
Paper • 2308.08241 • Published • 2 -
Are Large Language Models Temporally Grounded?
Paper • 2311.08398 • Published • 1 -
NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models
Paper • 2305.07766 • Published • 1
Tree search
-
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Paper • 2310.08582 • Published • 2 -
Autonomous Tree-search Ability of Large Language Models
Paper • 2310.10686 • Published • 2 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 10 -
PathFinder: Guided Search over Multi-Step Reasoning Paths
Paper • 2312.05180 • Published • 10
Hypernetwork
-
Magnitude Invariant Parametrizations Improve Hypernetwork Learning
Paper • 2304.07645 • Published • 1 -
HyperShot: Few-Shot Learning by Kernel HyperNetworks
Paper • 2203.11378 • Published • 1 -
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Paper • 2211.15457 • Published • 1 -
Continual Learning with Dependency Preserving Hypernetworks
Paper • 2209.07712 • Published • 1
Data augmentation
-
DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning
Paper • 2303.07864 • Published • 1 -
Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks
Paper • 2305.13547 • Published • 1 -
MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning
Paper • 2304.09402 • Published • 2 -
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning
Paper • 2305.18169 • Published • 1
No backprop
-
Fine-Tuning Language Models with Just Forward Passes
Paper • 2305.17333 • Published • 3 -
HyperTuning: Toward Adapting Large Language Models without Back-propagation
Paper • 2211.12485 • Published • 1 -
Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning
Paper • 2209.14624 • Published • 1 -
Backpropagation-free Training of Deep Physical Neural Networks
Paper • 2304.11042 • Published • 1
Regularization
-
HyperSparse Neural Networks: Shifting Exploration to Exploitation through Adaptive Regularization
Paper • 2308.07163 • Published • 1 -
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Paper • 2301.09554 • Published • 1 -
Weight Compander: A Simple Weight Reparameterization for Regularization
Paper • 2306.16993 • Published • 1 -
FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization
Paper • 2309.06805 • Published • 1
Factuality
-
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30 -
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Paper • 2308.03188 • Published • 2 -
Trusted Source Alignment in Large Language Models
Paper • 2311.06697 • Published • 12 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26
Uncertainty
-
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
Paper • 2311.09677 • Published • 3 -
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
Paper • 2307.10236 • Published • 1 -
Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
Paper • 2307.01379 • Published • 1 -
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Paper • 2305.19187 • Published • 1
Ontology
RNN
-
Trellis Networks for Sequence Modeling
Paper • 1810.06682 • Published • 1 -
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
Paper • 2311.01981 • Published • 1 -
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 10 -
Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
Paper • 2305.19190 • Published • 1
Convolution
-
Trellis Networks for Sequence Modeling
Paper • 1810.06682 • Published • 1 -
Pruning Very Deep Neural Network Channels for Efficient Inference
Paper • 2211.08339 • Published • 1 -
LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
Paper • 2309.14157 • Published • 1 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 143
VAE
-
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
Paper • 2310.12004 • Published • 2 -
Concept-free Causal Disentanglement with Variational Graph Auto-Encoder
Paper • 2311.10638 • Published • 2 -
Tokenization with Factorized Subword Encoding
Paper • 2306.07764 • Published • 1 -
Mixture-of-experts VAEs can disregard variation in surjective multimodal data
Paper • 2204.05229 • Published • 1
NoPE
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
Position Prediction as an Effective Pretraining Strategy
Paper • 2207.07611 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
Hebbian
Activation
-
Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark
Paper • 2109.14545 • Published • 1 -
Adaptive Activation-based Structured Pruning
Paper • 2201.10520 • Published • 1 -
Learning Activation Functions for Sparse Neural Networks
Paper • 2305.10964 • Published • 1 -
Exploiting Transformer Activation Sparsity with Dynamic Inference
Paper • 2310.04361 • Published • 1
Initialization
-
Pruning at Initialization -- A Sketching Perspective
Paper • 2305.17559 • Published • 1 -
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Paper • 2202.02643 • Published • 1 -
Why Random Pruning Is All We Need to Start Sparse
Paper • 2210.02412 • Published • 1 -
MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters
Paper • 2311.04251 • Published • 1
Relative PE
Approximation
-
Linear Self-Attention Approximation via Trainable Feedforward Kernel
Paper • 2211.04076 • Published • 1 -
Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
Paper • 2109.06762 • Published • 1 -
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Paper • 2305.17235 • Published • 2 -
Exploring Low Rank Training of Deep Neural Networks
Paper • 2209.13569 • Published • 1
Diffusion
-
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 15 -
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Paper • 2310.06389 • Published • 1 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 50 -
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Paper • 2305.13655 • Published • 7
Normalization
-
Unified Normalization for Accelerating and Stabilizing Transformers
Paper • 2208.01313 • Published • 1 -
Interpret Vision Transformers as ConvNets with Dynamic Convolutions
Paper • 2309.10713 • Published • 1 -
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Paper • 2003.00152 • Published • 1 -
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper • 2405.11582 • Published • 18
Recursive
-
Modeling Hierarchical Structures with Continuous Recursive Neural Networks
Paper • 2106.06038 • Published • 1 -
RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks
Paper • 2106.08928 • Published • 1 -
Sliced Recursive Transformer
Paper • 2111.05297 • Published • 1 -
Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • 1802.01817 • Published
Conditional
Token dropping
Legal
SVG
-
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Paper • 2306.06094 • Published • 1 -
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
Paper • 2304.14400 • Published • 4 -
VecFusion: Vector Font Generation with Diffusion
Paper • 2312.10540 • Published • 22 -
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Paper • 2401.17093 • Published • 21
Special Tokens
Document
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 37 -
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Paper • 2312.01279 • Published • 6 -
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper • 2311.07575 • Published • 15
Confidence
-
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
Paper • 2311.08877 • Published • 7 -
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Paper • 2305.14975 • Published • 2 -
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
Paper • 2305.13712 • Published • 2
Evaluation
-
Fusion-Eval: Integrating Evaluators with LLMs
Paper • 2311.09204 • Published • 6 -
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Paper • 2311.06720 • Published • 9 -
Safurai 001: New Qualitative Approach for Code LLM Evaluation
Paper • 2309.11385 • Published • 2 -
Assessment of Pre-Trained Models Across Languages and Grammars
Paper • 2309.11165 • Published • 1
Emotion
Fashion
Mamba
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 62 -
VMamba: Visual State Space Model
Paper • 2401.10166 • Published • 40 -
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1 -
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Paper • 2402.00789 • Published • 2
Vocoder
Phrase
Analogy
Explanation
SSM
-
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Paper • 2311.14495 • Published • 1 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 62 -
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1 -
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Paper • 2402.00789 • Published • 2
Medical
-
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1 -
Vivim: a Video Vision Mamba for Medical Video Object Segmentation
Paper • 2401.14168 • Published • 2 -
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Paper • 2312.01040 • Published • 1
Reparameterization
Grokking
Hypercomplex
-
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters
Paper • 2102.08597 • Published • 1 -
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Paper • 2110.04176 • Published • 1 -
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Paper • 2106.04647 • Published • 1
Literature review
Random
-
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Paper • 2205.08099 • Published • 1 -
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
Paper • 2003.00152 • Published • 1 -
OpenRAND: A Performance Portable, Reproducible Random Number Generation Library for Parallel Computations
Paper • 2310.19925 • Published • 1 -
Squares: A Fast Counter-Based RNG
Paper • 2004.06278 • Published • 1
GNN
Byte-level
-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 54 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 9 -
Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • 1802.01817 • Published
Vector DB
Similarity search
Neuromorphic
Education