LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20, 2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Paper • 2405.04532 • Published May 7, 2024
Retrieval Head Mechanistically Explains Long-Context Factuality Paper • 2404.15574 • Published Apr 24, 2024
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models Paper • 2211.10438 • Published Nov 18, 2022
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention Paper • 2305.10431 • Published May 17, 2023