Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference By mfuntowicz and 1 other • Jan 16 • 75
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated May 30 • 86
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257