Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ sdk: static
 pinned: false
 ---
 
-Text-Generation-Inference is
+Text-Generation-Inference is a solution built for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, Llama, and T5. Text Generation Inference is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative, and it implements optimizations for all supported model architectures, including:
 
 - Tensor Parallelism and custom CUDA kernels
 - Optimized transformers code for inference using flash-attention and Paged Attention on the most popular architectures
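
As a quick illustration of the serving workflow the new paragraph describes, here is a minimal sketch of querying a running TGI server with the official `text-generation` Python client (`pip install text-generation`). The server URL, prompt, and generation parameters are illustrative assumptions, not part of this commit; it assumes a TGI instance is already listening locally.

```python
# Minimal sketch: talking to a Text-Generation-Inference server with the
# official `text-generation` Python client. The URL and prompt below are
# assumptions for illustration; adjust them to your deployment.
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # assumed local TGI endpoint

# Single-shot generation: returns the full completion in one response.
response = client.generate("What is tensor parallelism?", max_new_tokens=64)
print(response.generated_text)

# Streaming generation: TGI can also emit tokens as they are produced,
# which is how dynamic batching latency gains are usually consumed.
text = ""
for stream_response in client.generate_stream(
    "What is tensor parallelism?", max_new_tokens=64
):
    if not stream_response.token.special:
        text += stream_response.token.text
print(text)
```

Both calls hit the same HTTP API that TGI exposes for all the architectures listed above; the streaming variant is typically preferred for interactive use since tokens arrive as soon as the batcher schedules them.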