Hugging Face Generative AI Services (HUGS) documentation

Supported Hardware Providers

HUGS is optimized for a wide variety of ML inference accelerators, and support for additional accelerator families and providers will continue to grow.

NVIDIA GPUs

NVIDIA GPUs are widely used for machine learning and AI applications, offering high performance and specialized hardware for deep learning tasks. NVIDIA’s CUDA platform provides a robust ecosystem for GPU-accelerated computing.

Supported device(s):

  • NVIDIA A10G: 24GB GDDR6 memory, 9216 CUDA cores, 288 Tensor cores, 72 RT cores
  • NVIDIA L4: 24GB GDDR6 memory, 7168 CUDA cores, 224 Tensor cores, 56 RT cores
  • NVIDIA L40S: 48GB GDDR6 memory, 18176 CUDA cores, 568 Tensor cores, 142 RT cores
  • NVIDIA A100: 40/80GB HBM2e memory, 6912 CUDA cores, 432 Tensor cores, 108 SMs
  • NVIDIA H100: 80GB HBM3 memory, 14592 CUDA cores, 456 Tensor cores, 144 SMs
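When choosing among the devices above, GPU memory is usually the binding constraint for LLM inference. A rough rule of thumb: fp16/bf16 weights need about 2 bytes per parameter, plus headroom for the KV cache and activations. The sketch below is illustrative only (the `DEVICE_MEMORY_GB` table and the 20% overhead factor are assumptions, not HUGS requirements):

```python
# Rough sizing heuristic for single-GPU fp16 inference.
# Assumptions: ~2 bytes per parameter for fp16/bf16 weights, plus ~20%
# overhead for KV cache and activations. Memory figures match the list above.
DEVICE_MEMORY_GB = {
    "A10G": 24,
    "L4": 24,
    "L40S": 48,
    "A100": 80,  # using the 80GB variant
    "H100": 80,
}

def fits(params_billions: float, device: str, overhead: float = 1.2) -> bool:
    """Return True if a model's fp16 weights (plus overhead) fit on one device."""
    needed_gb = params_billions * 2 * overhead  # 2 bytes/param in fp16
    return needed_gb <= DEVICE_MEMORY_GB[device]

# An 8B-parameter model fits on a 24GB A10G (~19.2GB needed);
# a 70B-parameter model does not fit on a single 80GB H100 (~168GB needed).
```

For models that exceed a single device, weights are typically sharded across multiple GPUs, which is why the larger-memory devices (L40S, A100, H100) are preferred for bigger models.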

AMD GPUs

AMD GPUs provide strong competition in the AI and machine learning space, offering high-performance computing capabilities with their CDNA architecture. AMD’s ROCm (Radeon Open Compute) platform enables GPU-accelerated computing on Linux systems.

Supported device(s):

  • AMD Instinct MI300X: 192GB HBM3 memory, 304 Compute Units, 4864 AI Accelerators

AWS Accelerators (Inferentia/Trainium)

Coming soon

Google TPUs

Coming soon
