
Hugging Face Generative AI Services (HUGS)


Optimized, zero-configuration inference microservices for open AI models

Hugging Face Generative AI Services (HUGS) are optimized, zero-configuration inference microservices designed to simplify and accelerate the development of AI applications with open models. Built on open-source Hugging Face technologies such as Text Generation Inference (TGI) and Transformers, HUGS provide the best solution for efficiently building Generative AI applications with open models and are optimized for a variety of hardware accelerators, including NVIDIA GPUs, AMD GPUs, AWS Inferentia (soon), and Google TPUs (soon).

Key Features

  • Zero-configuration Deployment: Automatically loads optimal settings based on your hardware environment.
  • Optimized Hardware Inference Engines: Built on Hugging Face’s Text Generation Inference (TGI), optimized for a variety of hardware.
  • Hardware Flexibility: Optimized for various accelerators, including NVIDIA GPUs, AMD GPUs, AWS Inferentia, and Google TPUs.
  • Built for Open Models: Compatible with a wide range of popular open AI models, including LLMs, Multimodal Models, and Embedding Models.
  • Industry Standardized APIs: Easily deployable using Kubernetes and standardized on the OpenAI API (see the example after this list).
  • Security and Control: Deploy HUGS within your own infrastructure for enhanced security and data control.
  • Enterprise Compliance: Minimizes compliance risks by including the necessary licenses and terms of service.
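
As a concrete illustration of the standardized API, the sketch below sends a chat completion request to a running HUGS container over the OpenAI-compatible /v1/chat/completions route. The endpoint URL (http://localhost:8080) and the "tgi" model placeholder are assumptions for illustration; substitute the host and model of your own deployment.

# Minimal sketch: query a HUGS endpoint over its OpenAI-compatible route.
# Assumes a HUGS container is already running and reachable at
# http://localhost:8080 -- replace with your own deployment URL.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "tgi",  # placeholder; a HUGS container serves a single model
        "messages": [{"role": "user", "content": "What are open AI models?"}],
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])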

Why HUGS?

Enterprises often struggle with the performance, engineering complexity, and compliance of their model-serving infrastructure when using open models. Early-stage startups and large enterprises alike have built POCs on closed-model APIs, not because they want to rely on black-box APIs, but because building their AI with open models takes more work.

HUGS are optimized, zero-configuration inference microservices designed to simplify and accelerate the development of AI applications with open models. With HUGS, we want to make switching from a closed-source API to a self-hosted open model easy.

HUGS deliver endpoints compatible with the OpenAI API, so you don’t need to change your code when transitioning a POC to production with your own models and infrastructure. They automatically deliver maximum hardware efficiency. HUGS also make it easy to keep your applications at the cutting edge of Generative AI by offering updates when new battle-tested open models become available.
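
As a sketch of what that transition can look like, the snippet below uses the official openai Python client and changes only the client construction to point at a self-hosted HUGS endpoint; the endpoint URL and model name are illustrative assumptions, not fixed values.

# Sketch: moving from the hosted OpenAI API to a self-hosted HUGS endpoint.
# Only the client construction changes; the request code stays the same.
from openai import OpenAI

# Before: client = OpenAI(api_key="sk-...")   # hosted OpenAI API
# After:  point the same client at your HUGS deployment (the URL below
# is an assumption for illustration; use your own endpoint).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

completion = client.chat.completions.create(
    model="tgi",  # placeholder; a HUGS container serves a single model
    messages=[{"role": "user", "content": "Hello, HUGS!"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)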

Built for Open Models

Compatible with a wide range of popular open AI models, including:

  • LLMs: Llama, Gemma, Mistral, Mixtral, Qwen, Deepseek (soon), T5 (soon), Yi (soon), Phi (soon), Command R (soon)
  • (Soon) Multimodal Models: Idefics, Llava
  • (Soon) Embedding Models: BGE, GTE, Mixbread, Arctic, Jina, Nomic

Getting Started

To start using HUGS, you have several options. You can access HUGS as part of your Hugging Face Enterprise subscription or through Cloud Service Provider (CSP) marketplaces. Currently, you can find HUGS on Amazon Web Services (AWS) and Google Cloud Platform (GCP), with Microsoft Azure coming soon. HUGS are also natively available on DigitalOcean GPU Droplets.

For detailed instructions on deployment and usage, see the guides in this documentation.

More Resources

Experience the power of open models with the simplicity of HUGS. Start building your AI applications faster and more efficiently today!
