
Groq

Groq provides fast AI inference. Its groundbreaking LPU technology delivers record-setting performance and efficiency for GenAI models. With custom chips designed specifically for AI inference workloads and a deterministic, software-first approach, Groq eliminates the bottlenecks of conventional hardware, enabling real-time AI applications with predictable latency and exceptional throughput.

For the latest pricing, visit our pricing page.


Supported tasks

Chat Completion (LLM)

Find out more about the Chat Completion (LLM) task in the task documentation.

import os
from huggingface_hub import InferenceClient

# Route requests to Groq through the Hugging Face Inference Providers API,
# authenticating with your Hugging Face token.
client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

# The assistant's reply is in the first choice.
print(completion.choices[0].message)
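
If you want tokens as they are generated rather than a single response, the same call supports streaming. A minimal sketch, using the client's standard stream=True option:

import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# stream=True yields an iterator of chunks instead of one completion object.
stream = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental piece of the assistant message;
    # the delta content can be None on the final chunk.
    print(chunk.choices[0].delta.content or "", end="")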

Chat Completion (VLM)

Find out more about the Chat Completion (VLM) task in the task documentation.

import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# Vision-language models accept a list of content parts, mixing text
# with one or more images referenced by URL.
completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
)

print(completion.choices[0].message)
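
The image_url field can also carry a base64-encoded data URL instead of a remote link, which is useful for local files. A minimal sketch, assuming the OpenAI-compatible data-URL convention; the file path is hypothetical:

import base64
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# Encode a local image (hypothetical path) as a base64 data URL.
with open("statue.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)

print(completion.choices[0].message)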