
Groq

Groq provides fast AI inference. Its groundbreaking LPU technology delivers record-setting performance and efficiency for GenAI models. With custom chips designed specifically for AI inference workloads and a deterministic, software-first approach, Groq eliminates the bottlenecks of conventional hardware, enabling real-time AI applications with predictable latency and exceptional throughput.

For the latest pricing, visit our pricing page.


Supported tasks

Chat Completion (LLM)

Find out more about the Chat Completion (LLM) task in the task documentation.

import os
from huggingface_hub import InferenceClient

# Route requests to Groq through the Hugging Face Inference Providers API,
# authenticating with your Hugging Face token.
client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

# The assistant's reply is in the first choice.
print(completion.choices[0].message)
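
If you want tokens as they are generated rather than a single response, the same call supports streaming. A minimal sketch, using the client's standard stream=True option:

import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# stream=True yields an iterator of chunks instead of one completion object.
stream = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental piece of the assistant message;
    # the delta content can be None on the final chunk.
    print(chunk.choices[0].delta.content or "", end="")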

Chat Completion (VLM)

Find out more about the Chat Completion (VLM) task in the task documentation.

import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# Vision-language models accept a list of content parts, mixing text
# with one or more images referenced by URL.
completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
)

print(completion.choices[0].message)
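
The image_url field can also carry a base64-encoded data URL instead of a remote link, which is useful for local files. A minimal sketch, assuming the OpenAI-compatible data-URL convention; the file path is hypothetical:

import base64
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

# Encode a local image (hypothetical path) as a base64 data URL.
with open("statue.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)

print(completion.choices[0].message)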