Models: 1,033

Active filters: vllm

DBMe/Mistral-Large-Instruct-2411-2.86bpw-h6-exl2

Updated Nov 23, 2024 • 6 • 1

gallantpigeon/mistral-large-instruct-2411-w8a16

31B • Updated Nov 23, 2024 • 6

gallantpigeon/mistral-large-instruct-2411-int8-w8a8

123B • Updated Nov 23, 2024 • 3

tensorblock/Mistral-Small-Instruct-2409-GGUF

22B • Updated Jul 8 • 58

bartowski/Sparse-Llama-3.1-8B-2of4-GGUF

Text Generation • 8B • Updated Nov 26, 2024 • 147 • 3

QuantFactory/Sparse-Llama-3.1-8B-2of4-GGUF

Text Generation • 8B • Updated Nov 27, 2024 • 30 • 4

parasail-ai/GritLM-7B-vllm

Text Generation • 7B • Updated Dec 6, 2024 • 8.23k • 1

QuantFactory/L3-Aspire-Heart-Matrix-8B-GGUF

Text Generation • 8B • Updated Nov 29, 2024 • 52 • 2

tensorblock/Sparse-Llama-3.1-8B-2of4-GGUF

Text Generation • 8B • Updated Jul 9 • 619

dangvansam/gemma-2-27b-it-FP8-fix-system-role

Text Generation • 27B • Updated Dec 4, 2024 • 25

dangvansam/gemma-2-2b-it-fix-system-role

Text Generation • 3B • Updated Dec 8, 2024 • 3

dangvansam/gemma-2-9b-it-fix-system-role

Text Generation • 9B • Updated Dec 8, 2024 • 83 • 1

yejingfu/nmagic-Meta-Llama-3-8B-Instruct-FP8

8B • Updated Dec 5, 2024 • 2

gghfez/Mistral-Large-Instruct-2411

123B • Updated Dec 14, 2024 • 6

jacobcarajo/Ministral-8B-Instruct-2410-Q5_K_M-GGUF

8B • Updated Dec 14, 2024 • 1 • 1

vitekkor/T-pro-it-1.0-bnb-8bit

33B • Updated Dec 16, 2024 • 3 • 1

itlwas/Ministral-8B-Instruct-2410-Q4_K_M-GGUF

8B • Updated Dec 19, 2024 • 7

redhat6/Ministral-8B-Instruct-2410-Q8_0-GGUF

8B • Updated Dec 23, 2024 • 2 • 1

itlwas/Mistral-Small-Instruct-2409-Q4_K_M-GGUF

22B • Updated Dec 24, 2024 • 15

nintwentydo/pixtral-12b-FP8-dynamic-FP8-KV-cache

Image-Text-to-Text • 13B • Updated Jan 6 • 2 • 1

matrixportalx/Ministral-8B-Instruct-2410-Q4_0-GGUF

8B • Updated Jan 1 • 6

adriabama06/SmallThinker-3B-Preview-AWQ

Text Generation • Updated Jan 3 • 2 • 1

matrixportalx/Ministral-8B-Instruct-2410-Q4_K_M-GGUF

8B • Updated Jan 2 • 7 • 1

matrixportalx/Ministral-8B-Instruct-2410-Q4_K_S-GGUF

8B • Updated Jan 2 • 6

RedHatAI/Mixtral-8x22B-v0.1-quantized.w4a16

18B • Updated Jan 3 • 4

RedHatAI/Mixtral-8x7B-v0.1-quantized.w4a16

6B • Updated Mar 1 • 74

RedHatAI/QwQ-32B-Preview-FP8-dynamic

Text Generation • 33B • Updated Jan 3 • 7

RedHatAI/QwQ-32B-Preview-quantized.w4a16

6B • Updated Jan 3 • 47

RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF-quantized.w4a16

Text Generation • 11B • Updated Jan 3 • 30

RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF-quantized.w8a8

Text Generation • 71B • Updated Jan 3 • 5