Models: 2,756

Active filters: compressed-tensors

edge-inference/DSR1-1.5B-llmc-awq-w4

Text Generation • 0.6B • Updated 29 days ago • 39 • 1

TheClusterDev/Qwen3-Next-80B-A3B-Instruct-FP8-Dynamic

Text Generation • Updated 5 days ago • 209k • 4

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8-Static-Asym-e2e

1B • Updated about 4 hours ago • 133 • 1

cpatonn/Tongyi-DeepResearch-30B-A3B-AWQ-4bit

5B • Updated 11 days ago • 2.93k • 1

RedHatAI/Apertus-70B-Instruct-2509-FP8-dynamic

Text Generation • 71B • Updated 5 days ago • 138 • 1

cpatonn/Ling-flash-2.0-AWQ-8bit

Text Generation • 31B • Updated 9 days ago • 12 • 1

cpatonn/Magistral-Small-2509-AWQ-8bit

8B • Updated 9 days ago • 175 • 1

RedHatAI/Apertus-70B-Instruct-2509-quantized.w4a16

Text Generation • 11B • Updated 5 days ago • 25 • 1

nm-testing/tinyllama-one-shot-static-quant-test-compressed

Text Generation • 1B • Updated Oct 9, 2024 • 8

nm-testing/tinyllama-one-shot-dynamic-test

Text Generation • 1B • Updated Oct 9, 2024 • 9

nm-testing/tinyllama-one-shot-w4a16-group-packed

Text Generation • 0.3B • Updated Oct 10, 2024 • 18

nm-testing/tinyllama-one-shot-w4a16-channel-compressed

Text Generation • 1B • Updated Oct 9, 2024 • 17

nm-testing/tinyllama-one-shot-w4a16-channel-packed

Text Generation • 0.3B • Updated Oct 9, 2024 • 15

nm-testing/llama7b-one-shot-2_4-w4a16-packed

Text Generation • 1B • Updated Oct 9, 2024 • 14

nm-testing/tinyllama-one-shot-w4a16-group128-packed

Text Generation • 0.3B • Updated Oct 9, 2024 • 3

nm-testing/llama3-8b-w8_channel-a8_tensor-compressed

Text Generation • 8B • Updated Oct 9, 2024 • 8

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24

Text Generation • 0.9B • Updated Jun 4, 2024 • 4

nm-testing/llama7b-one-shot-2_4-w4a16-group128-packed

Text Generation • 1B • Updated Jun 4, 2024 • 6

nm-testing/llama1.1b_0.5_sparse_bitmask

Text Generation • 0.8B • Updated Oct 9, 2024 • 3

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24-t

Text Generation • 1B • Updated Oct 9, 2024 • 13.9k • 1

nm-testing/tinyllama-one-shot-w8a8-dynamic-channel

Text Generation • 1B • Updated Oct 9, 2024 • 8

nm-testing/llama7b-one-shot-2_4-w4a16-marlin24-t-alt

Text Generation • 0.9B • Updated Oct 9, 2024 • 5

nm-testing/tinyllama-marlin24-w4a16-group128

Text Generation • 0.3B • Updated Oct 9, 2024 • 4

nm-testing/tinyllama-oneshot-w8a8-static-v2

Text Generation • 1B • Updated Oct 9, 2024 • 28

nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v2

Text Generation • 1B • Updated Oct 9, 2024 • 4.74k

nm-testing/tinyllama-oneshot-w8a8-static-v3

Text Generation • 1B • Updated Jun 17, 2024 • 7

nm-testing/tinyllama-oneshot-w8a8-dynamic-token-v3

Text Generation • 1B • Updated Jun 17, 2024 • 8

nm-testing/tinyllama-oneshot-w4a16-group128-v2

Text Generation • 0.3B • Updated Oct 9, 2024 • 4.59k

nm-testing/tinyllama-oneshot-w4a16-group128-v3

Text Generation • 0.3B • Updated Aug 19, 2024 • 5

nm-testing/tinyllama-oneshot-w4a16-channel-v2

Text Generation • 0.3B • Updated Oct 9, 2024 • 14.5k • 1