Models: 3,016

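Active filters: compressed-tensors

Each entry gives the model ID, then the pipeline task (where tagged), parameter count, time since last update, and download count.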

nm-testing/TinyLlama-1.1B-Chat-v1.0-FP8_BLOCK-e2e

1B • Updated 2 days ago • 24

nm-testing/TinyLlama-1.1B-Chat-v1.0-FP8_DYNAMIC-e2e

1B • Updated 2 days ago • 299

nm-testing/TinyLlama-1.1B-Chat-v1.0-FP8-e2e

1B • Updated about 2 hours ago • 6.06k

nm-testing/TinyLlama-1.1B-Chat-v1.0-FP8A16_channel-e2e

1B • Updated 2 days ago • 96

nm-testing/TinyLlama-1.1B-Chat-v1.0-FP8A16_tensor-e2e

1B • Updated 2 days ago • 120

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8_channel_weight_static_per_tensor-e2e

1B • Updated about 2 hours ago • 180

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8-e2e

1B • Updated 2 days ago • 122

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8_tensor_weight_static_per_tensor_act-e2e

1B • Updated about 2 hours ago • 196

nm-testing/TinyLlama-1.1B-Chat-v1.0-kv_cache_default_gptq_tinyllama-e2e

0.3B • Updated about 2 hours ago • 152

nm-testing/Phi-3-mini-4k-instruct-kv_cache_default_phi3-e2e

4B • Updated about 2 hours ago • 177

nm-testing/TinyLlama-1.1B-Chat-v1.0-kv_cache_default_tinyllama-e2e

1B • Updated about 2 hours ago • 146

nm-testing/TinyLlama-1.1B-Chat-v1.0-sparse2of4_fp8_dynamic-e2e

0.7B • Updated about 2 hours ago • 162

nm-testing/TinyLlama-1.1B-Chat-v1.0-sparse2of4_only-e2e

0.7B • Updated about 2 hours ago • 173

nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_2of4_channel-e2e

0.3B • Updated about 2 hours ago • 165

nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_2of4-e2e

0.3B • Updated about 2 hours ago • 159

nm-testing/TinyLlama-1.1B-Chat-v1.0-actorder-group-e2e

0.3B • Updated about 2 hours ago • 285

nm-testing/TinyLlama-1.1B-Chat-v1.0-actorder-weight-e2e

0.3B • Updated about 2 hours ago • 159

nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_channel-e2e

0.3B • Updated about 2 hours ago • 170

nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16-e2e

0.3B • Updated about 2 hours ago • 173

RedHatAI/Llama-4-Maverick-17B-128E-Instruct-NVFP4

Text Generation • 229B • Updated about 14 hours ago • 35

nm-testing/TinyLlama-1.1B-Chat-v1.0-w4a16-asym-awq-e2e

0.3B • Updated about 1 hour ago • 142

nm-testing/TinyLlama-1.1B-Chat-v1.0-w4a16-sym-awq-e2e

0.3B • Updated about 1 hour ago • 136

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A16_channel-e2e

0.4B • Updated about 1 hour ago • 159

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A16-e2e

0.4B • Updated about 1 hour ago • 151

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8-Dynamic-Asym-e2e

1B • Updated about 1 hour ago • 146

nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8-Static-Asym-e2e

1B • Updated about 1 hour ago • 136

taint-technica/DeepSeek-R1-0528-GPU

106B • Updated 3 days ago • 68

Firworks/Magistral-Small-2509-Text-Only-nvfp4

14B • Updated 3 days ago • 7

Firworks/Magistral-Small-2509-36B-Text-Only-nvfp4

20B • Updated 3 days ago • 25

maywell/Qwen3-Embedding-8B-FP8-Dynamic

8B • Updated 3 days ago • 20