Big thanks to ymcki for updating llama.cpp to support the 'dummy' layers. If that change has not been merged yet, build llama.cpp from the branch in this PR: https://github.com/ggml-org/llama.cpp/pull/12843 (see the sketch below).
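A minimal sketch (Python driving git and CMake, assuming both are on your PATH) of checking out that PR branch and building it; the CMake invocation is the stock llama.cpp default, so add backend options (e.g. `-DGGML_CUDA=ON`) as your hardware requires:

```python
# Sketch: fetch the llama.cpp PR branch and build it. Assumes git and cmake
# are installed; pull/12843/head is the standard GitHub ref for PR #12843.
import subprocess

def run(cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

run(["git", "clone", "https://github.com/ggml-org/llama.cpp.git"])
# Fetch the PR head into a local branch and switch to it.
run(["git", "fetch", "origin", "pull/12843/head:pr-12843"], cwd="llama.cpp")
run(["git", "checkout", "pr-12843"], cwd="llama.cpp")
# Standard llama.cpp CMake build; add e.g. -DGGML_CUDA=ON for GPU offload.
run(["cmake", "-B", "build"], cwd="llama.cpp")
run(["cmake", "--build", "build", "--config", "Release"], cwd="llama.cpp")
```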
Note: the imatrix data used for the IQ quants was produced from the Q4 quant, not from the usual full-precision source! A reproduction sketch follows.
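For context, here is a sketch of what producing an imatrix from the Q4 quant and applying it to an IQ quant looks like with the llama.cpp tools (llama-imatrix / llama-quantize); all file names are placeholders, and the exact flags can vary between llama.cpp versions:

```python
# Sketch: build an importance matrix from the Q4 quant, then use it when
# producing an IQ quant. All file names below are placeholders.
import subprocess

bin_dir = "llama.cpp/build/bin"              # path to the built llama.cpp tools
q4_gguf = "nemotron-ultra-253b.Q4_K_M.gguf"  # placeholder: the Q4 quant
src_gguf = "nemotron-ultra-253b.BF16.gguf"   # placeholder: quantization source
calib_txt = "calibration.txt"                # any calibration text

# Run the Q4 model over the calibration text, recording activation statistics.
subprocess.run([
    f"{bin_dir}/llama-imatrix", "-m", q4_gguf, "-f", calib_txt, "-o", "imatrix.dat",
], check=True)

# Quantize to an IQ type, guided by the importance matrix.
subprocess.run([
    f"{bin_dir}/llama-quantize", "--imatrix", "imatrix.dat",
    src_gguf, "nemotron-ultra-253b.IQ2_XS.gguf", "IQ2_XS",
], check=True)
```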
'Make knowledge free for everyone'
Quantized version of: nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
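A minimal sketch of downloading one quant file and loading it with llama-cpp-python; the GGUF filename is an assumption (check the repo's file list, and note that quants of a 253B model are usually split into shards, in which case point `model_path` at the first shard). Until the PR above is merged, you may also need a llama.cpp / llama-cpp-python build that includes it:

```python
# Sketch: download a quant from the Hub and load it with llama-cpp-python.
# The filename below is a placeholder; check the repo's file list first.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="DevQuasar/nvidia.Llama-3_1-Nemotron-Ultra-253B-v1-GGUF",
    filename="nvidia.Llama-3_1-Nemotron-Ultra-253B-v1.Q4_K_M.gguf",  # placeholder
)

llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```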