copy-paste woes - NVFP4A16 can be run without hardware NVFP4
README.md CHANGED

```diff
@@ -35,11 +35,6 @@ NVFP4 writeups:
 
 The model was tested with vLLM + 1x or 2x RTX Pro 6000, here is a script suitable for such configuration with 131072 context length.
 
-### Hardware
-
-As of October 2025, this quantized model can only be run on architectures with hardware FP4 support (Blackwell or later).
-Cheaper GPUs with 24GB of VRAM (RTX 5080 Super) that can run this model in pairs are expected in Q1 2026.
-
 ### Recommendations
 
 It is however recommended to use only 65K context to avoid significant degradation (https://fiction.live/stories/Fiction-liveBench-Sept-29-2025/oQdzQvKHw8JyXbN87)
```
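The script the README refers to is not reproduced in this diff. As a rough sketch only, a vLLM launch for the configuration described (2x RTX Pro 6000, 131072-token context) might look like the following; the model ID is a placeholder, and the flags are standard vLLM CLI options, not the README's actual script:

```shell
# Hypothetical launch sketch for the 2-GPU, 131072-context setup described above.
# MODEL_ID is a placeholder -- substitute the actual NVFP4 checkpoint name.
MODEL_ID="org/model-NVFP4"

vllm serve "$MODEL_ID" \
  --tensor-parallel-size 2 \
  --max-model-len 131072
```

Per the recommendation in the diff, `--max-model-len` could also be lowered to around 65536 to stay inside the range where long-context degradation is reported to be minor.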