copy-paste woes - NVFP4A16 can be run without hardware NVFP4
README.md CHANGED

```diff
@@ -35,11 +35,6 @@ NVFP4 writeups:
 
 The model was tested with vLLM + 1x or 2x RTX Pro 6000, here is a script suitable for such configuration with 131072 context length.
 
-### Hardware
-
-As of October 2025, this quantized model can only be run on architectures with hardware FP4 support (Blackwell or later).
-Cheaper GPUs with 24GB of VRAM (RTX 5080 Super) that can run this model in pairs are expected in Q1 2026.
-
 ### Recommendations
 
 It is however recommended to use only 65K context to avoid significant degradation (https://fiction.live/stories/Fiction-liveBench-Sept-29-2025/oQdzQvKHw8JyXbN87)
```
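The script the README refers to is not reproduced in this diff. As a rough sketch only, a vLLM launch for the configuration described (2x RTX Pro 6000, 131072-token context) might look like the following; the model ID is a placeholder, and the flags are standard vLLM CLI options, not the README's actual script:

```shell
# Hypothetical launch sketch for the 2-GPU, 131072-context setup described above.
# MODEL_ID is a placeholder -- substitute the actual NVFP4 checkpoint name.
MODEL_ID="org/model-NVFP4"

vllm serve "$MODEL_ID" \
  --tensor-parallel-size 2 \
  --max-model-len 131072
```

Per the recommendation in the diff, `--max-model-len` could also be lowered to around 65536 to stay inside the range where long-context degradation is reported to be minor.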