jiaqiz committed
Commit 065b1ae · verified · 1 Parent(s): a20fc0d

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
````diff
@@ -127,12 +127,12 @@ We developed this model using Llama-3.3-Nemotron-Super-49B-v1 as its foundation.
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Ampere <br>
 * NVIDIA Hopper <br>
-* NVIDIA Turing <br>
+
 **Supported Operating System(s):** Linux <br>
 
 ## Quick Start
 
-We recommend serving the model with vLLM.
+We recommend serving the model with vLLM. You can use the model with 2 or more 80GB GPUs (NVIDIA Ampere or newer) with at least 100GB of free disk space to accommodate the download.
 
 ```
 pip install vllm==0.8.3
@@ -276,7 +276,7 @@ v1.0
 
 
 # Inference:
-**Engine:** [Triton](https://developer.nvidia.com/triton-inference-server) <br>
+**Engine:** vLLM <br>
 **Test Hardware:** H100, A100 80GB, A100 40GB <br>
 
 
````
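For context, the updated Quick Start recommends serving with vLLM on two or more 80GB GPUs. A minimal sketch of what that launch could look like, using the generic `vllm serve` entrypoint shipped with vllm==0.8.3; the model repository ID below is a placeholder, since the actual name is not shown in this diff:

```
# Sketch only: <model-repo-id> is a placeholder -- substitute the actual Hugging Face repository name.
# --tensor-parallel-size 2 shards the weights across the two 80GB GPUs the README calls for.
vllm serve <model-repo-id> --tensor-parallel-size 2
```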