nvidia/gpt-oss-120b-Eagle3

Text Generation

Model Optimizer


yeyu-nvidia committed 15 days ago

Commit 59d96a4 · 1 parent: 111cc0e

Update AR and deploy instructions

Files changed (1)

README.md +15 -15


@@ -123,16 +123,16 @@ trtllm-serve <gpt-oss-120b checkpoint> --host 0.0.0.0 --port 8000 --backend pyto
 `extra-llm-api-config.yml` is like this
 ```sh
 enable_attention_dp: false
-pytorch_backend_config:
-  enable_overlap_scheduler: false
-  use_cuda_graph: true
-  cuda_graph_max_batch_size: 1
-  autotuner_enabled: false
 speculative_config:
     decoding_type: Eagle
     max_draft_len: 3
-    pytorch_eagle_weights_path: <eagle3 checkpoint>
 kv_cache_config:
     enable_block_reuse: false
@@ -144,14 +144,14 @@ The Eagle acceptance rate benchmark results (MT-Bench) with draft length 3 are p
 | Category   | MT Bench Acceptance Rate |
 |:-----------|:------------------------:|
-| writing    |            2.11            |
-| roleplay   |           2.00           |
-| reasoning  |           2.35           |
-| math       |           2.73           |
-| coding     |           2.46           |
-| extraction |           2.50           |
-| stem       |           2.09           |
-| humanities |           1.92           |
 ## Ethical Considerations
@@ -204,4 +204,4 @@ SUBCARDS:
 |How often is dataset reviewed?|Before Release|
 |Is there provenance for all datasets used in training?|Yes|
 |Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
-|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|

 `extra-llm-api-config.yml` is like this
 ```sh
 enable_attention_dp: false
+disable_overlap_scheduler: true
+enable_autotuner: false
+cuda_graph_config:
+    max_batch_size: 1
 speculative_config:
     decoding_type: Eagle
     max_draft_len: 3
+    speculative_model_dir: <eagle3 checkpoint>
 kv_cache_config:
     enable_block_reuse: false
 | Category   | MT Bench Acceptance Rate |
 |:-----------|:------------------------:|
+| writing    |           2.24           |
+| roleplay   |           2.25           |
+| reasoning  |           2.47           |
+| math       |           2.83           |
+| coding     |           2.51           |
+| extraction |           2.53           |
+| stem       |           2.17           |
+| humanities |           1.95           |
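To make the size of the table update easier to read, the following sketch computes the per-category change between the removed (-) and added (+) rows of this diff. The numbers are copied from the two versions of the table; the helper names `per_category_delta` and `mean` are hypothetical and used only for illustration.

```python
# Values copied from the removed (-) rows of the MT-Bench table in this diff.
OLD_AR = {
    "writing": 2.11, "roleplay": 2.00, "reasoning": 2.35, "math": 2.73,
    "coding": 2.46, "extraction": 2.50, "stem": 2.09, "humanities": 1.92,
}

# Values copied from the added (+) rows of the same table.
NEW_AR = {
    "writing": 2.24, "roleplay": 2.25, "reasoning": 2.47, "math": 2.83,
    "coding": 2.51, "extraction": 2.53, "stem": 2.17, "humanities": 1.95,
}


def per_category_delta(old, new):
    """Return new minus old for every category, rounded to two decimals."""
    return {name: round(new[name] - old[name], 2) for name in old}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, delta in per_category_delta(OLD_AR, NEW_AR).items():
        print(f"{name:<11} {OLD_AR[name]:.2f} -> {NEW_AR[name]:.2f} ({delta:+.2f})")
    print(f"mean        {mean(OLD_AR.values()):.2f} -> {mean(NEW_AR.values()):.2f}")
```

Every category increases in the updated table, with the largest change in roleplay and the smallest in extraction and humanities.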
 ## Ethical Considerations
 |How often is dataset reviewed?|Before Release|
 |Is there provenance for all datasets used in training?|Yes|
 |Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
+|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|
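The key renames in `extra-llm-api-config.yml` can be summarized as a small migration sketch. The mapping below is inferred only from the removed and added lines of this diff; it is not an official TensorRT-LLM migration tool, and the function name `migrate_config` is hypothetical. In particular, treating `use_cuda_graph: true` as equivalent to the presence of a `cuda_graph_config` block is an assumption. Plain dictionaries are used so the example needs no YAML library.

```python
import copy


def migrate_config(old_cfg):
    """Rewrite the old key layout into the new one, as inferred from this diff."""
    cfg = copy.deepcopy(old_cfg)

    # The nested pytorch_backend_config block is removed; its settings
    # reappear as top-level keys in the new version of the file.
    backend = cfg.pop("pytorch_backend_config", None) or {}

    # enable_overlap_scheduler: false -> disable_overlap_scheduler: true
    if "enable_overlap_scheduler" in backend:
        cfg["disable_overlap_scheduler"] = not backend["enable_overlap_scheduler"]

    # autotuner_enabled -> enable_autotuner (same meaning, new name)
    if "autotuner_enabled" in backend:
        cfg["enable_autotuner"] = backend["autotuner_enabled"]

    # Assumption: use_cuda_graph: true maps to having a cuda_graph_config block.
    if backend.get("use_cuda_graph"):
        graph_cfg = {}
        if "cuda_graph_max_batch_size" in backend:
            graph_cfg["max_batch_size"] = backend["cuda_graph_max_batch_size"]
        cfg["cuda_graph_config"] = graph_cfg

    # pytorch_eagle_weights_path -> speculative_model_dir
    spec = cfg.get("speculative_config")
    if isinstance(spec, dict) and "pytorch_eagle_weights_path" in spec:
        spec["speculative_model_dir"] = spec.pop("pytorch_eagle_weights_path")

    return cfg


# The old and new configs from this diff, written as dictionaries.
OLD_CONFIG = {
    "enable_attention_dp": False,
    "pytorch_backend_config": {
        "enable_overlap_scheduler": False,
        "use_cuda_graph": True,
        "cuda_graph_max_batch_size": 1,
        "autotuner_enabled": False,
    },
    "speculative_config": {
        "decoding_type": "Eagle",
        "max_draft_len": 3,
        "pytorch_eagle_weights_path": "<eagle3 checkpoint>",
    },
    "kv_cache_config": {"enable_block_reuse": False},
}

NEW_CONFIG = {
    "enable_attention_dp": False,
    "disable_overlap_scheduler": True,
    "enable_autotuner": False,
    "cuda_graph_config": {"max_batch_size": 1},
    "speculative_config": {
        "decoding_type": "Eagle",
        "max_draft_len": 3,
        "speculative_model_dir": "<eagle3 checkpoint>",
    },
    "kv_cache_config": {"enable_block_reuse": False},
}

if __name__ == "__main__":
    print(migrate_config(OLD_CONFIG) == NEW_CONFIG)
```

The input dictionary is deep-copied, so the original configuration is left unchanged.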