Commit 59d96a4 by yeyu-nvidia · 1 parent: 111cc0e

update AR and deploy instructions

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -123,16 +123,16 @@ trtllm-serve <gpt-oss-120b checkpoint> --host 0.0.0.0 --port 8000 --backend pyto
 `extra-llm-api-config.yml` is like this
 ```sh
 enable_attention_dp: false
-pytorch_backend_config:
-  enable_overlap_scheduler: false
-  use_cuda_graph: true
-  cuda_graph_max_batch_size: 1
-  autotuner_enabled: false
+disable_overlap_scheduler: true
+enable_autotuner: false
+
+cuda_graph_config:
+  max_batch_size: 1
 
 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
-  pytorch_eagle_weights_path: <eagle3 checkpoint>
+  speculative_model_dir: <eagle3 checkpoint>
 
 kv_cache_config:
   enable_block_reuse: false
@@ -144,14 +144,14 @@ The Eagle acceptance rate benchmark results (MT-Bench) with draft length 3 are p
 
 | Category | MT Bench Acceptance Rate |
 |:-----------|:------------------------:|
-| writing | 2.11 |
-| roleplay | 2.00 |
-| reasoning | 2.35 |
-| math | 2.73 |
-| coding | 2.46 |
-| extraction | 2.50 |
-| stem | 2.09 |
-| humanities | 1.92 |
+| writing | 2.24 |
+| roleplay | 2.25 |
+| reasoning | 2.47 |
+| math | 2.83 |
+| coding | 2.51 |
+| extraction | 2.53 |
+| stem | 2.17 |
+| humanities | 1.95 |
 
 ## Ethical Considerations
 
@@ -204,4 +204,4 @@ SUBCARDS:
 |How often is dataset reviewed?|Before Release|
 |Is there provenance for all datasets used in training?|Yes|
 |Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
-|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|
+|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|
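
The key renames in the `extra-llm-api-config.yml` hunk can be sketched as a transformation on an already-parsed config dictionary. This is a minimal illustration, not part of TensorRT-LLM: `migrate_extra_llm_api_config` is a hypothetical helper, and the mapping it applies is inferred only from the removed and added lines in this commit. In particular, it assumes that `disable_overlap_scheduler` is the logical inverse of `enable_overlap_scheduler`, and that `enable_autotuner` carries the same value as `autotuner_enabled`.

```python
from copy import deepcopy


def migrate_extra_llm_api_config(old: dict) -> dict:
    """Hypothetical helper: apply the key renames seen in this diff to a parsed config."""
    new = deepcopy(old)  # leave the caller's dictionary untouched
    backend = new.pop("pytorch_backend_config", {})

    # Assumption: the new flag is the inverse of the old one.
    if "enable_overlap_scheduler" in backend:
        new["disable_overlap_scheduler"] = not backend["enable_overlap_scheduler"]

    # Assumption: same value, new name, moved to the top level.
    if "autotuner_enabled" in backend:
        new["enable_autotuner"] = backend["autotuner_enabled"]

    # use_cuda_graph + cuda_graph_max_batch_size -> cuda_graph_config.max_batch_size
    if backend.get("use_cuda_graph"):
        graph_config = {}
        if "cuda_graph_max_batch_size" in backend:
            graph_config["max_batch_size"] = backend["cuda_graph_max_batch_size"]
        new["cuda_graph_config"] = graph_config

    # pytorch_eagle_weights_path -> speculative_model_dir
    spec = new.get("speculative_config")
    if isinstance(spec, dict) and "pytorch_eagle_weights_path" in spec:
        spec["speculative_model_dir"] = spec.pop("pytorch_eagle_weights_path")

    return new


old_config = {
    "enable_attention_dp": False,
    "pytorch_backend_config": {
        "enable_overlap_scheduler": False,
        "use_cuda_graph": True,
        "cuda_graph_max_batch_size": 1,
        "autotuner_enabled": False,
    },
    "speculative_config": {
        "decoding_type": "Eagle",
        "max_draft_len": 3,
        "pytorch_eagle_weights_path": "<eagle3 checkpoint>",
    },
    "kv_cache_config": {"enable_block_reuse": False},
}
new_config = migrate_extra_llm_api_config(old_config)
```

With the old-side values from this diff as input, `new_config` holds the same settings under the new-side key names; serializing it back to YAML is left to whatever YAML library is already in use.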