Commit 59d96a4
Parent(s): 111cc0e
update AR and deploy instructions
README.md CHANGED
@@ -123,16 +123,16 @@ trtllm-serve <gpt-oss-120b checkpoint> --host 0.0.0.0 --port 8000 --backend pyto
 `extra-llm-api-config.yml` is like this
 ```sh
 enable_attention_dp: false
-
-
-
-
-

 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
-

 kv_cache_config:
   enable_block_reuse: false
@@ -144,14 +144,14 @@ The Eagle acceptance rate benchmark results (MT-Bench) with draft length 3 are p

 | Category | MT Bench Acceptance Rate |
 |:-----------|:------------------------:|
-| writing |
-| roleplay | 2.
-| reasoning | 2.
-| math | 2.
-| coding | 2.
-| extraction | 2.
-| stem | 2.
-| humanities | 1.

 ## Ethical Considerations
@@ -204,4 +204,4 @@ SUBCARDS:
 |How often is dataset reviewed?|Before Release|
 |Is there provenance for all datasets used in training?|Yes|
 |Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
-|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|
@@ -123,16 +123,16 @@
 `extra-llm-api-config.yml` is like this
 ```sh
 enable_attention_dp: false
+disable_overlap_scheduler: true
+enable_autotuner: false
+
+cuda_graph_config:
+  max_batch_size: 1

 speculative_config:
   decoding_type: Eagle
   max_draft_len: 3
+  speculative_model_dir: <eagle3 checkpoint>

 kv_cache_config:
   enable_block_reuse: false
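The updated `extra-llm-api-config.yml` still contains the `<eagle3 checkpoint>` placeholder, which has to be replaced before the file is usable. The following is a minimal sketch of one way to generate the file with the path filled in. The helper names `render_config` and `write_config`, the example path, and the two-space nesting are assumptions for illustration and do not come from the README.

```python
from pathlib import Path

# Mirrors the keys added in this commit. The nesting depth (two spaces) is an
# assumption, since the scraped diff does not preserve indentation.
CONFIG_TEMPLATE = """\
enable_attention_dp: false
disable_overlap_scheduler: true
enable_autotuner: false

cuda_graph_config:
  max_batch_size: 1

speculative_config:
  decoding_type: Eagle
  max_draft_len: 3
  speculative_model_dir: {eagle3_dir}

kv_cache_config:
  enable_block_reuse: false
"""


def render_config(eagle3_dir: str) -> str:
    """Return the YAML text with the Eagle3 checkpoint directory filled in."""
    # Reject an empty value or the literal README placeholder.
    if not eagle3_dir or "<" in eagle3_dir:
        raise ValueError("pass a real checkpoint directory, not the placeholder")
    return CONFIG_TEMPLATE.format(eagle3_dir=eagle3_dir)


def write_config(eagle3_dir: str, out_path: str = "extra-llm-api-config.yml") -> Path:
    """Write the rendered config to out_path and return the path."""
    path = Path(out_path)
    path.write_text(render_config(eagle3_dir))
    return path


if __name__ == "__main__":
    # "/models/eagle3" is a hypothetical location; substitute your own.
    print(render_config("/models/eagle3"))
```

Rendering from a template keeps the checked-in snippet and the deployed file in step, and the placeholder check fails early if the path was never substituted.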
@@ -144,14 +144,14 @@

 | Category | MT Bench Acceptance Rate |
 |:-----------|:------------------------:|
+| writing | 2.24 |
+| roleplay | 2.25 |
+| reasoning | 2.47 |
+| math | 2.83 |
+| coding | 2.51 |
+| extraction | 2.53 |
+| stem | 2.17 |
+| humanities | 1.95 |

 ## Ethical Considerations
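The updated table reports per-category values only. As a rough sketch, the snippet below summarizes them with an unweighted mean plus the highest and lowest categories. The README does not state an aggregate, so the mean is an assumption that holds only if every category contributes the same number of prompts. The names `ACCEPTANCE` and `summarize` are illustrative.

```python
# Per-category values copied from the updated README table (draft length 3).
ACCEPTANCE = {
    "writing": 2.24,
    "roleplay": 2.25,
    "reasoning": 2.47,
    "math": 2.83,
    "coding": 2.51,
    "extraction": 2.53,
    "stem": 2.17,
    "humanities": 1.95,
}


def summarize(rates: dict[str, float]) -> dict[str, object]:
    """Return the unweighted mean and the best and worst categories."""
    # Assumption: equal weight per category; not stated in the README.
    mean = sum(rates.values()) / len(rates)
    best = max(rates, key=rates.get)
    worst = min(rates, key=rates.get)
    return {"mean": mean, "best": best, "worst": worst}


if __name__ == "__main__":
    summary = summarize(ACCEPTANCE)
    print(f"mean={summary['mean']:.2f} best={summary['best']} worst={summary['worst']}")
```

With these values, math is the highest category and humanities the lowest, and the unweighted mean comes to roughly 2.37.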
@@ -204,4 +204,4 @@
 |How often is dataset reviewed?|Before Release|
 |Is there provenance for all datasets used in training?|Yes|
 |Does data labeling (annotation, metadata) comply with privacy laws?|Yes|
+|Applicable NVIDIA Privacy Policy|https://www.nvidia.com/en-us/about-nvidia/privacy-policy/|