Update README.md
README.md
@@ -264,9 +264,9 @@ We recommend using `lm_eval==0.4.9` for downstream task evaluation. You can pass
 lm_eval --model hf \
     --model_args pretrained=meta-llama/Meta-Llama-3-8B-Instruct,attn_implementation=flash_attention_2 \
     --tasks gsm8k_cot \
-    --gen_kwargs custom_generate=
-
-
+    --gen_kwargs custom_generate=transformers-community/sep_cache,trust_remote_code=True,monkey_patch_verbose=True,init_cache_size=4,sep_cache_size=128,local_size=256,cache_size=512,separator_token_ids="128000;13;11;30;0;26;25;198;220;662;1174;949;758;2652;551;720;256;262",PADDING_ID=128009 \
+    --device cuda:0 \
+    --batch_size 80 2>&1 | tee log.txt
 ```
 Note: `SepCache` is typically used in combination with `Flash Attention` to maximize generation efficiency.
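For reference, a minimal Python sketch of the equivalent direct `generate` call outside of `lm_eval`, assuming a `transformers` version with Hub custom-generate support (4.53+). The SepCache parameters mirror the `--gen_kwargs` values in the command above; the prompt, `max_new_tokens`, and dtype are placeholder choices, and the conversion of the `;`-separated `separator_token_ids` string into a Python list is an assumption about the kwarg's Python-side form.

```python
# Sketch (not part of this commit): invoking SepCache via the Hub
# custom-generate hook directly from Python, mirroring the CLI flags above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # SepCache pairs with Flash Attention
    torch_dtype=torch.bfloat16,               # placeholder dtype choice
    device_map="cuda:0",
)

prompt = "Q: Natalia sold clips to 48 of her friends in April ..."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # placeholder value
    # Load the custom decoding loop from the same Hub repo as the CLI command.
    custom_generate="transformers-community/sep_cache",
    trust_remote_code=True,
    monkey_patch_verbose=True,  # mirrors the CLI flag above
    # SepCache sizing knobs, identical to the lm_eval --gen_kwargs values.
    init_cache_size=4,
    sep_cache_size=128,
    local_size=256,
    cache_size=512,
    # The ";"-separated CLI string rendered as a Python list (assumption).
    separator_token_ids=[128000, 13, 11, 30, 0, 26, 25, 198, 220,
                         662, 1174, 949, 758, 2652, 551, 720, 256, 262],
    PADDING_ID=128009,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```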