Update README.md
README.md
@@ -264,9 +264,9 @@ We recommend using `lm_eval==0.4.9` for downstream task evaluation. You can pass
 lm_eval --model hf \
     --model_args pretrained=meta-llama/Meta-Llama-3-8B-Instruct,attn_implementation=flash_attention_2 \
     --tasks gsm8k_cot \
-    --gen_kwargs custom_generate=
-
-
+    --gen_kwargs custom_generate=transformers-community/sep_cache,trust_remote_code=True,monkey_patch_verbose=True,init_cache_size=4,sep_cache_size=128,local_size=256,cache_size=512,separator_token_ids="128000;13;11;30;0;26;25;198;220;662;1174;949;758;2652;551;720;256;262",PADDING_ID=128009 \
+    --device cuda:0 \
+    --batch_size 80 2>&1 | tee log.txt
 ```
 Note: `SepCache` is typically used in combination with `Flash Attention` to maximize generation efficiency.
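For reference, a minimal Python sketch of the equivalent direct `generate` call outside of `lm_eval`, assuming a `transformers` version with Hub custom-generate support (4.53+). The SepCache parameters mirror the `--gen_kwargs` values in the command above; the prompt, `max_new_tokens`, and dtype are placeholder choices, and the conversion of the `;`-separated `separator_token_ids` string into a Python list is an assumption about the kwarg's Python-side form.

```python
# Sketch (not part of this commit): invoking SepCache via the Hub
# custom-generate hook directly from Python, mirroring the CLI flags above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # SepCache pairs with Flash Attention
    torch_dtype=torch.bfloat16,               # placeholder dtype choice
    device_map="cuda:0",
)

prompt = "Q: Natalia sold clips to 48 of her friends in April ..."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # placeholder value
    # Load the custom decoding loop from the same Hub repo as the CLI command.
    custom_generate="transformers-community/sep_cache",
    trust_remote_code=True,
    monkey_patch_verbose=True,  # mirrors the CLI flag above
    # SepCache sizing knobs, identical to the lm_eval --gen_kwargs values.
    init_cache_size=4,
    sep_cache_size=128,
    local_size=256,
    cache_size=512,
    # The ";"-separated CLI string rendered as a Python list (assumption).
    separator_token_ids=[128000, 13, 11, 30, 0, 26, 25, 198, 220,
                         662, 1174, 949, 758, 2652, 551, 720, 256, 262],
    PADDING_ID=128009,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```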