
	---
	license: mit
	---


	Please refer to the [SepLLM paper - ICML 2025](https://arxiv.org/abs/2412.12094) and our [`GitHub repository`](https://github.com/HKUDS/SepLLM) for instructions on using this model.

	To use this checkpoint, you must install the `transformers-4.38.0.post1+sepllm-py3-none-any.whl` wheel released in our [`GitHub repository`](https://github.com/HKUDS/SepLLM). Below are a reference script for testing and a sample of the test results, which were obtained with `lm_eval==0.4.0`.

	This model has the same config as `Gausson/pythia-160m-deduped-n128-SepLLM`.
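
Because the checkpoint depends on the patched wheel, a quick environment check before running the evaluation can save a failed run. The sketch below assumes the installed version string matches the one embedded in the wheel filename (`4.38.0.post1+sepllm`). The helper names `is_sepllm_build` and `installed_transformers_version` are illustrative only and are not part of SepLLM or `transformers`.

```python
from importlib import metadata
from typing import Optional

# Assumption: the version reported by the installed package equals the
# version embedded in the wheel filename named in this README.
EXPECTED_VERSION = "4.38.0.post1+sepllm"


def is_sepllm_build(version: Optional[str]) -> bool:
    """Return True if the given version string is the SepLLM build."""
    return version == EXPECTED_VERSION


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if is_sepllm_build(found):
        print("SepLLM build of transformers detected:", found)
    else:
        print("Expected", EXPECTED_VERSION, "but found", found)
```

If the check reports a different version, reinstalling the wheel from the GitHub repository is the likely fix.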

	```
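	# CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous (easier to debug, but slower).
	# It is set inline so that it is actually passed to the lm_eval process.
	CUDA_LAUNCH_BLOCKING=1 lm_eval --model hf \
	--model_args pretrained=Gausson/pythia-160m-deduped-SepLLM \
	--tasks arc_challenge,arc_easy,lambada_openai,logiqa,piqa,sciq,winogrande,wsc,wikitext \
	--num_fewshot 5 \
	--device cuda:0 \
	--batch_size 32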
	```

	```
	hf (pretrained=Gausson/pythia-160m-deduped-SepLLM), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 32
	|    Tasks     |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|---|------:|---|------|
	|arc_challenge |      1|none  |     5|acc            |↑  | 0.2014|±  |0.0117|
	|              |       |none  |     5|acc_norm       |↑  | 0.2346|±  |0.0124|
	|arc_easy      |      1|none  |     5|acc            |↑  | 0.4731|±  |0.0102|
	|              |       |none  |     5|acc_norm       |↑  | 0.4520|±  |0.0102|
	|lambada_openai|      1|none  |     5|acc            |↑  | 0.3315|±  |0.0066|
	|              |       |none  |     5|perplexity     |↓  |30.1605|±  |1.0128|
	|logiqa        |      1|none  |     5|acc            |↑  | 0.2273|±  |0.0164|
	|              |       |none  |     5|acc_norm       |↑  | 0.2857|±  |0.0177|
	|piqa          |      1|none  |     5|acc            |↑  | 0.6464|±  |0.0112|
	|              |       |none  |     5|acc_norm       |↑  | 0.6447|±  |0.0112|
	|sciq          |      1|none  |     5|acc            |↑  | 0.8260|±  |0.0120|
	|              |       |none  |     5|acc_norm       |↑  | 0.8150|±  |0.0123|
	|wikitext      |      2|none  |     5|bits_per_byte  |↓  | 0.9207|±  |   N/A|
	|              |       |none  |     5|byte_perplexity|↓  | 1.8931|±  |   N/A|
	|              |       |none  |     5|word_perplexity|↓  |30.3488|±  |   N/A|
	|winogrande    |      1|none  |     5|acc            |↑  | 0.5304|±  |0.0140|
	|wsc           |      1|none  |     5|acc            |↑  | 0.3750|±  |0.0477|
	```
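
Two small sanity checks can help when reading these results. First, `byte_perplexity` and `bits_per_byte` are two views of the same quantity, related by byte_perplexity = 2^bits_per_byte. Second, a `Value ± Stderr` pair can be turned into a rough 95% interval. The interval uses a normal approximation, which is an assumption made here for illustration and not something the harness reports. Both helper names are hypothetical.

```python
def byte_perplexity_from_bits(bits_per_byte: float) -> float:
    """Convert bits per byte to byte-level perplexity (2 ** bits)."""
    return 2.0 ** bits_per_byte


def approx_95_interval(value: float, stderr: float) -> tuple:
    """Rough 95% interval, assuming normally distributed error (z = 1.96)."""
    half_width = 1.96 * stderr
    return value - half_width, value + half_width


if __name__ == "__main__":
    # wikitext row: bits_per_byte = 0.9207, reported byte_perplexity = 1.8931
    print("byte perplexity:", round(byte_perplexity_from_bits(0.9207), 4))

    # arc_challenge row: acc = 0.2014, stderr = 0.0117
    low, high = approx_95_interval(0.2014, 0.0117)
    print(f"arc_challenge acc interval: [{low:.4f}, {high:.4f}]")
```

For the wikitext row, the conversion agrees with the reported byte perplexity up to rounding. For `arc_challenge`, the interval spans roughly 0.18 to 0.22.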

	If you find our work helpful, please consider giving us a star ⭐ on our [`GitHub repository`](https://github.com/HKUDS/SepLLM) and citing our paper. We greatly appreciate your support 😄
	```
	@inproceedings{chen2025sepllm,
	title={{SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator}},
	author={Chen, Guoxuan and Shi, Han and Li, Jiawei and Gao, Yihang and Ren, Xiaozhe and Chen, Yimeng and Jiang, Xin and Li, Zhenguo and Liu, Weiyang and Huang, Chao},
	booktitle={Proceedings of the Forty-Second International Conference on Machine Learning (ICML)},
	year={2025},
	note={Also available at arXiv:2412.12094}
	}
	```