Gausson/pythia-160m-deduped-n64-StreamingLLM

	---
	license: mit
	---

	For instructions on using this model, please refer to the [SepLLM paper - ICML 2025](https://arxiv.org/abs/2412.12094), the [StreamingLLM paper - ICLR 2024](https://arxiv.org/abs/2309.17453), and our [`GitHub repository`](https://github.com/HKUDS/SepLLM).

	To use this model's checkpoint, you must install the `transformers-4.38.0.post1+sepllm-py3-none-any.whl` wheel released in our [`GitHub repository`](https://github.com/HKUDS/SepLLM). Below are the reference script for testing and a sample of test results. We ran the tests with `lm_eval==0.4.0`.

	```
	CUDA_LAUNCH_BLOCKING=1 lm_eval --model hf \
	--model_args pretrained=Gausson/pythia-160m-deduped-n64-StreamingLLM \
	--tasks arc_challenge,arc_easy,lambada_openai,logiqa,piqa,sciq,winogrande,wsc,wikitext \
	--num_fewshot 5 \
	--device cuda:0 \
	--batch_size 32
	```
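
Before running the evaluation, it may help to confirm that the SepLLM build of `transformers` is the one actually installed. The following is a minimal sketch, not part of the official tooling: it assumes the wheel's metadata reports the same version string as its filename (`4.38.0.post1+sepllm`), and the helper names `is_sepllm_build` and `installed_transformers_version` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: the version string embedded in the wheel filename
# transformers-4.38.0.post1+sepllm-py3-none-any.whl is also what the
# installed package metadata reports.
EXPECTED_VERSION = "4.38.0.post1+sepllm"


def is_sepllm_build(installed: str) -> bool:
    """Return True if the version string matches the SepLLM wheel's version."""
    return installed == EXPECTED_VERSION


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_sepllm_build(found):
        print(f"SepLLM build detected: {found}")
    else:
        print(f"Found transformers {found}, expected {EXPECTED_VERSION}")
```

A stock `transformers` release would report a version without the `+sepllm` local tag, so the check above would flag it.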

	```
	hf (pretrained=Gausson/pythia-160m-deduped-n64-StreamingLLM), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 32
	|    Tasks     |Version|Filter|n-shot|    Metric     |   | Value |   |Stderr|
	|--------------|------:|------|-----:|---------------|---|------:|---|------|
	|arc_challenge |      1|none  |     5|acc            |↑  | 0.2056|±  |0.0118|
	|              |       |none  |     5|acc_norm       |↑  | 0.2517|±  |0.0127|
	|arc_easy      |      1|none  |     5|acc            |↑  | 0.4739|±  |0.0102|
	|              |       |none  |     5|acc_norm       |↑  | 0.4478|±  |0.0102|
	|lambada_openai|      1|none  |     5|acc            |↑  | 0.2672|±  |0.0062|
	|              |       |none  |     5|perplexity     |↓  |44.0211|±  |1.5224|
	|logiqa        |      1|none  |     5|acc            |↑  | 0.2212|±  |0.0163|
	|              |       |none  |     5|acc_norm       |↑  | 0.2488|±  |0.0170|
	|piqa          |      1|none  |     5|acc            |↑  | 0.6376|±  |0.0112|
	|              |       |none  |     5|acc_norm       |↑  | 0.6349|±  |0.0112|
	|sciq          |      1|none  |     5|acc            |↑  | 0.7570|±  |0.0136|
	|              |       |none  |     5|acc_norm       |↑  | 0.7100|±  |0.0144|
	|wikitext      |      2|none  |     5|bits_per_byte  |↓  | 0.9686|±  |   N/A|
	|              |       |none  |     5|byte_perplexity|↓  | 1.9569|±  |   N/A|
	|              |       |none  |     5|word_perplexity|↓  |36.2348|±  |   N/A|
	|winogrande    |      1|none  |     5|acc            |↑  | 0.5335|±  |0.0140|
	|wsc           |      1|none  |     5|acc            |↑  | 0.4327|±  |0.0488|
	```

	If you find our work helpful, please consider giving us a star ⭐ on our [`GitHub repository`](https://github.com/HKUDS/SepLLM) and citing our paper. We greatly appreciate your support 😄
	```
	@inproceedings{chen2025sepllm,
	title={{SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator}},
	author={Chen, Guoxuan and Shi, Han and Li, Jiawei and Gao, Yihang and Ren, Xiaozhe and Chen, Yimeng and Jiang, Xin and Li, Zhenguo and Liu, Weiyang and Huang, Chao},
	booktitle={International Conference on Machine Learning},
	year={2025},
	note={Also available at arXiv:2412.12094}
	}
	```