---
license: apache-2.0
---

**Note:** For how to use QwenLong-CPRS-7B, please refer to the [GitHub repo](https://github.com/Tongyi-Zhiwen/QwenLong-CPRS).

<p align="center" width="100%">
</p>

<div id="top" align="center">

QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization
-----------------------------
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/arXiv-2505.18092-b31b1b.svg)](https://arxiv.org/abs/2505.18092)
[![GitHub](https://img.shields.io/badge/GitHub-TongyiZhiwen-4b32c3?logo=github)](https://github.com/Tongyi-Zhiwen)
[![ModelScope](https://img.shields.io/badge/🤖%20ModelScope-purple)](https://modelscope.cn/models/iic/QwenLong-CPRS-7B)
[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-yellow)](https://huggingface.co/Tongyi-Zhiwen)

<!-- **Authors:** -->

_**Weizhou Shen, Chenliang Li, Fanqi Wan, Shengyi Liao, Shaopeng Lai, Bo Zhang, Yingcheng Shi, Yuning Wu, Gang Fu, Zhansheng Li, Bin Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan***_

<!-- **Affiliations:** -->

_Tongyi Lab, Alibaba Group_

<p align="center">
<img src="./assets/performance.png" width="100%"> <br>
</p>

</div>

## 📚 Introduction

In this work, we present QwenLong-CPRS, a novel framework that optimizes long-context processing through query-aware, multi-granularity compression, outperforming both RAG and sparse-attention methods. Unlike RAG's coarse chunk-level retrieval, it extracts information precisely through token-level content selection, improving accuracy. Unlike sparse attention (SA), which requires model retraining, it works as a plug-and-play module compatible with any downstream LLM, with no retraining needed. This dual advantage enables both fine-grained context optimization and seamless integration across architectures.

<p align="center">
<img src="./assets/concept.png" width="100%"> <br>
</p>

We implement QwenLong-CPRS with four key innovations:

* _**Controllable Context Optimization**_: Processes control prompts plus queries to generate compact, task-specific context segments without retraining.

* _**Hybrid Attention Architecture**_: Combines bi-directional modeling (for context localization) with causal LM (for representation fidelity).

* _**LM-as-Critic Framework**_: Repurposes the pretrained LM head to score token relevance, preserving the original knowledge while enabling compression.

* _**Window-Parallel Inference**_: Splits the long context into $w$-sized windows for parallel processing, reducing prefill complexity (see the sketch below).
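
To make the window-parallel idea concrete, here is a minimal sketch (not the official implementation): it splits a token sequence into fixed-size windows and keeps tokens whose relevance score clears a threshold. The `score_window` callable is a hypothetical stand-in for the LM-as-Critic scoring head.

```python
from typing import Callable, List

def window_parallel_select(
    tokens: List[str],
    score_window: Callable[[List[str]], List[float]],
    window_size: int = 4096,
    threshold: float = 0.5,
) -> List[str]:
    """Split `tokens` into w-sized windows, score each window independently
    (so windows can be batched or run in parallel), and keep only the tokens
    whose relevance score clears `threshold`."""
    windows = [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]
    kept: List[str] = []
    for window in windows:  # each call is independent of the others
        scores = score_window(window)  # hypothetical stand-in for the LM-as-Critic head
        kept.extend(tok for tok, score in zip(window, scores) if score >= threshold)
    return kept
```

Because each window is scored independently, prefill cost grows linearly in the number of windows rather than quadratically in the full context length.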

<p align="center">
<img src="./assets/framework.png" width="100%"> <br>
</p>

## 🎉 News

- **May 26, 2025:** 🔥 We release [🤗 QwenLong-CPRS-7B](https://huggingface.co/Tongyi-Zhiwen/QwenLong-CPRS-7B), a 7B context compression model designed for explicit long-context optimization.
  🔥 **Key Achievements**:
  ✅ **Superior Performance**: Outperforms RAG and sparse attention in both accuracy and efficiency across five long-context benchmarks.
  ✅ **Universal Compatibility**: Integrates seamlessly with flagship LLMs (GPT-4o, Gemini 2.0-pro, Claude 3.7-sonnet, DeepSeek-v3, Qwen2.5-max), achieving 21.59× context compression and a +19.15-point average performance gain.
  ✅ **New SOTA**: Paired with Qwen2.5-32B-Instruct, it surpasses top proprietary models by +4.85 on Ruler-128K and +10.88 on InfiniteBench, setting a new SOTA.

- **May 24, 2025:** 🔥 We release the 💻 [Demo Code](https://github.com/Tongyi-Zhiwen/QwenLong-CPRS/examples) for deploying the QwenLong-CPRS API and running simple long-context tasks with QwenLong-CPRS cascaded with an LLM.

## 🎯 Model Results

Here are the evaluation results.

<p align="center">
<img src="./assets/main_res.png" width="100%"> <br>
</p>

<p align="center">
<img src="./assets/niah.png" width="100%"> <br>
</p>

<p align="center">
<img src="./assets/different_llm.png" width="100%"> <br>
</p>

<p align="center">
<img src="./assets/latency.png" width="100%"> <br>
</p>

## 🛠️ Requirements

```bash
# Create the conda environment
conda create -n qwenlong-cprs python=3.10
conda activate qwenlong-cprs

# Install QwenLong-CPRS from source
cd QwenLong-CPRS
pip3 install -e .
```

## 🚀 Quick Start

Here is how to run QwenLong-CPRS together with an LLM on a long-context task:

### Step 1: Deploy the local QwenLong-CPRS API
```bash
cd QwenLong-CPRS/src/api_utils
export CUDA_VISIBLE_DEVICES=0
export MODEL_DIR="Tongyi-Zhiwen/QwenLong-CPRS-7B"
uvicorn run_api:app --port 8091 --host '0.0.0.0' --workers 1
```
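
Before moving on, you may want to confirm the server came up. A minimal sketch, assuming the `requests` package is installed and the default port above; any HTTP response (even a 404) shows the uvicorn process is listening:

```python
import requests

# Hypothetical smoke test: the actual routes are whatever run_api.py defines,
# so we only check that something answers on the port.
try:
    requests.get("http://localhost:8091", timeout=5)
    print("QwenLong-CPRS API is up on port 8091")
except requests.exceptions.ConnectionError:
    print("API not reachable; check the uvicorn logs")
```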

### Step 2: Data Preparation
Download the Ruler-128K test data from the [Hugging Face Hub](https://huggingface.co/datasets/Tongyi-Zhiwen/ruler-128k-subset) and put it in the `data` folder.
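
One way to fetch the data is with `huggingface_hub` (a sketch assuming that package is installed; check the dataset repo on the Hub in case its file layout differs):

```python
from huggingface_hub import snapshot_download

# Download the whole dataset repo into the local `data` folder.
snapshot_download(
    repo_id="Tongyi-Zhiwen/ruler-128k-subset",
    repo_type="dataset",
    local_dir="data",
)
```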

### Step 3: Run the demo inference code
```bash
cd QwenLong-CPRS/examples
export LLM_APIKEY=""  # your LLM API key
export LLM_APIURL=""  # your LLM API URL

# For NIAH tasks
export CPRS_PROMPT="You are an expert for information extraction, your task is to extract the 'needles' in the format of 'One of the special magic {type_needle_v} for {key} is: {value}.' from the documents to answer the user's question.\n## tagging rule:\n- tag the needles with 'needle'"

python infer.py \
    --model "your LLM model name" \
    --input_path "data/niah.jsonl" \
    --output_path "output/llm_with_cprs/niah.jsonl" \
    --cprs_prompt "$CPRS_PROMPT" \
    --use_compress True

# For QA tasks
export CPRS_PROMPT="You are an expert for information extraction, your task is to extract some sentences from the documents as the supporting facts of the user's question.\n## tagging rule:\n- tag the supporting facts with 'fact'"

python infer.py \
    --model "your LLM model name" \
    --input_path "data/qa.jsonl" \
    --output_path "output/llm_with_cprs/qa.jsonl" \
    --cprs_prompt "$CPRS_PROMPT" \
    --use_compress True

# For variable tracking tasks
export CPRS_PROMPT="You are an expert for information extraction, your task is to extract the assignment chains like 'VAR XXX1 = XXX2' from the documents to answer the question of the user.\n## tagging rule:\n- tag the assignment chains with 'chain'"

python infer.py \
    --model "your LLM model name" \
    --input_path "data/vt.jsonl" \
    --output_path "output/llm_with_cprs/vt.jsonl" \
    --cprs_prompt "$CPRS_PROMPT" \
    --use_compress True
```
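
For orientation, here is a minimal sketch of the cascade those commands drive: compress the context through the QwenLong-CPRS API, then send the compressed context to the downstream LLM. This is not the repo's `infer.py`; the compression endpoint path and JSON fields are hypothetical placeholders, so align them with what `run_api.py` actually exposes.

```python
import os
import requests

CPRS_URL = "http://localhost:8091/compress"  # hypothetical route; check run_api.py

def compress(context: str, query: str, cprs_prompt: str) -> str:
    """Ask QwenLong-CPRS to shrink `context` down to query-relevant content."""
    resp = requests.post(
        CPRS_URL,
        json={"context": context, "query": query, "control_prompt": cprs_prompt},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["compressed_context"]  # hypothetical response field

def answer(context: str, query: str) -> str:
    """Send the (compressed) context plus query to an OpenAI-compatible LLM API."""
    resp = requests.post(
        os.environ["LLM_APIURL"],
        headers={"Authorization": f"Bearer {os.environ['LLM_APIKEY']}"},
        json={
            "model": "your LLM model name",
            "messages": [{"role": "user", "content": f"{context}\n\nQuestion: {query}"}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Usage sketch:
# reply = answer(compress(long_document, question, os.environ["CPRS_PROMPT"]), question)
```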

## 🌐 Join the Community

Chinese users can scan the QR codes below to join our DingTalk or WeChat groups.

| WeChat | DingTalk |
|--------|----------|
| ![WeChat group QR code](./assets/wechat_group.JPG) | ![DingTalk group QR code](./assets/dingding_group.png) |

## 📝 Citation

If you find this work relevant to your research or applications, please feel free to cite it!

```bibtex
@misc{shen2025qwenlongcprsinftyllmsdynamiccontext,
  title={QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization},
  author={Weizhou Shen and Chenliang Li and Fanqi Wan and Shengyi Liao and Shaopeng Lai and Bo Zhang and Yingcheng Shi and Yuning Wu and Gang Fu and Zhansheng Li and Bin Yang and Ji Zhang and Fei Huang and Jingren Zhou and Ming Yan},
  year={2025},
  eprint={2505.18092},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.18092},
}
```