---
license: mit
library_name: transformers
datasets:
- jhu-clsp/rank1-training-data
base_model:
- Qwen/Qwen2.5-14B
pipeline_tag: text-ranking
tags:
- reranker
- retrieval
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# rank1-14b: Test-Time Compute for Reranking in Information Retrieval

📄 [Paper](https://arxiv.org/abs/2502.18418) | 🚀 [GitHub Repository](https://github.com/orionw/rank1)

rank1 is a reasoning reranker model that "thinks" before making relevance judgments. This 14B parameter model is trained from the Qwen2.5-14B base model and leverages test-time compute to generate reasoning chains before deciding if a document is relevant to a query.

## Model Description

rank1 introduces a novel approach to information retrieval by generating explicit reasoning chains before making relevance judgments. Unlike traditional rerankers that directly output scores, rank1:

1. Receives a query and document pair
2. Generates a reasoning chain within a `<think>...</think>` section
3. Makes a binary relevance judgment (`true` or `false`)
4. Returns a confidence score based on the logits of the true/false tokens

This approach helps the model break down complex relevance decisions into logical steps, improving performance across diverse retrieval tasks.

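To make step 4 concrete: the relevance score is the probability mass the model places on `true` versus `false` at the position right after `</think>`. A minimal sketch of that computation, using placeholder log-probabilities rather than real model output (the full end-to-end version is in the Usage section below):

```python
import math

# Placeholder log-probabilities for the " true" and " false" tokens at the
# final generation step (in practice these come from the inference engine).
true_logprob = -0.2
false_logprob = -1.9

# Softmax restricted to the two candidate tokens: the probability of "true"
# is used as the relevance score.
p_true = math.exp(true_logprob)
p_false = math.exp(false_logprob)
relevance_score = p_true / (p_true + p_false)

print(f"Relevance score: {relevance_score:.3f}")  # closer to 1.0 means more relevant
```
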
## Model Family

| Model | Base | Description |
|:------|:-----|:------------|
| [rank1-0.5b](https://huggingface.co/jhu-clsp/rank1-0.5b) | Qwen2.5-0.5B | Smallest variant (0.5B parameters) |
| [rank1-1.5b](https://huggingface.co/jhu-clsp/rank1-1.5b) | Qwen2.5-1.5B | Smaller variant (1.5B parameters) |
| [rank1-3b](https://huggingface.co/jhu-clsp/rank1-3b) | Qwen2.5-3B | Smaller variant (3B parameters) |
| [rank1-7b](https://huggingface.co/jhu-clsp/rank1-7b) | Qwen2.5-7B | Smaller variant (7B parameters) |
| [rank1-14b](https://huggingface.co/jhu-clsp/rank1-14b) | Qwen2.5-14B | Current model (14B parameters) |
| [rank1-32b](https://huggingface.co/jhu-clsp/rank1-32b) | Qwen2.5-32B | Largest variant (32B parameters) |
| [rank1-mistral-2501-24b](https://huggingface.co/jhu-clsp/rank1-mistral-2501-24b) | Mistral-Small 2501 24B | Trained from Mistral base |
| [rank1-llama3-8b](https://huggingface.co/jhu-clsp/rank1-llama3-8b) | Llama 3.1 8B | Trained from Llama 3.1 base |

### Quantized Variants

| Model | Description |
|:------|:------------|
| [rank1-7b-awq](https://huggingface.co/jhu-clsp/rank1-7b-awq) | Quantized version of rank1-7b |
| [rank1-14b-awq](https://huggingface.co/jhu-clsp/rank1-14b-awq) | Quantized version of rank1-14b |
| [rank1-32b-awq](https://huggingface.co/jhu-clsp/rank1-32b-awq) | Quantized version of rank1-32b |
| [rank1-mistral-2501-24b-awq](https://huggingface.co/jhu-clsp/rank1-mistral-2501-24b-awq) | Quantized version of rank1-mistral-2501-24b |
| [rank1-llama3-8b-awq](https://huggingface.co/jhu-clsp/rank1-llama3-8b-awq) | Quantized version of rank1-llama3-8b |

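The AWQ checkpoints are meant to be used the same way as the full-precision models. A minimal sketch of loading one with vLLM, assuming vLLM's built-in `quantization="awq"` mode handles these checkpoints with default settings:

```python
from vllm import LLM

# Load the AWQ-quantized 14B variant (assumption: default AWQ settings suffice)
model = LLM(
    model="jhu-clsp/rank1-14b-awq",
    quantization="awq",
    tensor_parallel_size=1,
    trust_remote_code=True,
    max_model_len=16000,
    gpu_memory_utilization=0.9,
)
```
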
## Associated Data and Resources

| Resource | Description |
|:---------|:------------|
| [rank1-r1-msmarco](https://huggingface.co/datasets/jhu-clsp/rank1-r1-msmarco) | All R1 output examples from MS MARCO |
| [rank1-training-data](https://huggingface.co/datasets/jhu-clsp/rank1-training-data) | Training data used for rank1 models |
| [rank1-run-files](https://huggingface.co/datasets/jhu-clsp/rank1-run-files) | Pre-computed run files for top-100 document reranking |
| [GitHub Repository](https://github.com/orionw/rank1) | Official rank1 repository |

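The datasets above are standard Hugging Face datasets. A minimal sketch of inspecting the training data with the `datasets` library, assuming the default configuration loads without extra arguments:

```python
from datasets import load_dataset

# Download the rank1 training data from the Hugging Face Hub
# (assumption: the default configuration and split layout are sufficient).
data = load_dataset("jhu-clsp/rank1-training-data")

# Show the available splits and the fields of one example
print(data)
first_split = list(data.keys())[0]
print(data[first_split][0])
```
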
## Usage

The official usage code in the [GitHub repository](https://github.com/orionw/rank1) handles edge cases; for simple use cases, the minimal example below works.

<details>
<summary>Click to expand: Minimal example with vLLM</summary>

```python
from vllm import LLM, SamplingParams
import math

# Initialize the model with vLLM
model = LLM(
    model="jhu-clsp/rank1-14b",
    tensor_parallel_size=1,  # Number of GPUs
    trust_remote_code=True,
    max_model_len=16000,     # Context length
    gpu_memory_utilization=0.9,
    dtype="float16",
)

# Set up sampling parameters
sampling_params = SamplingParams(
    temperature=0,
    max_tokens=8192,
    logprobs=20,
    stop=["</think> true", "</think> false"],
    skip_special_tokens=False
)

# Prepare the prompt
def create_prompt(query, document):
    return (
        "Determine if the following passage is relevant to the query. "
        "Answer only with 'true' or 'false'.\n"
        f"Query: {query}\n"
        f"Passage: {document}\n"
        "<think>"
    )

# Example usage
query = "What are the effects of climate change?"
document = "Climate change leads to rising sea levels, extreme weather events, and disruptions to ecosystems. These effects are caused by increasing greenhouse gas concentrations in the atmosphere due to human activities."

# Generate prediction
prompt = create_prompt(query, document)
outputs = model.generate([prompt], sampling_params)

# Extract score
output = outputs[0].outputs[0]
text = output.text
final_logits = output.logprobs[-1]

# Get token IDs for "true" and "false" tokens
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/rank1-14b")
true_token = tokenizer(" true", add_special_tokens=False).input_ids[0]
false_token = tokenizer(" false", add_special_tokens=False).input_ids[0]

# Calculate relevance score (probability of "true")
true_logit = final_logits[true_token].logprob
false_logit = final_logits[false_token].logprob
true_score = math.exp(true_logit)
false_score = math.exp(false_logit)
relevance_score = true_score / (true_score + false_score)

print(f"Reasoning chain: {text}")
print(f"Relevance score: {relevance_score}")
```

</details>

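Reranking a candidate list is the same computation applied per document, followed by a sort on the resulting scores. A sketch under the assumption that the snippet above has already been run (it reuses `model`, `sampling_params`, `create_prompt`, `true_token`, and `false_token`), with a crude fallback when a token does not appear in the returned top-k logprobs:

```python
import math

def rerank(query, documents):
    """Score each document against the query and return (document, score) pairs, best first."""
    prompts = [create_prompt(query, doc) for doc in documents]
    outputs = model.generate(prompts, sampling_params)  # vLLM preserves input order

    scored = []
    for doc, out in zip(documents, outputs):
        final_logits = out.outputs[0].logprobs[-1]
        # If a token is missing from the top-k logprobs, treat it as very unlikely
        true_lp = final_logits[true_token].logprob if true_token in final_logits else -1e4
        false_lp = final_logits[false_token].logprob if false_token in final_logits else -1e4
        p_true, p_false = math.exp(true_lp), math.exp(false_lp)
        scored.append((doc, p_true / (p_true + p_false)))

    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rerank(
    "What are the effects of climate change?",
    [
        "Climate change leads to rising sea levels and extreme weather events.",
        "The 2022 World Cup final was played in Qatar.",
    ],
)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```
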
## Performance

rank1-14b demonstrates strong performance on retrieval benchmarks, particularly on tasks requiring complex reasoning. The model's ability to "think through" relevance decisions makes it especially effective for nuanced topics.

For specific benchmark results and comparisons with other models, please refer to the paper and the official GitHub repository.

## Installation

Please see the [GitHub repository](https://github.com/orionw/rank1) for detailed installation instructions.

## MTEB Integration

rank1 is compatible with the [MTEB benchmarking framework](https://github.com/embeddings-benchmark/mteb):

```python
from mteb import MTEB
from rank1 import rank1  # From the official repo

# Initialize the model
model = rank1(
    model_name_or_path="jhu-clsp/rank1-14b",
    num_gpus=1,
    device="cuda"
)

# Run evaluation on specific tasks
evaluation = MTEB(tasks=["NevIR"])
results = evaluation.run(model)
```

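To keep the evaluation output on disk, `run` also accepts an output directory in recent `mteb` releases (the path below is an arbitrary example):

```python
# Write the result files to a chosen folder rather than the default location
results = evaluation.run(model, output_folder="results/rank1-14b")
```
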
## Citation

If you use rank1 in your research, please cite our work:

```bibtex
@misc{weller2025rank1testtimecomputereranking,
      title={Rank1: Test-Time Compute for Reranking in Information Retrieval},
      author={Orion Weller and Kathryn Ricci and Eugene Yang and Andrew Yates and Dawn Lawrie and Benjamin Van Durme},
      year={2025},
      eprint={2502.18418},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2502.18418},
}
```

## License

[MIT License](https://github.com/orionw/rank1/blob/main/LICENSE)