nadiinchi committed on
Commit
3ee21cf
·
verified ·
1 Parent(s): c91f0bf

Update README.md

Files changed (1)
  1. README.md +22 -10
README.md CHANGED
@@ -18,21 +18,33 @@ Provence is a lightweight **context pruning model** for retrieval-augmented gene
18
  * *Backbone model*: [DeBERTav3-reranker](https://huggingface.co/naver/trecdl22-crossencoder-debertav3) (trained from [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large))
19
  * *Model size*: 430 million parameters
20
  * *Context length*: 512 tokens
21
- * *Other model variants*: *TODO*
22
 
23
  ## Usage
24
 
 
 
25
  ```python
26
  from transformers import AutoModel
27
 
28
  provence = AutoModel.from_pretrained("naver/provence-reranker-debertav3-v1", trust_remote_code=True)
29
 
30
- context = [["Shepherd’s pie. History. In early cookery books, the dish was a means of using leftover roasted meat of any kind, and the pie dish was lined on the sides and bottom with mashed potato, as well as having a mashed potato crust on top. Variations and similar dishes. Other potato-topped pies include: The modern ”Cumberland pie” is a version with either beef or lamb and a layer of bread- crumbs and cheese on top. In medieval times, and modern-day Cumbria, the pastry crust had a filling of meat with fruits and spices.. In Quebec, a varia- tion on the cottage pie is called ”Paˆte ́ chinois”. It is made with ground beef on the bottom layer, canned corn in the middle, and mashed potato on top.. The ”shepherdess pie” is a vegetarian version made without meat, or a vegan version made without meat and dairy.. In the Netherlands, a very similar dish called ”philosopher’s stew” () often adds ingredients like beans, apples, prunes, or apple sauce.. In Brazil, a dish called in refers to the fact that a manioc puree hides a layer of sun-dried meat."]]
31
- query = ['what goes on the bottom of shepherd’s pie']
 
 
 
 
 
 
 
 
 
 
32
 
33
- pruned_context = provence.process(context, query)
34
- # print(f"Pruned context: {pruned_context}")
35
- # Pruned context: [['Shepherd’s pie. In early cookery books, the dish was a means of using leftover roasted meat of any kind, and the pie dish was lined on the sides and bottom with mashed potato, as well as having a mashed potato crust on top.']]
 
36
  ```
37
 
38
  Training code, as well as RAG experiments with Provence, can be found in the [BERGEN](https://github.com/naver/bergen) library.
@@ -40,11 +52,11 @@ Training code, as well as RAG experiments with Provence can be found in the [BER
40
  ## Model interface
41
 
42
  Interface of the `process` function:
43
- * `questions`: `List[str]`: a list of input questions
44
- * `contexts`: `List[List[str]]`: a list of retrieved contexts, provided in a list for each question. `len(contexts)` should be equal to `len(questions)`
45
- * `titles`: `Optional[Union[List[List[str]], str]]`, _default: “first_sentence”_: an optional list of titles for retrieved contexts, same shape as `contexts`. If it is equal to `first_sentence`, then the first sentence of each context is assumed to be the title. If None, then it is assumed that no titles are provided. Titles are only used if `always_select_title=True`.
46
  * `threshold` _(float, $ \in [0, 1]$, default 0.1)_: which threshold to use for context pruning. We recommend 0.1 for more conservative pruning (no performance drop or lowest performance drops) and 0.5 for higher compression, but this value can be further tuned to meet the specific use case requirements.
47
- * `always_select_title` _(bool, default: True)_: if True, the first sentence (title) will always be selected. This is important, e.g., for Wikipedia passages, to provide proper context for the next sentences.
48
  * `batch_size` (int, default: 32)
49
  * `reorder` _(bool, default: False)_: if True, the provided contexts for each question will be reordered according to the computed question-passage relevance scores. If False, the original user-provided order of contexts will be preserved.
50
  * `top_k` _(int, default: 5)_: if `reorder=True`, specifies the number of top-ranked passages to keep for each question.
 
18
  * *Backbone model*: [DeBERTav3-reranker](https://huggingface.co/naver/trecdl22-crossencoder-debertav3) (trained from [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large))
19
  * *Model size*: 430 million parameters
20
  * *Context length*: 512 tokens
 
21
 
22
  ## Usage
23
 
24
+ Pruning a single context for a single question:
25
+
26
  ```python
27
  from transformers import AutoModel
28
 
29
  provence = AutoModel.from_pretrained("naver/provence-reranker-debertav3-v1", trust_remote_code=True)
30
 
31
+ context = "Shepherd’s pie. History. In early cookery books, the dish was a means of using leftover roasted meat of any kind, and the pie dish was lined on the sides and bottom with mashed potato, as well as having a mashed potato crust on top. Variations and similar dishes. Other potato-topped pies include: The modern ”Cumberland pie” is a version with either beef or lamb and a layer of bread- crumbs and cheese on top. In medieval times, and modern-day Cumbria, the pastry crust had a filling of meat with fruits and spices.. In Quebec, a varia- tion on the cottage pie is called ”Paˆte ́ chinois”. It is made with ground beef on the bottom layer, canned corn in the middle, and mashed potato on top.. The ”shepherdess pie” is a vegetarian version made without meat, or a vegan version made without meat and dairy.. In the Netherlands, a very similar dish called ”philosopher’s stew” () often adds ingredients like beans, apples, prunes, or apple sauce.. In Brazil, a dish called in refers to the fact that a manioc puree hides a layer of sun-dried meat."
32
+ question = 'What goes on the bottom of Shepherd’s pie?'
33
+
34
+ provence_output = provence.process(question, context)
35
+ # print(f"Provence Output: {provence_output}")
36
+ # Provence Output: {'reranking_score': 3.022725, 'pruned_context': 'In early cookery books, the dish was a means of using leftover roasted meat of any kind, and the pie dish was lined on the sides and bottom with mashed potato, as well as having a mashed potato crust on top.'}
37
+ ```
38
+
39
+ You can also pass a list of questions and a list of lists of contexts (multiple contexts per question to be pruned) for batched processing.
40
+
41
+ Setting `always_select_title=True` will keep the first sentence "Shepherd’s pie". This is especially useful for Wikipedia articles where the title is often needed to understand the context.
42
+ More details on how the title is defined are given below.
43
 
44
+ ```python
45
+ provence_output = provence.process(question, context, always_select_title=True)
46
+ # print(f"Provence Output: {provence_output}")
47
+ # Provence Output: {'reranking_score': 3.022725, 'pruned_context': 'Shepherd’s pie. In early cookery books, the dish was a means of using leftover roasted meat of any kind, and the pie dish was lined on the sides and bottom with mashed potato, as well as having a mashed potato crust on top.'}
48
  ```
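
The batched form mentioned above (a list of questions plus one list of contexts per question) can be sketched as follows. This is a minimal illustration of the expected input shapes only: `to_batched` is a hypothetical helper, not part of Provence, and the example passages are made up. The final `process` call is left as a comment because it needs the model loaded as in the usage example.

```python
from typing import List, Tuple, Union


def to_batched(
    question: Union[str, List[str]],
    context: Union[str, List[List[str]]],
) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: coerce inputs to the batched shape
    (a list of questions, a list of context lists)."""
    if isinstance(question, str):
        # Single-question form: exactly one context string is expected.
        if not isinstance(context, str):
            raise TypeError("a single str question expects a single str context")
        return [question], [[context]]
    if isinstance(context, str) or any(isinstance(c, str) for c in context):
        # This sketch only covers the list-of-lists form for batched input.
        raise TypeError("batched questions expect a list of context lists")
    if len(question) != len(context):
        raise ValueError("need exactly one list of contexts per question")
    return list(question), [list(c) for c in context]


questions, contexts = to_batched(
    ["What goes on the bottom of Shepherd's pie?", "What is a shepherdess pie?"],
    [
        ["Shepherd's pie. The pie dish was lined with mashed potato.",
         "Cumberland pie. It has a layer of breadcrumbs and cheese on top."],
        ["Shepherdess pie. It is a vegetarian version made without meat."],
    ],
)

# With `provence` loaded as in the usage example, the batched call would be:
# provence_output = provence.process(questions, contexts)
```

Checking the shapes before calling the model makes a length mismatch between questions and context lists fail early with a clear message.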
49
 
50
  Training code, as well as RAG experiments with Provence, can be found in the [BERGEN](https://github.com/naver/bergen) library.
 
52
  ## Model interface
53
 
54
  Interface of the `process` function:
55
+ * `question`: `Union[List[str], str]`: an input question (str) or a list of input questions (for batched processing)
56
+ * `context`: `Union[List[List[str]], List[str], str]`: context(s) to be pruned. This can be either a single string (in the case of a single str question), or a list of lists of contexts (a list of contexts per question), with `len(context)` equal to `len(question)`
57
+ * `title`: `Optional[Union[List[List[str]], str]]`, _default: “first_sentence”_: an optional argument for defining titles. If `title="first_sentence"`, then the first sentence of each context is assumed to be the title. If `title=None`, then it is assumed that no titles are provided. Titles can also be passed as a list of lists of str, i.e. titles shaped the same way as contexts. Titles are only used if `always_select_title=True`.
58
  * `threshold` _(float, $ \in [0, 1]$, default 0.1)_: which threshold to use for context pruning. We recommend 0.1 for more conservative pruning (no performance drop or lowest performance drops) and 0.5 for higher compression, but this value can be further tuned to meet the specific use case requirements.
59
+ * `always_select_title` _(bool, default: True)_: if True, the first sentence (title) will always be selected. This is important, e.g., for Wikipedia passages, to provide proper contextualization for the next sentences.
60
  * `batch_size` (int, default: 32)
61
  * `reorder` _(bool, default: False)_: if True, the provided contexts for each question will be reordered according to the computed question-passage relevance scores. If False, the original user-provided order of contexts will be preserved.
62
  * `top_k` _(int, default: 5)_: if `reorder=True`, specifies the number of top-ranked passages to keep for each question.
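
The interaction between `reorder` and `top_k` described in the list above can be illustrated with a small standalone sketch. This is not Provence's implementation: `reorder_top_k` is a hypothetical function, and the scores are made-up stand-ins for the question-passage relevance scores the model would compute.

```python
from typing import List, Tuple


def reorder_top_k(
    contexts: List[str],
    scores: List[float],
    reorder: bool = False,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical illustration of the documented reorder/top_k behaviour."""
    pairs = list(zip(contexts, scores))
    if not reorder:
        # The original user-provided order is preserved; top_k is not applied.
        return pairs
    # Sort by relevance score, highest first, then keep the top_k passages.
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


passages = ["passage A", "passage B", "passage C"]
made_up_scores = [0.1, 2.0, 1.0]

print(reorder_top_k(passages, made_up_scores))
# [('passage A', 0.1), ('passage B', 2.0), ('passage C', 1.0)]
print(reorder_top_k(passages, made_up_scores, reorder=True, top_k=2))
# [('passage B', 2.0), ('passage C', 1.0)]
```

As in the parameter descriptions, `top_k` only has an effect when `reorder=True`; with the default `reorder=False`, every passage is returned in its input order.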