---
license: mit
language:
- en
---

# BGE-small-en-v1.5-rag-int8-static

An INT8 quantized version of [BAAI/BGE-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5), quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) and compatible with [Optimum-Intel](https://github.com/huggingface/optimum-intel).

The model can be used with the [Optimum-Intel](https://github.com/huggingface/optimum-intel) API as a standalone model, or as an embedder or ranker module in a [fastRAG](https://github.com/IntelLabs/fastRAG) RAG pipeline.

## Technical details

The model was quantized using post-training static quantization.

| | |
|---|:---:|
| Calibration set | [qasper](https://huggingface.co/datasets/allenai/qasper) (50 random samples) |
| Quantization tool | [Optimum-Intel](https://github.com/huggingface/optimum-intel) |
| Backend | `IPEX` |
| Original model | [BAAI/BGE-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) |

Instructions for reproducing the quantized model can be found [here](https://github.com/IntelLabs/fastRAG/tree/main/scripts/optimizations/embedders).
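
The linked scripts are the authoritative recipe; as a rough sketch, post-training static quantization with Optimum-Intel's Neural Compressor integration looks roughly like the code below (the qasper field and tokenization settings are illustrative assumptions, not the exact recipe):

``` python
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
from transformers import AutoModel, AutoTokenizer

model_id = "BAAI/bge-small-en-v1.5"
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess(examples):
    # Which qasper field was used for calibration is not specified
    # on this card; `abstract` is a stand-in.
    return tokenizer(examples["abstract"], padding="max_length",
                     truncation=True, max_length=512)

# Static quantization calibrates activation ranges on a small sample set
quantizer = INCQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "allenai/qasper",
    num_samples=50,
    dataset_split="train",
    preprocess_function=preprocess,
)
quantizer.quantize(
    quantization_config=PostTrainingQuantConfig(approach="static", backend="ipex"),
    calibration_dataset=calibration_dataset,
    save_directory="bge-small-en-v1.5-rag-int8-static",
)
```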

## Evaluation - MTEB

Model performance on the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard) *retrieval* and *reranking* tasks.

| | `INT8` | `FP32` | % diff |
|---|:---:|:---:|:---:|
| Reranking | 0.5826 | 0.5836 | -0.166% |
| Retrieval | 0.5138 | 0.5168 | -0.58% |
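
One way to reproduce this kind of evaluation is the [`mteb`](https://github.com/embeddings-benchmark/mteb) harness; a minimal sketch follows (the wrapper class and the single task chosen here are illustrative, and the averages above span the full retrieval and reranking suites):

``` python
import torch
from mteb import MTEB
from optimum.intel import IPEXModel
from transformers import AutoTokenizer

model = IPEXModel.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
tokenizer = AutoTokenizer.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")

class Encoder:
    """Minimal wrapper exposing the encode() interface MTEB expects."""

    def encode(self, sentences, batch_size=32, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch = tokenizer(sentences[i : i + batch_size], padding=True,
                              truncation=True, return_tensors="pt")
            with torch.no_grad():
                out = model(**batch)
            # [CLS] vector, L2-normalized as is customary for BGE models
            emb = torch.nn.functional.normalize(out[0][:, 0], p=2, dim=1)
            embeddings.append(emb)
        return torch.cat(embeddings).numpy()

# SciFact is one MTEB retrieval task; run the full retrieval and
# reranking suites to reproduce the averages above.
MTEB(tasks=["SciFact"]).run(Encoder(), output_folder="mteb_results")
```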

## Usage

### Using with Optimum-Intel

See the [Optimum-Intel](https://github.com/huggingface/optimum-intel) installation page for installation instructions, or run:

``` sh
pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers
```

Loading a model:

``` python
from optimum.intel import IPEXModel

model = IPEXModel.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")
```

Running inference:

``` python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/bge-small-en-v1.5-rag-int8-static")

sentences = ["This is an example sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# get the vector of [CLS]
embedded = outputs[0][:, 0]
```
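
For similarity search, BGE-style embeddings are usually L2-normalized so that cosine similarity reduces to a plain dot product; a short follow-on sketch (assuming `embedded` from the block above):

``` python
import torch.nn.functional as F

# L2-normalize, then score every sentence pair with a dot product
embedded = F.normalize(embedded, p=2, dim=1)
scores = embedded @ embedded.T
```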

### Using with a fastRAG RAG pipeline

To get started, install [fastRAG](https://github.com/IntelLabs/fastRAG) as instructed [here](https://github.com/IntelLabs/fastRAG).

Below is an example of loading the model into a ranker node that embeds and re-ranks all the documents it receives as input in a pipeline:

``` python
from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("Intel/bge-small-en-v1.5-rag-int8-static")
```

and plugging it into a pipeline:

``` python
from haystack import Pipeline

# `retriever` is any Haystack retriever node defined earlier
p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
```
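
Running the assembled pipeline then looks something like the sketch below (the query and `top_k` values are illustrative):

``` python
results = p.run(
    query="What is post-training static quantization?",
    params={"retriever": {"top_k": 20}, "ranker": {"top_k": 5}},
)
for doc in results["documents"]:
    print(doc.score, doc.content)
```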

See a more complete example notebook [here](https://github.com/IntelLabs/fastRAG/blob/main/examples/optimized-embeddings.ipynb).