Mjollnir1996 committed on
Commit cc9f031 · 1 Parent(s): 25113a1

Delete README.md

Files changed (1)
  1. README.md +0 -57
README.md DELETED
@@ -1,57 +0,0 @@
- ---
- language: multilingual
- datasets:
- - NQ
- - Trivia
- - SQuAD
- - MLQA
- - DRCD
- ---
-
- # dpr-ctx_encoder-bert-base-multilingual
-
- ## Description
-
- Multilingual DPR Model based on bert-base-multilingual-cased.
- [DPR model](https://arxiv.org/abs/2004.04906)
- [DPR repo](https://github.com/facebookresearch/DPR)
-
- ## Data
- 1. [NQ](https://github.com/facebookresearch/DPR/blob/master/data/download_data.py)
- 2. [Trivia](https://github.com/facebookresearch/DPR/blob/master/data/download_data.py)
- 3. [SQuAD](https://github.com/facebookresearch/DPR/blob/master/data/download_data.py)
- 4. [DRCD*](https://github.com/DRCKnowledgeTeam/DRCD)
- 5. [MLQA*](https://github.com/facebookresearch/MLQA)
-
- `question pairs for train`: 644,217
- `question pairs for dev`: 73,710
-
- *DRCD and MLQA were converted using the haystack script [squad_to_dpr.py](https://github.com/deepset-ai/haystack/blob/master/haystack/retriever/squad_to_dpr.py)
-
- ## Training Script
- I used the training script from [haystack](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial9_DPR_training.ipynb)
-
- ## Usage
-
- ```python
- from transformers import DPRContextEncoder, DPRContextEncoderTokenizer
- tokenizer = DPRContextEncoderTokenizer.from_pretrained('voidful/dpr-ctx_encoder-bert-base-multilingual')
- model = DPRContextEncoder.from_pretrained('voidful/dpr-ctx_encoder-bert-base-multilingual')
- input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
- embeddings = model(input_ids).pooler_output
- ```
-
- Follow the tutorial from `haystack`:
- [Better Retrievers via "Dense Passage Retrieval"](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial6_Better_Retrieval_via_DPR.ipynb)
- ```python
- from haystack.retriever.dense import DensePassageRetriever
- retriever = DensePassageRetriever(document_store=document_store,
-                                   query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
-                                   passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
-                                   max_seq_len_query=64,
-                                   max_seq_len_passage=256,
-                                   batch_size=16,
-                                   use_gpu=True,
-                                   embed_title=True,
-                                   use_fast_tokenizers=True)
- ```
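
In DPR, a question embedding is compared with passage embeddings by dot product, and passages are ranked by that score. The block below is a minimal sketch of that ranking step in plain Python, assuming the vectors would come from the `pooler_output` of the question and context encoders named in the deleted README. `dot` and `rank_passages` are hypothetical helpers written for illustration, not part of `transformers` or `haystack`, and the toy vectors are made-up stand-ins for real embeddings.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest dot product first."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (assumption: stand-ins for real encoder outputs).
query = [1.0, 0.0, 2.0]
passages = [
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.5],
]
ranking = rank_passages(query, passages)
print(ranking)
```

With real model outputs, each embedding row would first need to be converted to a plain list (for example with `.detach().tolist()` on a PyTorch tensor). For large collections a vector index is the usual choice instead of this exhaustive loop.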