update readme with ref to dataset
README.md
CHANGED
@@ -6,6 +6,11 @@ language:
 base_model:
 - distilbert/distilbert-base-multilingual-cased
 library_name: transformers
+datasets:
+- sjhuskey/latin_author_dll_id
+metrics:
+- f1
+- accuracy
 ---
 
 # DLL Catalog Author Reconciliation Model
@@ -26,4 +31,4 @@ Achieving accuracy and reliability in this process will make the second goal of
 
 ## The Model
 
-After preliminary experiments with sequential neural network models using [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model), [term frequency-inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) (tf-idf), and custom word embedding encoding, I settled on using a pretrained BERT model developed by [Devlin et al. 2018](https://arxiv.org/abs/1810.04805v2). Specifically, I'm using [Hugging Face's DistilBert base multilingual (cased) model](https://huggingface.co/distilbert/distilbert-base-multilingual-cased), which is based on work by [Sanh et al. 2020](https://doi.org/10.48550/arXiv.1910.01108).
+After preliminary experiments with sequential neural network models using [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model), [term frequency-inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) (tf-idf), and custom word embedding encoding, I settled on using a pretrained BERT model developed by [Devlin et al. 2018](https://arxiv.org/abs/1810.04805v2). Specifically, I'm using [Hugging Face's DistilBert base multilingual (cased) model](https://huggingface.co/distilbert/distilbert-base-multilingual-cased), which is based on work by [Sanh et al. 2020](https://doi.org/10.48550/arXiv.1910.01108).
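This commit adds `f1` and `accuracy` to the model card's `metrics` metadata. As a minimal sketch of what those two numbers measure for a multi-class author-identification classifier, the pure-Python functions below compute both from gold and predicted labels. The card does not say how F1 is averaged across author classes, so macro averaging (the unweighted mean of per-class F1) is an assumption here; the function names `accuracy` and `macro_f1` and the author labels `A1`, `A2`, `A3` are hypothetical and are not taken from the project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption).

    Classes are every label that appears in either y_true or y_pred.
    A class with zero precision and zero recall contributes an F1 of 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty list")
    # Preserve first-seen order so the computation is deterministic.
    labels = list(dict.fromkeys(list(y_true) + list(y_pred)))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        if precision + recall == 0:
            scores.append(0.0)
        else:
            scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical author IDs, for illustration only.
    gold = ["A1", "A1", "A2", "A3"]
    pred = ["A1", "A2", "A2", "A3"]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro averaging weights rare authors as heavily as frequent ones, which is why it can diverge from accuracy on an imbalanced catalog; a weighted or micro average would be an equally plausible reading of the bare `f1` entry.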