Milan Straka committed
Commit 20ab64f · 1 Parent(s): 41cfe08
Describe that we dropped the pooler in v1.1.
README.md CHANGED
@@ -14,9 +14,10 @@ tags:
 ## Version History
 
 - **version 1.1**: Version 1.1 was released in Jan 2024, with a change to the
- tokenizer; the model parameters were mostly kept the same, but
- were enlarged (by copying suitable rows) to correspond to
- tokenizer
+ tokenizer described below; the model parameters were mostly kept the same, but
+ (a) the embeddings were enlarged (by copying suitable rows) to correspond to
+ the updated tokenizer, (b) the pooler was dropped (originally it was only
+ randomly initialized).
 
 The tokenizer in the initial release (a) contained a hole (51959 did not
 correspond to any token), and (b) mapped several tokens (unseen during training
@@ -29,8 +30,9 @@ tags:
 mapping all tokens to a unique ID. That also required increasing the
 vocabulary size and embeddings weights (by replicating the embedding of the
 `[UNK]` token). Without finetuning, version 1.1 and version 1.0 gives exactly
- the same
-
+ the same embeddings on any input (apart from the pooler missing in v1.1),
+ and the tokens in version 1.0 that mapped to a different ID than the `[UNK]`
+ token map to the same ID in version 1.1.
 
 However, the sizes of the embeddings (and LM head weights and biases) are
 different, so the weights of the version 1.1 are not compatible with the
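
Note on the change described in this diff: the README states that the embeddings were enlarged by replicating the embedding of the `[UNK]` token, so that tokens which previously collapsed to `[UNK]` keep the same vector under their new unique IDs. The following is only a minimal sketch of that idea, not the procedure actually used for the release. The function name `enlarge_embeddings`, the toy sizes, the `[UNK]` position, and the choice to append the new rows at the end are all assumptions made for illustration.

```python
# Hypothetical toy setup; none of these sizes or IDs come from the real model.
UNK_ID = 1  # assumed position of [UNK] in this toy vocabulary


def enlarge_embeddings(embeddings, new_vocab_size, unk_id):
    """Return a copy of `embeddings` padded to `new_vocab_size` rows.

    Every added row is a copy of the [UNK] row, so a token that used to be
    mapped to [UNK] gets the same vector under its new, unique ID.
    Appending the rows at the end is an assumption of this sketch.
    """
    if new_vocab_size < len(embeddings):
        raise ValueError("new vocabulary must not be smaller than the old one")
    enlarged = [row[:] for row in embeddings]  # copy rows, leave the input untouched
    while len(enlarged) < new_vocab_size:
        enlarged.append(embeddings[unk_id][:])
    return enlarged


old_embeddings = [[0.0, 0.1], [0.5, 0.5], [0.9, 0.2]]  # 3 tokens, dimension 2
new_embeddings = enlarge_embeddings(old_embeddings, new_vocab_size=5, unk_id=UNK_ID)

# A token that the old tokenizer collapsed to [UNK] (ID 1) and that the new
# tokenizer assigns the new ID 3 still looks up the same vector.
same_vector = new_embeddings[3] == old_embeddings[UNK_ID]
```

Under these assumptions the original rows are unchanged and only the table size differs, which is consistent with the README's remark that the outputs match while the weight shapes are not interchangeable between versions.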