dschulmeist committed on
Commit 7cb9879 · verified · 1 Parent(s): 9dfb35a

add or update model card

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ language:
+ - pt
+ library_name: transformers
+ pipeline_tag: feature-extraction
+ tags:
+ - BERT
+ - encoder
+ - embeddings
+ - TiME
+ - pt
+ - size:m
+ license: apache-2.0
+ teacher_model: FacebookAI/xlm-roberta-large
+ datasets:
+ - uonlp/CulturaX
+ ---
+
+ # TiME Portuguese (pt, m)
+
+ Monolingual BERT-style encoder that outputs embeddings for Portuguese.
+ Distilled from FacebookAI/xlm-roberta-large.
+
+ ## Specs
+ - language: Portuguese (pt)
+ - size: m
+ - architecture: BERT encoder
+ - layers: 6
+ - hidden size: 768
+ - intermediate size: 3072
+
+ ## Usage (mean pooled embeddings)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ repo = "dschulmeist/TiME-pt-m"
+ tok = AutoTokenizer.from_pretrained(repo)
+ mdl = AutoModel.from_pretrained(repo)
+
+ def mean_pool(last_hidden_state, attention_mask):
+     # Average the token embeddings, using the attention mask to ignore padding.
+     mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
+     return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
+
+ inputs = tok(["example sentence"], padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():  # gradients are not needed to extract embeddings
+     outputs = mdl(**inputs)
+ emb = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
+ print(emb.shape)  # (batch_size, hidden_size)
+ ```
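
To make the masking step in `mean_pool` concrete, here is a minimal dependency-free sketch of the same arithmetic on a toy input. It is an illustration only, not part of the model card: the plain-list `mean_pool` below and the toy vectors are assumptions chosen for readability, standing in for the tensor version above.

```python
def mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                summed[i] += value
    # Same guard against division by zero as clamp(min=1e-9) in the tensor version.
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in summed]

# Toy "sentence": two real tokens followed by one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector `[100.0, 100.0]` does not affect the result, which is why the attention mask is passed alongside `last_hidden_state` when sentences in a batch are padded to a common length.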