CrabInHoney committed 8248ef6 (verified; parent b45a7af)

Upload README.md

Files changed (1): README.md (+62 -3)
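
README.md changed (hunk @@ -1,3 +1,62 @@): the three removed lines were the YAML front matter `---` / `license: apache-2.0` / `---`; the 62 added lines are the new README below.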

urlbert-tiny-base-v4 is a lightweight BERT-based model optimized for URL analysis. This version improves on the previous one in several ways:

- Trained using a teacher-student architecture
- Pre-trained primarily on masked token prediction
- Distilled knowledge from a larger model's logits
- Received additional training on three specialized tasks to improve its understanding of URL structure

The result is an efficient model that can be fine-tuned quickly for URL classification tasks with minimal computational resources.
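
The README does not describe the exact loss used to combine the teacher's logits with masked token prediction. Purely as an illustration, the sketch below assumes the common soft-target formulation for a single masked position: a KL divergence between temperature-softened teacher and student distributions (scaled by T²), mixed with ordinary cross-entropy on the true token. The function names, `temperature`, `alpha`, and their defaults are hypothetical and not taken from the model's training code.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    target_index: int,
    temperature: float = 2.0,  # hypothetical value
    alpha: float = 0.5,        # hypothetical weight on the soft (teacher) term
) -> float:
    """Assumed loss for one masked position: soft teacher term plus hard MLM term."""
    # Soft term: match the teacher's softened distribution; scaling by T^2 is the
    # usual convention for keeping gradient sizes comparable across temperatures
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    ) * temperature ** 2
    # Hard term: standard cross-entropy against the original (unmasked) token
    hard = -math.log(softmax(student_logits)[target_index])
    return alpha * soft + (1.0 - alpha) * hard
```

When the student's logits already match the teacher's, the soft term is zero and only the weighted cross-entropy on the true token remains.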

## Model Details

- **Parameters:** 3.72M
- **Tensor Type:** F32
- **Previous Version:** [urlbert-tiny-base-v3](https://huggingface.co/CrabInHoney/urlbert-tiny-base-v3)
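
As a rough check on what "lightweight" means here: 3.72M parameters stored as F32 (4 bytes each) come to about 14.9 MB (≈14.2 MiB) of raw weights. This counts weights only, excludes tokenizer files, serialization overhead, and runtime activations, and inherits the rounding in "3.72M". The half-precision figure is a hypothetical comparison; the README lists F32 only.

```python
# Approximate size of the raw model weights, from the figures in Model Details.
PARAMETERS = 3.72e6   # "Parameters: 3.72M" (rounded)
BYTES_PER_F32 = 4     # "Tensor Type: F32" = 32-bit floats
BYTES_PER_F16 = 2     # hypothetical half-precision copy, for comparison only

f32_bytes = PARAMETERS * BYTES_PER_F32
f16_bytes = PARAMETERS * BYTES_PER_F16

print(f"F32 weights: {f32_bytes / 1e6:.2f} MB ({f32_bytes / 2**20:.2f} MiB)")
print(f"F16 weights: {f16_bytes / 1e6:.2f} MB")
```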

## Usage Example

```python
from transformers import BertTokenizerFast, BertForMaskedLM, pipeline
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

model_name = "CrabInHoney/urlbert-tiny-base-v4"

# Load the tokenizer and the model with its masked-language-modeling head
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)
model.to(device)

# The fill-mask pipeline returns the top 5 candidates for [MASK] by default
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

sentences = [
    "http://example.[MASK]/"
]

for sentence in sentences:
    print(f"\nInput: {sentence}")
    results = fill_mask(sentence)
    # Each result is a dict; 'token_str' is the predicted token, 'score' its probability
    for result in results:
        token_str = result['token_str']
        score = result['score']
        print(f"Predicted token: {token_str}, probability: {score:.4f}")
```
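
Roughly, the fill-mask pipeline takes the model's logits at the `[MASK]` position, applies a softmax over the whole vocabulary, and returns the highest-scoring tokens (five by default; a different count can be requested by passing `top_k` when calling the pipeline). The stdlib sketch below mimics only that final ranking step. The helper `top_k_predictions`, the six-token vocabulary, and its logits are made up for illustration and are not outputs of urlbert-tiny-base-v4.

```python
import math
from typing import Dict, List


def top_k_predictions(
    vocab: List[str], mask_logits: List[float], top_k: int = 5
) -> List[Dict[str, object]]:
    """Softmax over the vocabulary, then keep the `top_k` most probable tokens."""
    peak = max(mask_logits)
    exps = [math.exp(x - peak) for x in mask_logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    # Same field names ('token_str', 'score') that the usage example reads
    return [{"token_str": tok, "score": p} for tok, p in ranked[:top_k]]


# Hypothetical vocabulary and [MASK]-position logits (illustration only)
vocab = ["com", "net", "org", "info", "cn", "xyz"]
mask_logits = [4.0, 2.3, 1.9, -0.4, -0.5, -1.0]

for result in top_k_predictions(vocab, mask_logits):
    print(f"Predicted token: {result['token_str']}, probability: {result['score']:.4f}")
```

Because the softmax runs over the full vocabulary, the returned scores are true probabilities and the top-k list generally sums to slightly less than 1.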

### Sample Output

```
Input: http://example.[MASK]/
Predicted token: com, probability: 0.7307
Predicted token: net, probability: 0.1319
Predicted token: org, probability: 0.0881
Predicted token: info, probability: 0.0094
Predicted token: cn, probability: 0.0084
```
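
The five scores in the sample output add up to 0.9685, so for this input the top-5 list covers roughly 97% of the predicted probability mass, and all other vocabulary tokens share the remaining ~3% (the printed scores are rounded to four decimals). A small sketch that parses lines in the printed format and does that sum, using the numbers from the sample output:

```python
import re

# Lines copied from the sample output
sample_output = """\
Input: http://example.[MASK]/
Predicted token: com, probability: 0.7307
Predicted token: net, probability: 0.1319
Predicted token: org, probability: 0.0881
Predicted token: info, probability: 0.0094
Predicted token: cn, probability: 0.0084
"""

# Match "Predicted token: <token>, probability: <score>" as printed by the usage example
pattern = re.compile(r"Predicted token: ([^,]+), probability: ([0-9.]+)")
predictions = [(tok, float(p)) for tok, p in pattern.findall(sample_output)]

covered = sum(p for _, p in predictions)
print(f"Top-{len(predictions)} mass: {covered:.4f}, remainder: {1 - covered:.4f}")
```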