azherali committed
Commit fd649c1 · verified · 1 Parent(s): 8582ec3

Update README.md

Files changed (1)
  1. README.md +86 -8
README.md CHANGED
@@ -3,13 +3,33 @@ library_name: transformers
  license: apache-2.0
  base_model: bert-base-uncased
  tags:
- - generated_from_trainer
+ - paraphrase-detection
+ - sentence-pair-classification
+ - glue
+ - mrpc
  metrics:
  - accuracy
  - f1
  model-index:
  - name: bert_paraphrase
-   results: []
+   results:
+   - task:
+       name: Paraphrase Detection
+       type: text-classification
+     dataset:
+       name: GLUE MRPC
+       type: glue
+       config: mrpc
+       split: validation
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.8676
+     - name: F1
+       type: f1
+       value: 0.9078
+ language:
+ - en
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -17,19 +37,77 @@ should probably proofread and complete it, then remove this comment. -->

  # bert_paraphrase

- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
- It achieves the following results on the evaluation set:
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the **Microsoft Research Paraphrase Corpus (MRPC)**, a subset of the [GLUE benchmark](https://huggingface.co/datasets/glue).
+
+ It is trained to determine whether **two sentences are semantically equivalent (paraphrases) or not**.
+
+ ## 📊 Evaluation Results
  - Loss: 0.4042
  - Accuracy: 0.8676
  - F1: 0.9078

- ## Model description
+ ## 🧾 Model Description

- More information needed
+ - **Model type:** BERT-base (uncased)
+ - **Task:** Binary classification (paraphrase vs. not paraphrase)
+ - **Languages:** English
+ - **Labels:**
+   - `0` → Not paraphrase
+   - `1` → Paraphrase
+
+ ---

- ## Intended uses & limitations
+ ## ✅ Intended Uses & Limitations

- More information needed
+ ### Intended uses
+ - Detect whether two sentences convey the same meaning.
+ - Useful for:
+   - Duplicate question detection (e.g., Quora, FAQ bots).
+   - Semantic similarity search.
+   - Improving information retrieval systems.
+
+ ### Limitations
+ - Trained only on English (MRPC dataset).
+ - May not generalize well to other domains (e.g., legal, medical).
+ - Binary labels only (no degree of similarity).
+
+ ---
+
+ ## 📚 How to Use
+
+ You can use this model with the Hugging Face `pipeline` for quick inference:
+
+ ```python
+ from transformers import pipeline
+
+ paraphrase_detector = pipeline(
+     "text-classification",
+     model="azherali/bert_paraphrase",
+     tokenizer="azherali/bert_paraphrase",
+ )
+
+ single_pair = [
+     {"text": "The car is red.", "text_pair": "The automobile is red."},
+ ]
+ result = paraphrase_detector(single_pair)
+ print(result)
+ # [{'label': 'paraphrase', 'score': 0.9801033139228821}]
+
+ # Batch of pairs
+ pairs = [
+     {"text": "The car is red.", "text_pair": "The automobile is red."},
+     {"text": "He enjoys playing football.", "text_pair": "She likes cooking."},
+ ]
+ result = paraphrase_detector(pairs)
+ print(result)
+ # [{'label': 'paraphrase', 'score': 0.9801033139228821}, {'label': 'not_paraphrase', 'score': 0.9302119016647339}]
+ ```

  ## Training and evaluation data

@@ -62,4 +140,4 @@ The following hyperparameters were used during training:
  - Transformers 4.55.2
  - Pytorch 2.8.0+cu126
  - Datasets 4.0.0
- - Tokenizers 0.21.4
+ - Tokenizers 0.21.4
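The `label`/`score` pairs that the pipeline example in the README returns are a softmax over the model's two logits, mapped through the card's label scheme (`0` → not paraphrase, `1` → paraphrase). A minimal sketch of that post-processing, using made-up logits rather than real model output:

```python
import math

# Label scheme from the model card: 0 -> not paraphrase, 1 -> paraphrase.
# The exact label strings are assumptions mirroring the pipeline output shown above.
id2label = {0: "not_paraphrase", 1: "paraphrase"}

def postprocess(logits):
    """Softmax over the two logits, then pick the argmax label —
    what a text-classification pipeline does by default."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for one sentence pair (not produced by the model)
print(postprocess([-1.2, 2.7]))
```

This is only the decoding step; the real pipeline first tokenizes the sentence pair and runs the fine-tuned BERT encoder to obtain the logits.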
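The card reports both accuracy (0.8676) and F1 (0.9078) because MRPC is skewed toward the positive class, where F1 — the harmonic mean of precision and recall on the paraphrase class — is the more informative number. A toy computation of both metrics from invented confusion-matrix counts (not the model's real predictions):

```python
# Invented counts for illustration: true/false positives and negatives
tp, fp, fn, tn = 250, 40, 30, 88

precision = tp / (tp + fp)          # of pairs predicted paraphrase, how many were
recall = tp / (tp + fn)             # of true paraphrase pairs, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```

With imbalanced data like this, F1 can sit well above accuracy, matching the pattern in the card's evaluation results.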