azherali/bert_paraphrase

@@ -3,13 +3,33 @@ library_name: transformers
 license: apache-2.0
 base_model: bert-base-uncased
 tags:
-- generated_from_trainer
 metrics:
 - accuracy
 - f1
 model-index:
 - name: bert_paraphrase
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -17,19 +37,77 @@ should probably proofread and complete it, then remove this comment. -->
 # bert_paraphrase
-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
-It achieves the following results on the evaluation set:
 - Loss: 0.4042
 - Accuracy: 0.8676
 - F1: 0.9078
-## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
@@ -62,4 +140,4 @@ The following hyperparameters were used during training:
 - Transformers 4.55.2
 - Pytorch 2.8.0+cu126
 - Datasets 4.0.0
-- Tokenizers 0.21.4

 license: apache-2.0
 base_model: bert-base-uncased
 tags:
+- paraphrase-detection
+- sentence-pair-classification
+- glue
+- mrpc
 metrics:
 - accuracy
 - f1
 model-index:
 - name: bert_paraphrase
+  results:
+  - task:
+      name: Paraphrase Detection
+      type: text-classification
+    dataset:
+      name: GLUE MRPC
+      type: glue
+      config: mrpc
+      split: validation
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.8676
+    - name: F1
+      type: f1
+      value: 0.9078
+language:
+- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # bert_paraphrase
+This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the **Microsoft Research Paraphrase Corpus (MRPC)**, one of the tasks in the [GLUE benchmark](https://huggingface.co/datasets/glue).
+It is trained to determine whether **two sentences are semantically equivalent (paraphrases) or not**.
+## 📊 Evaluation Results
 - Loss: 0.4042
 - Accuracy: 0.8676
 - F1: 0.9078
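
As a rough illustration of how the accuracy and F1 figures above relate to raw predictions, the sketch below computes both from two label lists. It assumes F1 is the binary F1 with label `1` (paraphrase) as the positive class, which is the usual convention for MRPC. The helper name `accuracy_and_f1` and the toy labels are illustrative only and are not taken from the training script.

```python
def accuracy_and_f1(references, predictions, positive_label=1):
    """Return (accuracy, binary F1) for two equal-length label lists.

    Illustrative helper, not the evaluation code used for this model.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one example is required")

    pairs = list(zip(references, predictions))
    correct = sum(r == p for r, p in pairs)
    # Confusion-matrix counts with respect to the positive (paraphrase) class.
    tp = sum(r == positive_label and p == positive_label for r, p in pairs)
    fp = sum(r != positive_label and p == positive_label for r, p in pairs)
    fn = sum(r == positive_label and p != positive_label for r, p in pairs)

    accuracy = correct / len(pairs)
    denominator = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy labels, made up for this example.
refs = [1, 1, 1, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(refs, preds)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

Because MRPC contains more paraphrase pairs than non-paraphrase pairs, F1 on the positive class can sit noticeably above accuracy, as it does in the results reported here.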
+## 🧾 Model Description
+- **Model type:** BERT-base (uncased)
+- **Task:** Binary classification (paraphrase vs not paraphrase)
+- **Languages:** English
+- **Labels:**
+  - `0` → Not paraphrase
+  - `1` → Paraphrase
+---
+## ✅ Intended Uses & Limitations
 ## Intended uses & limitations
+### Intended uses
+- Detect if two sentences convey the same meaning.
+- Useful for:
+  - Duplicate question detection (e.g., Quora, FAQ bots).
+  - Semantic similarity search.
+  - Improving information retrieval systems.
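
The duplicate-detection use case above can be sketched as a pairwise scan over a list of questions. The function `find_duplicates`, the `score_pair` callable, and the `0.5` threshold are hypothetical names and values chosen for this example. In practice `score_pair` would wrap a call to the model, while the `word_overlap` stub below only stands in so the snippet runs without downloading anything.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def find_duplicates(
    questions: List[str],
    score_pair: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Return every unordered pair whose score reaches the threshold."""
    duplicates = []
    for first, second in combinations(questions, 2):
        if score_pair(first, second) >= threshold:
            duplicates.append((first, second))
    return duplicates


def word_overlap(first: str, second: str) -> float:
    """Stand-in scorer (Jaccard overlap of lowercased tokens), NOT the model."""
    a, b = set(first.lower().split()), set(second.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


questions = [
    "How do I reset my password?",
    "How do I reset my password now?",
    "Where is the billing page?",
]
print(find_duplicates(questions, word_overlap))
```

The scan compares every pair, so its cost grows quadratically with the number of questions. For large collections a cheaper retrieval step would normally narrow the candidates before the classifier is applied.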
+### Limitations
+- Trained only on English data (the MRPC dataset).
+- May not generalize well to other domains (e.g., legal, medical).
+- Binary labels only (no "degree of similarity").
+---
+## 📚 How to Use
+You can use this model with the Hugging Face `pipeline` for quick inference:
+```python
+from transformers import pipeline
+
+# Load the fine-tuned checkpoint and its tokenizer from the Hub.
+paraphrase_detector = pipeline(
+    "text-classification",
+    model="azherali/bert_paraphrase",
+    tokenizer="azherali/bert_paraphrase",
+)
+
+# Sentence pairs are passed as dicts with "text" and "text_pair" keys.
+single_pair = [
+    {"text": "The car is red.", "text_pair": "The automobile is red."},
+]
+result = paraphrase_detector(single_pair)
+print(result)
+# [{'label': 'paraphrase', 'score': 0.9801033139228821}]
+
+# Several pairs can be scored in one call.
+pairs = [
+    {"text": "The car is red.", "text_pair": "The automobile is red."},
+    {"text": "He enjoys playing football.", "text_pair": "She likes cooking."},
+]
+result = paraphrase_detector(pairs)
+print(result)
+# [{'label': 'paraphrase', 'score': 0.9801033139228821}, {'label': 'not_paraphrase', 'score': 0.9302119016647339}]
+```
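
The pipeline output shown above reports only the winning label and its score. A minimal sketch of turning that into a single paraphrase probability and a yes/no decision follows. It assumes the two class scores sum to 1 (softmax over two labels) and that the label strings are `paraphrase` and `not_paraphrase`, as in the sample output. The helper names `paraphrase_probability` and `is_paraphrase` and the `0.5` threshold are illustrative, not part of the model or the library.

```python
def paraphrase_probability(prediction, positive_label="paraphrase"):
    """Map a {'label', 'score'} dict to the probability of the paraphrase class.

    Assumes a two-class softmax, so the other class has probability 1 - score.
    """
    score = float(prediction["score"])
    return score if prediction["label"] == positive_label else 1.0 - score


def is_paraphrase(prediction, threshold=0.5):
    """Binary decision; raise the threshold to trade recall for precision."""
    return paraphrase_probability(prediction) >= threshold


# The two predictions copied from the sample output above.
outputs = [
    {"label": "paraphrase", "score": 0.9801033139228821},
    {"label": "not_paraphrase", "score": 0.9302119016647339},
]
for output in outputs:
    print(round(paraphrase_probability(output), 4), is_paraphrase(output))
```

Keeping the probability rather than only the label makes it possible to rank pairs or tune the decision threshold for a given application, even though the model itself was trained on binary labels.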
 ## Training and evaluation data
 - Transformers 4.55.2
 - Pytorch 2.8.0+cu126
 - Datasets 4.0.0
+- Tokenizers 0.21.4