Update README.md
# SentenceTransformer

This is a [sentence-transformers](https://www.SBERT.net) model based on a custom ModernBERT-Small architecture, trained from scratch using a multi-stage pipeline. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base Model:** Custom-trained ModernBERT-Small (trained from scratch)
- **Architecture:** ModernBERT-Small
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** MIT
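The similarity function above is plain cosine similarity over the 384-dimensional embeddings. A minimal numpy sketch (random vectors stand in for real model outputs, which would come from the model's `encode` method):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random 384-dim vectors as placeholders for encoded sentences.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=384)
emb_b = rng.normal(size=384)

score = cosine_similarity(emb_a, emb_b)
assert -1.0 <= score <= 1.0
```

Recent versions of sentence-transformers also expose a `model.similarity(...)` helper that computes the same score directly from encoded sentences.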
### Model Sources
- **Repository:** [ModernBERT Training Scripts](https://github.com/Johnnyboycurtis/semantic-search-models/tree/main/ModernBERT)
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Training Procedure

This model was developed with a multi-stage "curriculum learning" approach to build a deep semantic understanding. The training scripts are available in the linked repository.

#### Stage 1: Foundational Contrastive Training
The model was first trained on a diverse collection of over 1 million triplets drawn from three datasets. This stage taught the model a broad, foundational understanding of language, relevance, and logical relationships.
- **Datasets:**
  - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
  - [sentence-transformers/trivia-qa-triplet](https://huggingface.co/datasets/sentence-transformers/trivia-qa-triplet)
  - [sentence-transformers/msmarco-msmarco-distilbert-base-v3](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3)
- **Loss Function:** `MultipleNegativesRankingLoss`
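`MultipleNegativesRankingLoss` treats each (anchor, positive) pair in a batch as the correct match and every other in-batch positive as a negative, applying cross-entropy over scaled cosine similarities. A minimal numpy sketch of that objective (an illustration of the idea, not the library's actual implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss over (anchor, positive) pairs."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    # Row i's correct "class" is column i (its own positive); all other
    # columns act as in-batch negatives. Cross-entropy over each row:
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
aligned = mnr_loss(anchors, anchors)                       # matching positives
shuffled = mnr_loss(anchors, np.roll(anchors, 1, axis=0))  # mismatched positives
assert aligned < shuffled
```

Correctly paired batches score much lower than mismatched ones, which is exactly the signal that teaches the model relevance.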
#### Stage 2: Advanced Knowledge Distillation
The foundational model was then refined by having it mimic a state-of-the-art teacher model ([BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)). This stage transferred the nuanced knowledge of the expert teacher to the more efficient student model.
- **Teacher Model:** `BAAI/bge-base-en-v1.5`
- **Loss Function:** `DistillKLDivLoss`
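`DistillKLDivLoss` pushes the student's similarity distribution over candidates toward the teacher's via KL divergence. A simplified numpy sketch of that underlying objective (the library's exact formulation, e.g. its temperature handling, may differ):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl_loss(student_scores, teacher_scores, temperature: float = 1.0) -> float:
    """Mean KL(teacher || student) over rows of similarity scores."""
    t = softmax(np.asarray(teacher_scores) / temperature)
    s = softmax(np.asarray(student_scores) / temperature)
    return float(np.mean(np.sum(t * (np.log(t) - np.log(s)), axis=-1)))

# Each row: one anchor's similarity scores against a set of candidates.
teacher = np.array([[5.0, 1.0, 0.5], [0.2, 4.0, 1.0]])
student = np.array([[2.0, 1.5, 1.0], [1.0, 1.2, 0.8]])
loss = distill_kl_loss(student, teacher)
assert loss > 0.0
```

The loss vanishes only when the student ranks candidates with the same relative confidence as the teacher, which is what transfers the teacher's nuance.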
#### Stage 3: Task-Specific Fine-Tuning
As a final "calibration" step, the best distilled model was fine-tuned directly on the Semantic Textual Similarity (STS) benchmark. This specializes the model for tasks requiring precise similarity scores.
- **Dataset:** [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
- **Loss Function:** `CosineSimilarityLoss`
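`CosineSimilarityLoss` regresses the cosine similarity of each sentence pair's embeddings onto the gold STS score (assumed here to be normalized to [0, 1], as in the stsb dataset), typically via squared error. A minimal numpy sketch under those assumptions:

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, gold) -> float:
    """MSE between predicted cosine similarities and gold scores."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    pred = np.sum(a * b, axis=1)  # per-pair cosine similarity
    return float(np.mean((pred - np.asarray(gold)) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# Identical pairs with gold score 1.0 should give (near-)zero loss.
perfect = cosine_similarity_loss(x, x, np.ones(4))
noisy = cosine_similarity_loss(x, rng.normal(size=(4, 8)), np.ones(4))
assert perfect < noisy
```

Because the target is the score itself rather than a ranking, this stage calibrates the magnitude of the similarity values, not just their order.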
### Full Model Architecture

```