johnnyboycurtis committed
Commit 6f4c68e · verified · 1 Parent(s): e139f13

Update README.md

Files changed (1)
  1. README.md +27 -6
README.md CHANGED
@@ -78,27 +78,48 @@ license: mit
 
  # SentenceTransformer
 
- This is a [sentence-transformers](https://www.SBERT.net) model trained on the [stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
  ## Model Details
 
  ### Model Description
  - **Model Type:** Sentence Transformer
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
  - **Maximum Sequence Length:** 1024 tokens
  - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
- - **Training Dataset:**
-     - [stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
  - **Language:** en
- <!-- - **License:** Unknown -->
 
  ### Model Sources
 
  - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
  - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
 
  ### Full Model Architecture
 
  ```
 
 
  # SentenceTransformer
 
+ This is a [sentence-transformers](https://www.SBERT.net) model based on a custom ModernBERT-Small architecture, trained from scratch using a multi-stage pipeline. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
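+
+ A quick usage sketch (the checkpoint id below is a placeholder; substitute this model's actual Hugging Face repository name):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Placeholder id; replace with this model's repository name.
+ model = SentenceTransformer("username/modernbert-small-sts")
+
+ sentences = [
+     "The weather is lovely today.",
+     "It's so sunny outside!",
+     "He drove to the stadium.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)  # (3, 384)
+
+ # Cosine similarity between all sentence pairs
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ ```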
 
  ## Model Details
 
  ### Model Description
  - **Model Type:** Sentence Transformer
+ - **Base model:** Custom-trained ModernBERT-Small (trained from scratch)
+ - **Architecture:** ModernBERT-Small
  - **Maximum Sequence Length:** 1024 tokens
  - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
  - **Language:** en
+ - **License:** MIT
 
  ### Model Sources
 
+ - **Repository:** [ModernBERT Training Scripts](https://github.com/Johnnyboycurtis/semantic-search-models/tree/main/ModernBERT)
  - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
  - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
 
+ ### Training Procedure
+
+ This model was developed with a multi-stage curriculum: large-scale contrastive training for a broad semantic foundation, knowledge distillation from a stronger teacher model, and task-specific fine-tuning. The training scripts are available in the linked repository.
+
+ #### Stage 1: Foundational Contrastive Training
+ The model was first trained on a diverse collection of over 1 million triplets drawn from three datasets. This stage taught the model a broad, foundational understanding of language, relevance, and logical relationships; a training sketch follows the list below.
+ - **Datasets:**
+     - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
+     - [sentence-transformers/trivia-qa-triplet](https://huggingface.co/datasets/sentence-transformers/trivia-qa-triplet)
+     - [sentence-transformers/msmarco-msmarco-distilbert-base-v3](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3)
+ - **Loss Function:** `MultipleNegativesRankingLoss`
+
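+ A minimal sketch of this stage using the `SentenceTransformerTrainer` API, shown with one of the three datasets; the student checkpoint path is a placeholder and hyperparameters are omitted:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ # Student model; the path is a placeholder for the custom ModernBERT-Small checkpoint.
+ model = SentenceTransformer("path/to/modernbert-small")
+
+ # Triplet data with (anchor, positive, negative) columns.
+ train_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="train")
+
+ # In-batch negatives: each anchor's positive also serves as a negative for every other anchor.
+ loss = MultipleNegativesRankingLoss(model)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+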
+ #### Stage 2: Advanced Knowledge Distillation
+ The foundational model was then refined by training it to mimic a state-of-the-art teacher model ([BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)), transferring the teacher's more nuanced similarity judgments to the smaller, more efficient student; a sketch follows the list below.
+ - **Teacher Model:** `BAAI/bge-base-en-v1.5`
+ - **Loss Function:** `DistillKLDivLoss`
+
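+ A minimal sketch of this stage, assuming the usual pattern of precomputing teacher similarity scores as soft labels; the checkpoint path and toy triplets are illustrative, not the actual training setup:
+
+ ```python
+ import torch
+ from datasets import Dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import DistillKLDivLoss
+
+ # Student: the Stage 1 checkpoint (placeholder path). Teacher: bge-base-en-v1.5.
+ student = SentenceTransformer("path/to/stage1-modernbert-small")
+ teacher = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # Toy (query, positive, negative) triplets; real training uses a large corpus.
+ train_dataset = Dataset.from_dict({
+     "query": ["What is the capital of France?"],
+     "positive": ["Paris is the capital of France."],
+     "negative": ["Berlin is the capital of Germany."],
+ })
+
+ # Soft labels: teacher cosine similarities for the positive and negative pairs.
+ def add_teacher_scores(batch):
+     q = teacher.encode(batch["query"])
+     p = teacher.encode(batch["positive"])
+     n = teacher.encode(batch["negative"])
+     pos = teacher.similarity_pairwise(q, p)
+     neg = teacher.similarity_pairwise(q, n)
+     return {"label": torch.stack([pos, neg], dim=1)}
+
+ train_dataset = train_dataset.map(add_teacher_scores, batched=True)
+
+ # KL divergence between student and teacher similarity distributions.
+ loss = DistillKLDivLoss(student)
+ trainer = SentenceTransformerTrainer(model=student, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+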
+ #### Stage 3: Task-Specific Fine-Tuning
+ As a final calibration step, the best distilled model was fine-tuned directly on the Semantic Textual Similarity (STS) benchmark, specializing it for tasks that require precise similarity scores; a sketch follows the list below.
+ - **Dataset:** [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
+ - **Loss Function:** `CosineSimilarityLoss`
+
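+ A minimal sketch of this stage (the distilled checkpoint path is a placeholder):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import CosineSimilarityLoss
+
+ # Start from the distilled Stage 2 checkpoint (placeholder path).
+ model = SentenceTransformer("path/to/stage2-distilled-model")
+
+ # STS-B: (sentence1, sentence2) pairs with similarity scores normalized to [0, 1].
+ train_dataset = load_dataset("sentence-transformers/stsb", split="train")
+
+ # Regress the cosine similarity of the pair embeddings onto the gold score.
+ loss = CosineSimilarityLoss(model)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+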
  ### Full Model Architecture
 
  ```