johnnyboycurtis committed
Commit 6f4c68e · verified · 1 Parent(s): e139f13

Update README.md

Files changed (1)
  1. README.md +27 -6
README.md CHANGED
@@ -78,27 +78,48 @@ license: mit
 
  # SentenceTransformer
 
- This is a [sentence-transformers](https://www.SBERT.net) model trained on the [stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
  ## Model Details
 
  ### Model Description
  - **Model Type:** Sentence Transformer
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
  - **Maximum Sequence Length:** 1024 tokens
  - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
- - **Training Dataset:**
-     - [stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
  - **Language:** en
- <!-- - **License:** Unknown -->
 
  ### Model Sources
 
  - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
  - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
 
  ### Full Model Architecture
 
  ```
 
 
  # SentenceTransformer
 
+ This is a [sentence-transformers](https://www.SBERT.net) model based on a custom ModernBERT-Small architecture, trained from scratch using a multi-stage pipeline. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
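+
+ A quick usage sketch (the checkpoint id below is a placeholder; substitute this model's actual Hugging Face repository name):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Placeholder id; replace with this model's repository name.
+ model = SentenceTransformer("username/modernbert-small-sts")
+
+ sentences = [
+     "The weather is lovely today.",
+     "It's so sunny outside!",
+     "He drove to the stadium.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)  # (3, 384)
+
+ # Cosine similarity between all sentence pairs
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ ```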
 
  ## Model Details
 
  ### Model Description
  - **Model Type:** Sentence Transformer
+ - **Base model:** Custom-trained ModernBERT-Small (trained from scratch)
+ - **Architecture:** ModernBERT-Small
  - **Maximum Sequence Length:** 1024 tokens
  - **Output Dimensionality:** 384 dimensions
  - **Similarity Function:** Cosine Similarity
  - **Language:** en
+ - **License:** MIT
 
  ### Model Sources
 
+ - **Repository:** [ModernBERT Training Scripts](https://github.com/Johnnyboycurtis/semantic-search-models/tree/main/ModernBERT)
  - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
  - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
 
+ ### Training Procedure
+
+ This model was developed with a multi-stage curriculum: large-scale contrastive training for a broad semantic foundation, knowledge distillation from a stronger teacher model, and task-specific fine-tuning. The training scripts are available in the linked repository.
+
+ #### Stage 1: Foundational Contrastive Training
+ The model was first trained on a diverse collection of over 1 million triplets drawn from three datasets. This stage taught the model a broad, foundational understanding of language, relevance, and logical relationships; a training sketch follows the list below.
+ - **Datasets:**
+     - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
+     - [sentence-transformers/trivia-qa-triplet](https://huggingface.co/datasets/sentence-transformers/trivia-qa-triplet)
+     - [sentence-transformers/msmarco-msmarco-distilbert-base-v3](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3)
+ - **Loss Function:** `MultipleNegativesRankingLoss`
+
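+ A minimal sketch of this stage using the `SentenceTransformerTrainer` API, shown with one of the three datasets; the student checkpoint path is a placeholder and hyperparameters are omitted:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ # Student model; the path is a placeholder for the custom ModernBERT-Small checkpoint.
+ model = SentenceTransformer("path/to/modernbert-small")
+
+ # Triplet data with (anchor, positive, negative) columns.
+ train_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="train")
+
+ # In-batch negatives: each anchor's positive also serves as a negative for every other anchor.
+ loss = MultipleNegativesRankingLoss(model)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+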
+ #### Stage 2: Advanced Knowledge Distillation
+ The foundational model was then refined by training it to mimic a state-of-the-art teacher model ([BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)), transferring the teacher's more nuanced similarity judgments to the smaller, more efficient student; a sketch follows the list below.
+ - **Teacher Model:** `BAAI/bge-base-en-v1.5`
+ - **Loss Function:** `DistillKLDivLoss`
+
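+ A minimal sketch of this stage, assuming the usual pattern of precomputing teacher similarity scores as soft labels; the checkpoint path and toy triplets are illustrative, not the actual training setup:
+
+ ```python
+ import torch
+ from datasets import Dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import DistillKLDivLoss
+
+ # Student: the Stage 1 checkpoint (placeholder path). Teacher: bge-base-en-v1.5.
+ student = SentenceTransformer("path/to/stage1-modernbert-small")
+ teacher = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # Toy (query, positive, negative) triplets; real training uses a large corpus.
+ train_dataset = Dataset.from_dict({
+     "query": ["What is the capital of France?"],
+     "positive": ["Paris is the capital of France."],
+     "negative": ["Berlin is the capital of Germany."],
+ })
+
+ # Soft labels: teacher cosine similarities for the positive and negative pairs.
+ def add_teacher_scores(batch):
+     q = teacher.encode(batch["query"])
+     p = teacher.encode(batch["positive"])
+     n = teacher.encode(batch["negative"])
+     pos = teacher.similarity_pairwise(q, p)
+     neg = teacher.similarity_pairwise(q, n)
+     return {"label": torch.stack([pos, neg], dim=1)}
+
+ train_dataset = train_dataset.map(add_teacher_scores, batched=True)
+
+ # KL divergence between student and teacher similarity distributions.
+ loss = DistillKLDivLoss(student)
+ trainer = SentenceTransformerTrainer(model=student, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+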
+ #### Stage 3: Task-Specific Fine-Tuning
+ As a final calibration step, the best distilled model was fine-tuned directly on the Semantic Textual Similarity (STS) benchmark, specializing it for tasks that require precise similarity scores; a sketch follows the list below.
+ - **Dataset:** [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb)
+ - **Loss Function:** `CosineSimilarityLoss`
+
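+ A minimal sketch of this stage (the distilled checkpoint path is a placeholder):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+ from sentence_transformers.losses import CosineSimilarityLoss
+
+ # Start from the distilled Stage 2 checkpoint (placeholder path).
+ model = SentenceTransformer("path/to/stage2-distilled-model")
+
+ # STS-B: (sentence1, sentence2) pairs with similarity scores normalized to [0, 1].
+ train_dataset = load_dataset("sentence-transformers/stsb", split="train")
+
+ # Regress the cosine similarity of the pair embeddings onto the gold score.
+ loss = CosineSimilarityLoss(model)
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+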
  ### Full Model Architecture
 
  ```