tomaarsen (HF Staff) committed · verified
Commit 2cc04ba · 1 Parent(s): d144d60

Files changed (1)
  1. README.md (+6 −6)
README.md CHANGED

@@ -10,7 +10,7 @@ tags:
 - generated_from_trainer
 - dataset_size:100000
 - loss:CachedMultipleNegativesRankingLoss
-base_model: google/embeddinggemma-300M
+base_model: google/embeddinggemma-300m
 widget:
 - source_sentence: 'What are the potential effects of stopping inhaled corticosteroid
     (ICS) therapy in patients with chronic obstructive pulmonary disease (COPD)?
@@ -987,7 +987,7 @@ metrics:
 - cosine_mrr@10
 - cosine_map@100
 model-index:
-- name: EmbeddingGemma-300M trained on the Medical Instruction and RetrIeval Dataset
+- name: EmbeddingGemma-300m trained on the Medical Instruction and RetrIeval Dataset
     (MIRIAD)
   results:
   - task:
@@ -1096,9 +1096,9 @@ model-index:
       name: Cosine Map@100
 ---
 
-# EmbeddingGemma-300M finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)
+# EmbeddingGemma-300m finetuned on the Medical Instruction and RetrIeval Dataset (MIRIAD)
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300M](https://huggingface.co/google/embeddinggemma-300M) on the [miriad/miriad-4.4M](https://huggingface.co/datasets/miriad/miriad-4.4M) dataset (specifically the first 100.000 question-passage pairs from [tomaarsen/miriad-4.4M-split](https://huggingface.co/datasets/tomaarsen/miriad-4.4M-split)). It maps sentences & documents to a 768-dimensional dense vector space and can be used for medical information retrieval, specifically designed for searching for passages (up to 1k tokens) of scientific medical papers using detailed medical questions.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) on the [miriad/miriad-4.4M](https://huggingface.co/datasets/miriad/miriad-4.4M) dataset (specifically the first 100.000 question-passage pairs from [tomaarsen/miriad-4.4M-split](https://huggingface.co/datasets/tomaarsen/miriad-4.4M-split)). It maps sentences & documents to a 768-dimensional dense vector space and can be used for medical information retrieval, specifically designed for searching for passages (up to 1k tokens) of scientific medical papers using detailed medical questions.
 
 This model has been trained using code from our [EmbeddingGemma blogpost](https://huggingface.co/blog/embeddinggemma) to showcase how the EmbeddingGemma model can be finetuned on specific domains/tasks for even stronger performance. It is not affiliated with Google.
 
@@ -1106,7 +1106,7 @@ This model has been trained using code from our [EmbeddingGemma blogpost](https:
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [google/embeddinggemma-300M](https://huggingface.co/google/embeddinggemma-300M) <!-- at revision a3cd7d576fa223c646b6b3fb05d801d031ddd393 -->
+- **Base model:** [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) <!-- at revision a3cd7d576fa223c646b6b3fb05d801d031ddd393 -->
 - **Maximum Sequence Length:** 1024 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -1148,7 +1148,7 @@ Then you can load this model and run inference.
 from sentence_transformers import SentenceTransformer
 
 # Download from the 🤗 Hub
-model = SentenceTransformer("sentence-transformers/embeddinggemma-300M-medical")
+model = SentenceTransformer("sentence-transformers/embeddinggemma-300m-medical")
 # Run inference
 queries = [
     "What are some potential limitations in projecting the future demand for joint replacement surgeries?\n",