Yesimm committed on
Commit 2f1f13a · verified · 1 Parent(s): e96f242

Upload README.md with huggingface_hub

  1. README.md +30 -2
README.md CHANGED
@@ -63157,7 +63157,7 @@ library_name: sentence-transformers
63157
 
63158
  # BGE-M3 fine-tuned with Matryoshka + MNRLoss
63159
 
63160
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
63161
 
63162
  ## Model Details
63163
 
@@ -63203,7 +63203,7 @@ Then you can load this model and run inference.
63203
  from sentence_transformers import SentenceTransformer
63204
 
63205
  # Download from the 🤗 Hub
63206
- model = SentenceTransformer("Yesimm/InfectaVec-v2")
63207
  # Run inference
63208
  sentences = [
63209
  '최근 몇 년간 SFTS의 발생 추세는 어떤가요?',
@@ -63422,6 +63422,34 @@ You can finetune this model on your own dataset.
63422
  - Datasets: 4.0.0
63423
  - Tokenizers: 0.21.1
63424
 
63425
  ## Citation
63426
 
63427
  ### BibTeX
 
63157
 
63158
  # BGE-M3 fine-tuned with Matryoshka + MNRLoss
63159
 
63160
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The main difference from InfectaVec-v1 is that InfectaVec-v2 is trained with paraphrased and bitext-mined queries (En to Kr, Kr to En).
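Because the model is trained with a Matryoshka objective, a prefix of the 1024-dimensional embedding can in principle be used as a smaller embedding. The sketch below shows the usual truncate-then-renormalize step in plain Python. The function name `truncate_and_normalize` and the example dimension are illustrative assumptions; the README excerpt does not state which truncation sizes were used in training.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    if dim <= 0 or dim > len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-dimensional vector standing in for a 1024-dimensional embedding.
small = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(small)
```

With sentence-transformers itself, the `truncate_dim` argument of the `SentenceTransformer` constructor applies the truncation at encode time; whether a given size preserves retrieval quality for this model would need to be checked on the evaluation set.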
63161
 
63162
  ## Model Details
63163
 
 
63203
  from sentence_transformers import SentenceTransformer
63204
 
63205
  # Download from the 🤗 Hub
63206
+ model = SentenceTransformer("Yesimm/InfectaVec-v1")
63207
  # Run inference
63208
  sentences = [
63209
  '최근 몇 년간 SFTS의 발생 추세는 어떤가요?',
 
63422
  - Datasets: 4.0.0
63423
  - Tokenizers: 0.21.1
63424
 
63425
+
63426
+ ## Evaluation Results
63427
+ ### Evaluation Results on Infectious Diseases Test Dataset
63428
+
63429
+ | Model | Epoch | Accuracy(@1) | Recall(@1) | Precision(@10) | NDCG(@10) | MRR(@10) | MAP(@100) |
63430
+ |--------------|-------|-------------|-----------|----------------|-----------|----------|-----------|
63431
+ | **BGE-M3** | - | 44.49 | 44.49 | 7.43 | 58.97 | 54.12 | 54.83 |
63432
+ | **InfectaVec-v1** | 2 | 62.67 | 62.67 | 9.37 | 78.21 | 73.23 | 73.58 |
63433
+ | | 3 | 62.30 | 62.30 | 9.42 | 78.58 | 73.52 | 73.85 |
63434
+ | | 4 | 62.83 | 62.83 | 9.46 | 78.92 | 73.87 | 74.18 |
63435
+ | **InfectaVec-v2** | 2 | 59.49 | 59.49 | 9.08 | 75.35 | 70.37 | 70.81 |
63436
+ | | 3 | 61.08 | 61.08 | 9.16 | 76.43 | 71.55 | 71.98 |
63437
+ | | 4 | 61.29 | 61.29 | 9.16 | 76.49 | 71.63 | 72.07 |
63438
+
63439
+
63440
+ ### Evaluation Results on MTEB Medical Benchmarks for Retrieval, Clustering and Semantic Text Similarity Tasks
63441
+ | Model | PublicHealthQA (Kr) | PublicHealthQA (En) | MedrxivClusteringS2S.v2 (En) | BIOSSES (En) |
63442
+ |---------------------|-------------------|--------------------|-----------------------------|-------------|
63443
+ | **BGE-M3** | 80.41 | 83.81 | 30.63 | 83.38 |
63444
+ | **Multilingual e5-large** | 85.14 | 84.57 | 39.14 | 87.45 |
63445
+ | **InfectaVec-v1** | 79.70 | 82.57 | 34.62 | 79.37 |
63446
+ | **Qwen-3 Embedding-0.6B** | 81.10 | 83.84 | 40.38 | 84.73 |
63447
+ | **InfectaVec-v2** | 82.36 | 84.85 | 34.23 | 76.51 |
63448
+
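For readers unfamiliar with the ranking metrics in the Infectious Diseases table (Recall@1, MRR@10, NDCG@10), the following is a minimal sketch of their common binary-relevance definitions for a single query. It is not the evaluation code behind the reported numbers, and the function names and document IDs are illustrative assumptions. Identical Accuracy@1 and Recall@1 columns are what these definitions produce when every query has exactly one relevant document, though the README does not state that this is the case.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document in the top-k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG with binary relevance: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one query whose only relevant document is "d1".
ranking = ["d3", "d1", "d2"]
relevant = {"d1"}
print(recall_at_k(ranking, relevant, 1))
print(mrr_at_k(ranking, relevant, 10))
print(ndcg_at_k(ranking, relevant, 10))
```

Table values would then be the mean of each metric over all test queries, multiplied by 100.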
63449
+ ## Citation
63450
+
63451
+ ### BibTeX
63452
+
63453
  ## Citation
63454
 
63455
  ### BibTeX