
DisEmbed (Disease Embedding)

DisEmbed-v1 is a disease-focused embedding model for the medical domain, trained on a synthetic dataset of disease descriptions, symptoms, and question-answer pairs. It outperforms general-purpose medical embedding models on disease-specific tasks, particularly at distinguishing between similar diseases, and performs strongly on retrieval and disease-context identification.
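The retrieval use case comes down to ranking stored embeddings by how similar they are to a query embedding. The following is a minimal NumPy sketch of that ranking step using cosine similarity, assuming the vectors come from `model.encode`. Toy 3-dimensional vectors stand in for the model's 384-dimensional output so it runs without downloading the model. `cosine_similarity_matrix` and `top_k` are hypothetical helper names, and this is not necessarily how the DisEmbed authors evaluated retrieval.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Cosine similarity between every query row and every corpus row."""
    # L2-normalise rows so that a plain dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return q @ c.T


def top_k(queries: np.ndarray, corpus: np.ndarray, k: int = 2) -> list:
    """For each query, return the k most similar (corpus index, score) pairs."""
    scores = cosine_similarity_matrix(queries, corpus)
    results = []
    for row in scores:
        # argsort is ascending, so reverse it for descending similarity and keep k.
        best = np.argsort(row)[::-1][:k]
        results.append([(int(i), float(row[i])) for i in best])
    return results


# Toy vectors (illustrative only) standing in for real DisEmbed embeddings.
corpus = np.array([
    [1.0, 0.0, 0.0],  # e.g. embedding of "Asthma"
    [0.0, 1.0, 0.0],  # e.g. embedding of "Tuberculosis"
    [0.7, 0.7, 0.0],  # e.g. embedding of some other condition
])
query = np.array([[0.1, 1.0, 0.0]])  # e.g. embedding of a symptom description
print(top_k(query, corpus, k=2))
```

With real data, `corpus` would be `model.encode(documents)` and `query` would be `model.encode([question])`. Normalising first keeps the score independent of vector length.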

Model Details

Model Description

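DisEmbed-v1 is a BERT-based Sentence Transformers model for English text. It maps text to 384-dimensional dense vectors for semantic similarity and retrieval.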

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("SalmanFaroz/DisEmbed-v1")
# Run inference
sentences = [
    'Chronic cough with blood-streaked sputum, severe night sweats, and unintentional weight loss. Painful breathing or chest pain, often worsened by coughing. Swelling in the neck or lymph nodes, and frequent fatigue.',
    'Asthma',
    'Tuberculosis'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Pairwise similarity scores: a 3 x 3 tensor where entry [i][j] compares sentences[i] with sentences[j]
similarities = model.similarity(embeddings, embeddings)
print(similarities)
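In the matrix returned by `model.similarity(embeddings, embeddings)`, row 0 scores the symptom description against every sentence. Columns 1 and 2 therefore indicate whether it sits closer to "Asthma" or to "Tuberculosis". The short sketch below picks the best-matching label from that row. It assumes the tensor has been converted to nested lists with `similarities.tolist()` and that the sentences are ordered as `[query, label_1, ..., label_n]`. `best_label` is a hypothetical helper, and the matrix values are placeholders, not real model output.

```python
def best_label(similarities: list, labels: list) -> tuple:
    """Return (label, score) for the label most similar to the query sentence.

    Assumes sentence 0 is the query and labels follow it in order, so
    label k is at column k + 1 of the query's row (column 0 is self-similarity).
    """
    query_row = similarities[0]
    scores = {label: query_row[i + 1] for i, label in enumerate(labels)}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


# Placeholder matrix for illustration only; run the snippet above for real scores.
example = [
    [1.00, 0.40, 0.60],
    [0.40, 1.00, 0.50],
    [0.60, 0.50, 1.00],
]
print(best_label(example, ["Asthma", "Tuberculosis"]))
```

With the real model, you would call `best_label(similarities.tolist(), ["Asthma", "Tuberculosis"])`.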

Citation

@article{faroz2024disembed,
  title={DisEmbed: Transforming Disease Understanding through Embeddings},
  author={Faroz, Salman},
  journal={arXiv preprint arXiv:2412.15258},
  year={2024},
  doi={10.48550/arXiv.2412.15258},
  url={https://arxiv.org/abs/2412.15258}
}
Model size: 33.4M params (Safetensors)
Tensor type: F32
