
STEM Embedding Model

🧬 Embedding model optimized for STEM content (Math, Physics, CS, Biology).

Performance

  • Separation Score: 0.6767 (excellent; see the sketch after this list for one common formulation)
  • Accuracy: 97.18%
  • Training: 75k+ STEM chunks from Wikipedia + Semantic Scholar
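
The card does not define how the separation score is computed. One common formulation for retrieval encoders is the gap between the average cosine similarity of matching (positive) pairs and non-matching (negative) pairs; a minimal sketch under that assumption (the pair tensors here are hypothetical):

import torch
import torch.nn.functional as F

def separation_score(pos_sims: torch.Tensor, neg_sims: torch.Tensor) -> float:
    # Assumed definition: mean positive-pair similarity minus mean
    # negative-pair similarity; larger means better-separated embeddings.
    return (pos_sims.mean() - neg_sims.mean()).item()

# pos_sims / neg_sims would come from embeddings as in the Usage section, e.g.
# pos_sims = F.cosine_similarity(query_emb, matching_doc_emb)
# neg_sims = F.cosine_similarity(query_emb, unrelated_doc_emb)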

Usage

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")
tokenizer = AutoTokenizer.from_pretrained("zacbrld/MNLP_M3_document_encoder_120tok")

# Encode text and mean-pool token embeddings into one sentence vector
inputs = tokenizer("Neural networks use backpropagation", return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)  # shape: (batch, 384)
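
The example above encodes a single sentence, so plain mean pooling is safe; for batched, padded inputs, the standard recipe for MiniLM-style encoders is a masked mean pool that ignores pad tokens. A sketch reusing model and tokenizer from above (the example sentences are arbitrary):

import torch
import torch.nn.functional as F

def embed(texts):
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (batch, seq, 384)
    mask = inputs["attention_mask"].unsqueeze(-1).float() # zero out pad tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # masked mean pool

emb = embed(["Neural networks use backpropagation",
             "Gradient descent updates weights iteratively"])
print(F.cosine_similarity(emb[0:1], emb[1:2]).item())     # pair similarity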

Training Details

  • Base: sentence-transformers/all-MiniLM-L6-v2
  • Method: Contrastive learning with triplet loss (see the training sketch below)
  • Specialized for scientific and technical content
  • Model size: 22.7M params (F32, Safetensors)
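
The training script is not included in the card; below is a minimal sketch of contrastive training with triplet loss using the sentence-transformers library. The triplet shown and the hyperparameters are illustrative assumptions, not the actual training data:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each triplet: an anchor, a positive from the same STEM topic, and a
# negative from an unrelated topic (all three strings are hypothetical).
train_examples = [
    InputExample(texts=[
        "Backpropagation computes gradients layer by layer",  # anchor
        "Gradient descent minimizes the loss function",       # positive
        "Mitochondria are the powerhouse of the cell",        # negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)  # pulls positives closer than negatives by a margin

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)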