
arXiv: Arxiv | Code: Open-PMC GitHub | Dataset: Hugging Face
## Model Overview
This model is a checkpoint trained on the Open-PMC dataset. It uses a Vision Transformer (ViT-B/16) backbone for visual feature extraction and PubMedBERT for encoding text. The model is trained with contrastive learning using the vanilla InfoNCE loss to learn aligned representations across modalities.
## Model Architecture
- Vision Backbone: ViT-B/16 (Pretrained on ImageNet)
- Text Backbone: PubMedBERT (Pretrained on PubMedCentral Abstracts)
- Training Objective: Contrastive Learning with InfoNCE Loss
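To make the training objective concrete, below is a minimal NumPy sketch of the symmetric InfoNCE loss used in CLIP-style contrastive learning. This is an illustration of the loss function, not the project's actual training code; the function name, temperature value, and batch layout are assumptions for the example.

```python
import numpy as np

def info_nce_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    image_emb, text_emb: (N, D) arrays where row i of each is a matched
    image-text pair; all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; entry (i, j) compares image i with text j
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy_on_diagonal(l):
        # numerically stable log-softmax; the diagonal holds matched pairs
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

With orthonormal embeddings whose rows are correctly paired, the loss is near zero; shuffling the text rows so pairs no longer match drives it up, which is exactly the signal that pulls matched pairs together and pushes mismatched ones apart.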
## Training Framework
The model was trained using mmlearn, a framework designed for multimodal learning. See the mmlearn repository for more information and access to the framework.
## How to Use
Please visit our GitHub repository for instructions on running benchmarks with this checkpoint.