MolEncoder: MLM for Molecules
MolEncoder is a BERT-based chemical language model pretrained on SMILES strings using masked language modeling (MLM). It was designed to investigate optimal pretraining strategies for molecular representation learning, with a particular focus on masking ratio, dataset size, and model size. It is described in detail in the paper "MolEncoder: Towards Optimal Masked Language Modeling for Molecules".
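To make the role of the masking ratio concrete, the following is a minimal sketch of MLM input corruption on a SMILES string. It is not the MolEncoder implementation: the regex tokenizer, the helper names `tokenize` and `mask_tokens`, and the ratio of 0.3 are all assumptions chosen for illustration, and the sketch omits the random-replacement and keep-unchanged variants that BERT-style masking commonly uses.

```python
import random
import re

# Simplified SMILES tokenizer (an assumption, not the model's tokenizer):
# bracket atoms such as [NH3+], the two-letter halogens Br and Cl,
# and otherwise single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_ratio=0.3, mask_token="[MASK]", seed=None):
    """Replace a fraction of tokens with mask_token.

    Returns (masked, labels), where labels holds the original token at
    each masked position and None elsewhere. mask_ratio=0.3 is an
    arbitrary illustrative value, not one taken from the paper.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(mask_ratio * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]
        masked[i] = mask_token
    return masked, labels


if __name__ == "__main__":
    tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    masked, labels = mask_tokens(tokens, mask_ratio=0.3, seed=0)
    print("".join(masked))
```

During pretraining, the model would see the masked sequence and be trained to predict the original tokens recorded in `labels`. Raising `mask_ratio` hides more of each molecule per training example, which is the kind of trade-off the study examines.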
Please refer to the MolEncoder GitHub repository for detailed instructions and ready-to-use examples on fine-tuning the model on custom data and running predictions.
If you use this model, please cite the paper "MolEncoder: Towards Optimal Masked Language Modeling for Molecules". The full citation will be inserted soon.