@@ -4,11 +4,115 @@ datasets:
 - Derify/augmented_canonical_druglike_QED_43M
 - Derify/druglike
 metrics:
-- spearmanr
 library_name: transformers
 tags:
 - ChemBERTa
 - cheminformatics
 ---
 # ChemBERTa-druglike: Two-phase MLM Pretraining for Drug-like SMILES
@@ -50,6 +154,26 @@ The model's effectiveness was validated through downstream Chem-MRL training on
 W&B report on [ChemBERTa-druglike evaluation](https://api.wandb.ai/links/ecortes/afh508m3).
 ## Use Cases
 - Molecular property prediction
@@ -61,6 +185,38 @@ W&B report on [ChemBERTa-druglike evaluation](https://api.wandb.ai/links/ecortes
 - Optimized specifically for drug-like molecules
 - Performance may vary on non-drug-like chemical compounds
-## Citation
-- Chithrananda, Seyone, et al. "ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction." _arXiv [Cs.LG]_, 2020. [Link](http://arxiv.org/abs/2010.09885).
-- Ahmad, Walid, et al. "ChemBERTa-2: Towards Chemical Foundation Models." _arXiv [Cs.LG]_, 2022. [Link](http://arxiv.org/abs/2209.01712).

 - Derify/augmented_canonical_druglike_QED_43M
 - Derify/druglike
 metrics:
+- roc_auc
+- rmse
 library_name: transformers
 tags:
 - ChemBERTa
 - cheminformatics
+pipeline_tag: fill-mask
+model-index:
+- name: Derify/ChemBERTa-druglike
+  results:
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: BACE
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.8114
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: BBBP
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.7399
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: TOX21
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.7522
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: HIV
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.7527
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: SIDER
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.6577
+  - task:
+      type: text-classification
+      name: Classification (ROC AUC)
+    dataset:
+      name: CLINTOX
+      type: Derify/druglike
+    metrics:
+    - type: roc_auc
+      value: 0.9660
+  - task:
+      type: regression
+      name: Regression (RMSE)
+    dataset:
+      name: ESOL
+      type: Derify/druglike
+    metrics:
+    - type: rmse
+      value: 0.8241
+  - task:
+      type: regression
+      name: Regression (RMSE)
+    dataset:
+      name: FREESOLV
+      type: Derify/druglike
+    metrics:
+    - type: rmse
+      value: 0.5350
+  - task:
+      type: regression
+      name: Regression (RMSE)
+    dataset:
+      name: LIPO
+      type: Derify/druglike
+    metrics:
+    - type: rmse
+      value: 0.6663
+  - task:
+      type: regression
+      name: Regression (RMSE)
+    dataset:
+      name: BACE
+      type: Derify/druglike
+    metrics:
+    - type: rmse
+      value: 1.0105
+  - task:
+      type: regression
+      name: Regression (RMSE)
+    dataset:
+      name: CLEARANCE
+      type: Derify/druglike
+    metrics:
+    - type: rmse
+      value: 43.4499
 ---
 # ChemBERTa-druglike: Two-phase MLM Pretraining for Drug-like SMILES
 W&B report on [ChemBERTa-druglike evaluation](https://api.wandb.ai/links/ecortes/afh508m3).
+## Benchmarks
+### Classification Datasets (ROC AUC - Higher is better)
+| Model                     | BACE↑  | BBBP↑  | TOX21↑ | HIV↑   | SIDER↑ | CLINTOX↑ |
+| ------------------------- | ------ | ------ | ------ | ------ | ------ | -------- |
+| **Tasks**                 | 1      | 1      | 12     | 1      | 27     | 2        |
+| Derify/ChemBERTa-druglike | 0.8114 | 0.7399 | 0.7522 | 0.7527 | 0.6577 | 0.9660   |
+### Regression Datasets (RMSE - Lower is better)
+| Model                     | ESOL↓  | FREESOLV↓ | LIPO↓  | BACE↓  | CLEARANCE↓ |
+| ------------------------- | ------ | --------- | ------ | ------ | ---------- |
+| **Tasks**                 | 1      | 1         | 1      | 1      | 1          |
+| Derify/ChemBERTa-druglike | 0.8241 | 0.5350    | 0.6663 | 1.0105 | 43.4499    |
+Benchmarks were conducted using the [chemberta3](https://github.com/deepforestsci/chemberta3) framework.
+Datasets were split with DeepChem’s scaffold splits and filtered to include only molecules with SMILES length ≤128, matching the model’s maximum input length.
+The ChemBERTa-druglike model was fine-tuned for 100 epochs with a learning rate of 3e-5 and a batch size of 32.
+Each task was run with three different random seeds, and the mean performance across seeds is reported.
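As a rough illustration of the length filter and seed averaging described above, the sketch below assumes that "SMILES length" means character count (the card does not say whether it counts characters or tokens) and uses made-up per-seed scores. The helper names are hypothetical and are not part of chemberta3 or DeepChem.

```python
from statistics import mean

# Maximum SMILES length stated in the benchmark notes.
# Assumption: length is measured in characters, not tokens.
MAX_SMILES_LENGTH = 128


def filter_by_smiles_length(records, max_length=MAX_SMILES_LENGTH):
    """Keep (smiles, label) pairs whose SMILES has at most max_length characters."""
    return [(smiles, label) for smiles, label in records if len(smiles) <= max_length]


def mean_over_seeds(scores_by_seed):
    """Average one metric across runs, keyed by random seed."""
    return mean(scores_by_seed.values())


# Toy records: two short molecules and one 200-character string that gets dropped.
records = [("CCO", 1), ("c1ccccc1", 0), ("C" * 200, 1)]
kept = filter_by_smiles_length(records)

# Illustrative scores only; these are not results from the card.
seed_scores = {0: 0.80, 1: 0.82, 2: 0.81}
avg = mean_over_seeds(seed_scores)

print(len(kept), round(avg, 4))
```

In a real run, the filter would be applied to each dataset before fine-tuning, and the averaging would be applied per dataset to the ROC AUC or RMSE obtained with each seed.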
 ## Use Cases
 - Molecular property prediction
 - Optimized specifically for drug-like molecules
 - Performance may vary on non-drug-like chemical compounds
+## Citations
+### ChemBERTa Series
+```
+@misc{chithrananda2020chembertalargescaleselfsupervisedpretraining,
+      title={ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction},
+      author={Seyone Chithrananda and Gabriel Grand and Bharath Ramsundar},
+      year={2020},
+      eprint={2010.09885},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2010.09885},
+}
+```
+```
+@misc{ahmad2022chemberta2chemicalfoundationmodels,
+      title={ChemBERTa-2: Towards Chemical Foundation Models},
+      author={Walid Ahmad and Elana Simon and Seyone Chithrananda and Gabriel Grand and Bharath Ramsundar},
+      year={2022},
+      eprint={2209.01712},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2209.01712},
+}
+```
+```
+@misc{singh2025chemberta3opensource,
+  title={ChemBERTa-3: An Open Source Training Framework for Chemical Foundation Models},
+  author={Singh, R. and Barsainyan, A. A. and Irfan, R. and Amorin, C. J. and He, S. and Davis, T. and others},
+  year={2025},
+  howpublished={ChemRxiv},
+  doi={10.26434/chemrxiv-2025-4glrl-v2},
+  note={This content is a preprint and has not been peer-reviewed},
+  url={https://doi.org/10.26434/chemrxiv-2025-4glrl-v2}
+}
+```