NovoMolGen
NovoMolGen is a family of molecular foundation models trained on 1.5 billion molecules from ZINC‑22 using the Llama architecture and FlashAttention. It achieves state‑of‑the‑art performance on both unconstrained and goal‑directed molecule generation tasks.

Example usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 32M-parameter SMILES model and its BPE tokenizer.
# trust_remote_code=True is required because sampling is implemented in custom model code.
tokenizer = AutoTokenizer.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_BPE", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_BPE", trust_remote_code=True)

# Generate a small batch of molecules and print their SMILES strings.
outputs = model.sample(tokenizer=tokenizer, batch_size=4)
print(outputs['SMILES'])
```
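Sampled strings are not guaranteed to be chemically valid, so a quick sanity check is often useful. Below is a minimal sketch, assuming RDKit is installed and that `outputs['SMILES']` from the snippet above is a list of SMILES strings; the validity check itself is illustrative and not part of the NovoMolGen API.

```python
from rdkit import Chem

# Assumption: outputs['SMILES'] is a list of generated SMILES strings.
smiles_list = outputs['SMILES']

# A SMILES string is considered valid if RDKit can parse it into a molecule.
valid = [s for s in smiles_list if Chem.MolFromSmiles(s) is not None]
print(f"{len(valid)}/{len(smiles_list)} generated SMILES are valid")
```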
If you use NovoMolGen, please cite:

```bibtex
@article{chitsaz2025novomolgen,
  title={NovoMolGen: Rethinking Molecular Language Model Pretraining},
  author={Chitsaz, Kamran and Balaji, Roshan and Fournier, Quentin and Bhatt, Nirav Pravinbhai and Chandar, Sarath},
  journal={arXiv preprint},
  year={2025},
}
```