# NovoMolGen

NovoMolGen is a family of molecular foundation models trained on 1.5 billion molecules from ZINC‑22, built on Llama architectures with FlashAttention. It achieves state‑of‑the‑art performance on both unconstrained and goal‑directed molecule generation tasks.

## How to load

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_AtomWise", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_AtomWise", trust_remote_code=True)
```
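
Sampling works on CPU, but it is faster on a GPU. A minimal sketch using standard PyTorch calls (the device choice is an assumption, not part of the model card):

```python
# Optional: move the model to a GPU, if one is available, and switch to eval mode.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
```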

## Quickstart

```python
# `sample` is provided by the model's custom code (loaded via trust_remote_code)
# and returns generated molecules as SMILES strings.
outputs = model.sample(tokenizer=tokenizer, batch_size=4)
print(outputs['SMILES'])
```
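
Generated strings are not guaranteed to be chemically valid. A minimal post-processing sketch, assuming RDKit is installed (RDKit is not part of the NovoMolGen API):

```python
# Keep only the SMILES that RDKit can parse into a valid molecule.
from rdkit import Chem

smiles_list = outputs['SMILES']
valid = [s for s in smiles_list if Chem.MolFromSmiles(s) is not None]
print(f"{len(valid)}/{len(smiles_list)} generated SMILES are valid")
```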

## Citation

```bibtex
@article{chitsaz2025novomolgen,
  title={NovoMolGen: Rethinking Molecular Language Model Pretraining},
  author={Chitsaz, Kamran and Balaji, Roshan and Fournier, Quentin and Bhatt, Nirav Pravinbhai and Chandar, Sarath},
  journal={arXiv preprint},
  year={2025},
}
```