# NovoMolGen

NovoMolGen is a family of molecular foundation models trained on 1.5 billion molecules from ZINC‑22, built on Llama architectures with FlashAttention. It achieves state‑of‑the‑art performance on both unconstrained and goal‑directed molecule generation tasks.

## How to load

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_AtomWise", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chandar-lab/NovoMolGen_32M_SMILES_AtomWise", trust_remote_code=True)
```
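
Sampling works on CPU, but it is faster on a GPU. A minimal sketch using standard PyTorch calls (the device choice is an assumption, not part of the model card):

```python
# Optional: move the model to a GPU, if one is available, and switch to eval mode.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
```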

## Quickstart

```python
# `sample` is provided by the model's custom code (loaded via trust_remote_code)
# and returns generated molecules as SMILES strings.
outputs = model.sample(tokenizer=tokenizer, batch_size=4)
print(outputs['SMILES'])
```
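
Generated strings are not guaranteed to be chemically valid. A minimal post-processing sketch, assuming RDKit is installed (RDKit is not part of the NovoMolGen API):

```python
# Keep only the SMILES that RDKit can parse into a valid molecule.
from rdkit import Chem

smiles_list = outputs['SMILES']
valid = [s for s in smiles_list if Chem.MolFromSmiles(s) is not None]
print(f"{len(valid)}/{len(smiles_list)} generated SMILES are valid")
```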

## Citation

```bibtex
@article{chitsaz2025novomolgen,
  title={NovoMolGen: Rethinking Molecular Language Model Pretraining},
  author={Chitsaz, Kamran and Balaji, Roshan and Fournier, Quentin and Bhatt, Nirav Pravinbhai and Chandar, Sarath},
  journal={arXiv preprint},
  year={2025},
}
```