ProfileBFN

Official implementation of the ICLR 2025 paper "ProfileBFN: Steering Protein Family Design through Profile Bayesian Flow".

Environment

The environment is based on PyTorch 1.13. Follow the official PyTorch installation instructions to set it up for your CUDA version. Then install the following packages:

pip install omegaconf hydra-core bitarray rdkit-pypi scipy lmdb numba scikit-learn

More detailed environment settings are in env.yaml.
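To confirm that the environment is complete before sampling, a small check along the following lines may help. This is only a sketch: the function name find_missing is hypothetical and not part of this repository, and the module list is simply the import names of the pip packages above plus torch (for example, hydra-core is imported as hydra, rdkit-pypi as rdkit, and scikit-learn as sklearn).

```python
import importlib.util

# Import names for the packages listed above (assumption: no extra
# dependencies beyond the pip line and PyTorch are checked here).
REQUIRED_MODULES = [
    "torch", "omegaconf", "hydra", "bitarray", "rdkit",
    "scipy", "lmdb", "numba", "sklearn",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates the modules without importing them, so it does not verify versions or CUDA compatibility.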

Data

The data used to evaluate the model is already included in the data folder.

Checkpoints

We provide two pretrained checkpoints, ProfileBFN_150M.ckpt and ProfileBFN_650M.ckpt. Download all files and set CKPT_PATH to the directory that contains them.
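A quick way to verify that both checkpoint files are in place is sketched below. It assumes, for illustration only, that CKPT_PATH is available as an environment variable; the repository may instead expect it to be set elsewhere (for example in scripts.mk). The helper missing_checkpoints is hypothetical.

```python
import os
from pathlib import Path

# Checkpoint file names as given in this README.
CHECKPOINT_NAMES = ("ProfileBFN_150M.ckpt", "ProfileBFN_650M.ckpt")


def missing_checkpoints(ckpt_dir, names=CHECKPOINT_NAMES):
    """Return the checkpoint file names not found directly inside ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return [name for name in names if not (ckpt_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: CKPT_PATH is exported in the shell; fall back to the
    # current directory otherwise.
    ckpt_dir = os.environ.get("CKPT_PATH", ".")
    missing = missing_checkpoints(ckpt_dir)
    if missing:
        print(f"Missing in {ckpt_dir}: {', '.join(missing)}")
    else:
        print(f"All checkpoints found in {ckpt_dir}")
```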

Sampling

Run mkdir ./results to create the output directory. All generation results will be placed in it.

Run make sample_profile -f scripts.mk to sample a protein family from an MSA. Note that inputs with inconsistent lengths are aligned automatically.
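To see whether an input MSA contains sequences of inconsistent length (and will therefore go through the automatic alignment step), one could inspect it with a sketch like the one below. It assumes the input is FASTA-style text with ">" header lines; the actual input format expected by the sampling script is not specified here, and the helper names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_summary(records):
    """Map each sequence length to the number of sequences of that length."""
    counts = {}
    for _, seq in records:
        counts[len(seq)] = counts.get(len(seq), 0) + 1
    return counts


# Toy input: two sequences of length 8 and one of length 5.
example = ">seq1\nMKT-AYIA\n>seq2\nMKTQAYIA\n>seq3\nMKTAY\n"
summary = length_summary(read_fasta(example))
print(summary)  # more than one key means the lengths are inconsistent
```

A summary with a single key indicates that all sequences already share one length.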

Run make sample_sequence -f scripts.mk to sample a protein family from a single protein sequence.

Evaluation

Evaluating generated protein families with CCMPRED

Clone the CCMPRED repo into test/ccmpred and follow the instructions in its README.

The targets are the generated sequences that the sampling process writes under the results/sample_profile directory.
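Before starting the evaluation, it can be useful to confirm that sampling actually produced output. The sketch below lists every file under results/sample_profile; it makes no assumption about file names or extensions, and list_generated_files is a hypothetical helper rather than part of this repository.

```python
from pathlib import Path


def list_generated_files(results_dir="results/sample_profile"):
    """Return all regular files below results_dir, sorted by path."""
    root = Path(results_dir)
    if not root.is_dir():
        # Sampling has not been run yet, or wrote somewhere else.
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = list_generated_files()
    print(f"Found {len(files)} file(s) under results/sample_profile")
    for path in files[:10]:  # show only the first few entries
        print(" ", path)
```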

cd test/ccmpred
docker build -f docker/Dockerfile -t exp/contact_evaluation .
CUDA_VISIBLE_DEVICES=4,5,6,7 ./scripts/run_evaluate.sh -i <input_dir> -o <output_dir>

Citation

@article{gong2025steering,
  title={Steering Protein Family Design through Profile Bayesian Flow},
  author={Gong, Jingjing and Pei, Yu and Long, Siyu and Song, Yuxuan and Zhang, Zhe and Huang, Wenhao and Cao, Ziyao and Zhang, Shuyi and Zhou, Hao and Ma, Wei-Ying},
  journal={arXiv preprint arXiv:2502.07671},
  year={2025}
}