Papers
arxiv:2404.16866

Annotation-guided Protein Design with Multi-Level Domain Alignment

Published on Apr 18, 2024
Authors:
,
,
,
,
,
,
,
,

Abstract

PAAG, a multi-modality protein design framework, uses textual annotations to generate proteins with specific domains, achieving higher success rates in protein design tasks.

AI-generated summary

The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore to generate protein using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation, PAAG, a multi-modality protein design framework that integrates the textual annotations extracted from protein database for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a significant increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 22.0% in the immunoglobulin domain) in comparison to the existing model. We anticipate that PAAG will broaden the horizons of protein design by leveraging the knowledge from between textual annotation and proteins.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2404.16866 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2404.16866 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2404.16866 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.