Abstract
Accurately predicting cellular activities of proteins based on their primary amino acid sequences would greatly improve our understanding of the proteome. In this paper, we present CELL-E, a text-to-image transformer architecture that generates a 2D probability density map of protein distribution within cells. Given an amino acid sequence and a reference image of cell or nucleus morphology, CELL-E offers a more direct representation of protein localization than previous in silico methods, which rely on pre-defined, discrete class annotations of protein localization to subcellular compartments.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Contributing authors: yss{at}berkeley.edu
Manuscript re-organized for a more biological audience. Computational mutation study and attention visualization for nuclear localization signal updated.
https://github.com/BoHuangLab/Protein-Localization-Transformer