The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).
Google Scholar
Kuhlman, B. & Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 20, 681–697 (2019).
Google Scholar
Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
Google Scholar
Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
Google Scholar
Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).
Google Scholar
Kries, H., Blomberg, R. & Hilvert, D. De novo enzymes by computational design. Curr. Opin. Chem. Biol. 17, 221–228 (2013).
Google Scholar
Joh, N. H. et al. De novo design of a transmembrane Zn2+-transporting four-helix bundle. Science 346, 1520–1524 (2014).
Google Scholar
Smith, J. M. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).
Google Scholar
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Google Scholar
Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning (eds Meila, M. et al.) 8821–8831 (PMLR, 2021).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://arxiv.org/abs/2204.06125 (2022).
Saharia, C. et al. Photorealistic text-to-image diffusion models with deep language understanding. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) 36479–36494 (NeurIPS, 2022).
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).
Google Scholar
Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).
Google Scholar
Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) (NeurIPS, 2019).
Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).
Google Scholar
Madani, A. et al. ProGen: language modeling for protein generation. Preprint at http://arxiv.org/abs/2004.03497 (2020).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Google Scholar
Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 16990–17017 (PMLR, 2022).
Anand, N. & Huang, P.-S. Generative modeling for protein structures. In Proc. Advances in Neural Information Processing Systems 31 (eds Bengio, S. et al.) (NeurIPS, 2018).
Lin, Z., Sercu, T., LeCun, Y. & Rives, A. Deep generative models create new and diverse protein structures. In Machine Learning in Structural Biology Workshop at the 35th Conference on Neural Information Processing Systems (MLSB, 2021).
Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).
Google Scholar
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arxiv.org/abs/2205.15019 (2022).
Trippe, B. L. et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. In Proc. 11th International Conference on Learning Representations (eds Kim, B. et al.) (OpenReview.net, 2023).
Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at https://arxiv.org/abs/2209.15611 (2022).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Google Scholar
Barnes, J. & Hut, P. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 446–449 (1986).
Google Scholar
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proc. 32nd International Conference on Machine Learning Vol. 27 (eds Bach, F. et al.) 2256–2265 (PMLR, 2015).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (eds Hofmann, K. et al.) (OpenReview.net, 2021).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. et al.) 1263–1272 (PMLR, 2017).
Battaglia, P. W. et al. Relational inductive biases, deep learning, and graph networks. Preprint at https://arxiv.org/abs/1806.01261 (2018).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (eds Hofmann, K. et al.) (OpenReview.net, 2021).
Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proc. 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 8946–8970 (PMLR, 2022).
Dauparas, J. et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Google Scholar
Plaxco, K. W., Simons, K. T. & Baker, D. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998).
Tanner, J. J. Empirical power laws for the radii of gyration of protein oligomers. Acta Crystallogr. D 72, 1119–1129 (2016).
Google Scholar
Mackenzie, C. O., Zhou, J. & Grigoryan, G. Tertiary alphabet for the observable protein structural universe. Proc. Natl Acad. Sci. USA 113, E7438–E7447 (2016).
Google Scholar
Zhou, J., Panaitiu, A. E. & Grigoryan, G. A general-purpose protein design framework based on mining sequence–structure relationships in known protein structures. Proc. Natl Acad. Sci. USA 117, 1059–1068 (2020).
Google Scholar
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Google Scholar
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Google Scholar
Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273 (2021).
Google Scholar
Røgen, P. & Fain, B. Automatic classification of protein structure by using Gauss integrals. Proc. Natl Acad. Sci. USA 100, 119–124 (2003).
Google Scholar
Harder, T., Borg, M., Boomsma, W., Røgen, P. & Hamelryck, T. Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 28, 510–515 (2012).
Google Scholar
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).
Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
Google Scholar
King, N. P. et al. Accurate design of co-assembling multi-component protein nanomaterials. Nature 510, 103–108 (2014).
Google Scholar
Peyré, G. & Cuturi, M. Computational optimal transport: with applications to data science. Found. Trends Mach. Learn. 11, 355–607 (2019).
Google Scholar
Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).
Google Scholar
Micsonai, A. et al. BeStSel: webserver for secondary structure and fold prediction for protein CD spectroscopy. Nucleic Acids Res. 50, W90–W98 (2022).
Google Scholar
Grigoryan, G. & DeGrado, W. F. Probing designability via a generalized model of helical bundle geometry. J. Mol. Biol. 405, 1079–1100 (2011).
Google Scholar
Woolfson, D. N. et al. De novo protein design: how do we expand into the universe of possible protein structures? Curr. Opin. Struct. Biol. 33, 16–26 (2015).
Google Scholar
Beesley, J. L. & Woolfson, D. N. The de novo design of α-helical peptides for supramolecular self-assembly. Curr. Opin. Biotechnol. 58, 175–182 (2019).
Google Scholar