Do “Newly Born” orphan proteins resemble “Never Born” proteins? A study using three deep learning algorithms

Jing Liu, Rongqing Yuan, Wei Shao, Jitong Wang, Israel Silman*, Joel L. Sussman*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

“Newly Born” proteins, devoid of detectable homology to any other proteins, known as orphan proteins, occur in a single species or within a taxonomically restricted gene family. They are generated by the expression of novel open reading frames, and appear throughout evolution. We were curious if three recently developed programs for predicting protein structures, namely, AlphaFold2, RoseTTAFold, and ESMFold, might be of value for comparison of such “Newly Born” proteins to random polypeptides with amino acid content similar to that of native proteins, which have been called “Never Born” proteins. The programs were used to compare the structures of two sets of “Never Born” proteins that had been expressed: Group 1, which had been shown experimentally to possess substantial secondary structure, and Group 3, which had been shown to be intrinsically disordered. Overall, although the models generated were scored as being of low quality, they nevertheless revealed some general principles. Specifically, all four members of Group 1 were predicted to be compact by all three algorithms, in agreement with the experimental data, whereas the members of Group 3 were predicted to be very extended, as would be expected for intrinsically disordered proteins, again consistent with the experimental data. These predicted differences were shown to be statistically significant by comparing their accessible surface areas. The three programs were then used to predict the structures of three orphan proteins whose crystal structures had been solved, two of which display novel folds. Surprisingly, only for the protein which did not have a novel fold, and was taxonomically restricted, rather than being a true orphan, did all three algorithms predict very similar, high-quality structures, closely resembling the crystal structure. Finally, they were used to predict the structures of seven orphan proteins with well-identified biological functions, whose 3D structures are not known. Two proteins, which were predicted to be disordered based on their sequences, are predicted by all three structure algorithms to be extended structures. The other five were predicted to be compact structures with only two exceptions in the case of AlphaFold2. All three prediction algorithms make remarkably similar and high-quality predictions for one large protein, HCO_11565, from a nematode. It is conjectured that this is due to many homologs in the taxonomically restricted family of which it is a member, and to the fact that the Dali server revealed several nonrelated proteins with similar folds. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Proteins:3.

Original language: English
Pages (from-to): 1097-1115
Number of pages: 19
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 91
Issue number: 8
Publication status: Published Online - 24 Apr 2023

Bibliographical note

Funding Information:
The Israeli tutors and the Chinese students acknowledge the support of the YutChun‐Weizmann Program that enabled this study. The study was also supported by a research grant from the Center for Scientific Excellence at the Weizmann Institute of Science. We are grateful to Dr. Shifra Ben‐Dor for valuable discussions, to Prof. Robin Gasser (University of Melbourne) for providing us with the sequence of HCO_011565, to Prof. Keith Dunker (University of Indiana) for recommending the flDPnn algorithm for disorder prediction, and to Dr. Sergey Ovchinnikov (Harvard University) for valuable advice concerning the use of AF2 Colab.

Publisher Copyright:
© 2023 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals LLC.

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
