Designing a hybrid structure based on deep learning to predict phosphorylation sites

Document Type: Original Research

Authors

University of Birjand

Abstract
Phosphorylation is one of the most important types of post-translational modification (PTM) and plays a key role in protein function studies and experimental design. Given the importance of phosphorylation in proteins and the growing number of protein sequences in databases, faster and more accurate computational methods for predicting phosphorylation sites are increasingly needed. Although many tools based on different machine learning methods have been introduced to predict phosphorylation sites, a highly efficient tool is still lacking, and efforts to develop one continue. Recent studies have shown that deep learning-based methods are currently the most effective approach for predicting phosphorylation sites. Deep learning, as an advanced machine learning method, can automatically learn complex representations of phosphorylation patterns directly from raw sequences, which makes it a powerful tool for improving phosphorylation site prediction.
In this study, we introduce ConvoPhos, a hybrid structure based on convolutional deep learning for predicting phosphorylation sites. The CKSAAP (composition of k-spaced amino acid pairs) feature vector extracted from the sequences serves as the input to one part of the classifier, while the sequences, converted to images, serve as the input to another part made of convolutional networks. In 10-fold cross-validation on the phosphosite data, ConvoPhos achieved an accuracy of 94% and an AUC of 90%, the highest performance among the compared methods.
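To make the CKSAAP input more concrete, here is a minimal Python sketch of the standard composition-of-k-spaced-amino-acid-pairs encoding. It assumes the input is a peptide window already centred on a candidate site. The function name `cksaap`, the 20-letter alphabet, the skipping of non-standard symbols such as padding `X`, and the default `k_max = 3` are illustrative assumptions, not the settings published for ConvoPhos.

```python
from itertools import product

# The 20 standard amino acids; any other symbol (e.g. padding 'X') is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count every ordered pair (peptide[i], peptide[i + k + 1])
    whose residues are both standard amino acids, then divide each count by
    the number of such valid pairs. Returns 400 * (k_max + 1) values.
    """
    peptide = peptide.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Residues i and i + k + 1 are separated by exactly k other residues.
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # pairs containing non-standard symbols are ignored
                counts[pair] += 1
                total += 1
        # Guard against windows with no valid pairs (e.g. all padding).
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

For example, `cksaap("ACA", k_max=1)` returns 800 values: at gap 0 the pairs `AC` and `CA` each receive 0.5, and at gap 1 the pair `AA` receives 1.0. In a two-part model of the kind described above, a fixed-length vector like this would feed the feature-vector part of the classifier, while the image part receives a separate encoding of the same window.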
