Prediction of human-virus protein-protein interaction using heterogeneous siamese neural network

Authors

Amirkabir University of Technology

Abstract
Viral infections are pathological conditions that arise when viruses enter host cells and replicate. The onset of infection is closely tied to the interplay between viral and host cell proteins. Elucidating these protein-protein interactions therefore plays a pivotal role in the prevention, treatment, and control of viral infections. Because traditional laboratory experiments are prohibitively costly and time-consuming, researchers have increasingly turned to computational approaches for predicting human-virus protein-protein interactions. Despite the performance of these computational approaches, a challenge remains: the need for an effective protein representation that adequately captures the structural intricacies of proteins.

In this paper, we present PBS, a novel model for predicting protein-protein interactions between viruses and humans. PBS leverages transformers to represent proteins effectively. The model unifies the latent space for human and virus proteins through heterogeneous siamese neural networks. It achieves an accuracy of 81.41%, an area under the ROC curve of 87.35%, an area under the precision-recall curve of 87.78%, an F1 score of 81.58%, and a precision of 80.84%. Together, these metrics indicate satisfactory performance of the PBS model.
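
To make the idea of a unified latent space concrete, the following is a minimal sketch of a heterogeneous siamese scorer, not the PBS implementation. It assumes that "heterogeneous" means the human and virus branches have separate weights (a classical siamese network shares them) and that each protein has already been turned into a fixed-length embedding by a transformer. The class name, layer sizes, single tanh layer, and cosine-based score are all hypothetical choices made for illustration; the weights are random and untrained.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix of shape (out_dim, in_dim)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(weights, vector):
    """Apply one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector)))
            for row in weights]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class HeterogeneousSiamese:
    """Hypothetical two-branch scorer with one branch per protein type."""

    def __init__(self, human_dim, virus_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        # Assumption: the branches do NOT share weights, but both map
        # into a latent space of the same dimension.
        self.human_branch = make_linear(human_dim, latent_dim, rng)
        self.virus_branch = make_linear(virus_dim, latent_dim, rng)

    def score(self, human_embedding, virus_embedding):
        """Return an interaction score, nominally in [0, 1]."""
        h = project(self.human_branch, human_embedding)
        v = project(self.virus_branch, virus_embedding)
        # Rescale cosine similarity from [-1, 1] to [0, 1].
        return 0.5 * (cosine_similarity(h, v) + 1.0)


if __name__ == "__main__":
    data_rng = random.Random(1)
    human = [data_rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in embedding
    virus = [data_rng.gauss(0.0, 1.0) for _ in range(6)]  # stand-in embedding
    model = HeterogeneousSiamese(human_dim=8, virus_dim=6, latent_dim=4)
    print(model.score(human, virus))
```

Because both branches end in the same latent dimension, a human protein and a virus protein can be compared directly even when their input representations differ. In a trained model, the branch weights would be fitted so that interacting pairs score higher than non-interacting ones.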

Furthermore, we assess the model's ability to predict interactions between H1N1 influenza virus proteins and human proteins.
