Abstract
This study reports sequence data mining and analysis, full-coordinate tertiary structure prediction with deep-learning-inspired validation, and in silico functional characterization of the complete SARS-CoV-2 proteome, based on the NCBI reference sequence NC_045512 (29903 bp ss-RNA). Of the 25 polypeptides analyzed, the 3D structures of 15 were predicted using comparative and ab initio modelling methods because no experimentally determined structures were available. Deep learning and neural network based tools such as QMEANDisCo 4.0.0, MolProbity 4.4, ProQ3D and Procheck were used to validate the predicted 3D structures. Tunnel analysis revealed multiple tunnels in NSP4, nucleocapsid phosphoprotein, NSP3, membrane glycoprotein, ORF6 protein, NSP1, NSP6, and envelope protein, indicating a large number of transport pathways for small ligands that influence the reactivity of these proteins. Ligand-binding pockets with high estimated druggability scores were detected in envelope glycoprotein (0.97), membrane glycoprotein (0.87), NSP6 (0.79), ORF7a (0.79), ORF8 (0.75), ORF3a (0.72), and NSP4 (0.70), indicating the ability to bind drug-like molecules with high affinity. The predicted structures should therefore be useful for protein nanotechnology and for understanding the protein machinery in drug repurposing and discovery studies. Moreover, molecular phylogenetic analysis of the orf1ab polyprotein indicates that SARS-CoV-2 is closely related to bat coronavirus.
Keywords
Proteome analysis, Protein machinery, Protein nanotechnology, SARS-CoV-2, Tunnel analysis, Deep learning, Neural network