Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
Prashant Pradhan, Ashutosh Kumar Pandey, Akhilesh Mishra, Parul Gupta, Praveen Kumar Tripathi, Manoj Balakrishnan Menon, James Gomes, Perumal Vivekanandan* and
1 Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi-110016, India.
2 Acharya Narendra Dev College, University of Delhi, New Delhi-110019, India.
We are currently witnessing a major epidemic caused by the 2019 novel coronavirus (2019-nCoV). The evolution of 2019-nCoV remains elusive. We found 4 insertions in the spike glycoprotein (S) that are unique to the 2019-nCoV and are not present in other coronaviruses. Importantly, amino acid residues in all 4 inserts have identity or similarity to those in HIV-1 gp120 or HIV-1 Gag. Interestingly, although the inserts are discontinuous in the primary amino acid sequence, 3D modelling of the 2019-nCoV spike glycoprotein suggests that they converge to constitute the receptor binding site. The finding of 4 unique inserts in the 2019-nCoV, all of which have identity/similarity to amino acid residues in key structural proteins of HIV-1, is unlikely to be fortuitous in nature. This work provides previously unknown insights into 2019-nCoV and sheds light on the evolution and pathogenicity of this virus, with important implications for its diagnosis.
Coronaviruses (CoV) are single-stranded positive-sense RNA viruses that infect animals and humans. They are classified into 4 genera based on their host specificity: Alphacoronavirus, Betacoronavirus, Deltacoronavirus and Gammacoronavirus (Snijder et al., 2006). There are seven known types of human CoVs, including 229E and NL63 (genus Alphacoronavirus) and OC43, HKU1, MERS and SARS (genus Betacoronavirus). While 229E, NL63, OC43 and HKU1 commonly infect humans, the SARS and MERS outbreaks in 2002 and 2012, respectively, occurred when the viruses crossed over from animals to humans, causing significant mortality (J. Chan et al., n.d.; J. F. W. Chan et al., 2015). In December 2019, another coronavirus outbreak was reported from Wuhan, China; this virus was also transmitted from animals to humans. The new virus has been provisionally named 2019 novel coronavirus (2019-nCoV) by the World Health Organization (WHO) (J. F.-W. Chan et al., 2020; Zhu et al., 2020). While there are several hypotheses about the origin of 2019-nCoV, the source of this ongoing outbreak remains elusive.
The transmission patterns of 2019-nCoV are similar to those documented in previous outbreaks, including transmission through bodily or aerosol contact with infected persons.
bioRxiv preprint first posted online Jan. 31, 2020; doi: http://dx.doi.org/10.1101/2020.01.30.927871. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Cases of mild to severe illness, and deaths from the infection, have been reported from Wuhan. This outbreak has spread rapidly to distant nations, including France, Australia and the USA, among others. The number of cases within and outside China is increasing steeply. Our current understanding is limited to the virus genome sequences and modest epidemiological and clinical data. Comprehensive analysis of the available 2019-nCoV sequences may provide important clues that advance our understanding and help manage the ongoing outbreak.
The spike glycoprotein (S) of coronaviruses is cleaved into two subunits (S1 and S2). The S1 subunit mediates receptor binding and the S2 subunit facilitates membrane fusion (Bosch et al., 2003; Li, 2016). The spike glycoproteins of coronaviruses are important determinants of tissue tropism and host range. In addition, the spike glycoproteins are critical targets for vaccine development (Du et al., 2013). For this reason, the spike proteins are the most extensively studied of the coronavirus proteins. We therefore used computational tools to investigate the spike glycoprotein of the 2019-nCoV in order to understand its evolution and its novel sequence and structural features.
Retrieval and alignment of nucleic acid and protein sequences
We retrieved all available coronavirus sequences (n=55) from the NCBI viral genome database (https://www.ncbi.nlm.nih.gov/) and used GISAID (Elbe & Buckland-Merrett, 2017) [https://www.gisaid.org/] to retrieve all available full-length 2019-nCoV sequences (n=28) as of 27 January 2020. Multiple sequence alignment of all coronavirus genomes was performed using MUSCLE (Edgar, 2004) based on the neighbour-joining method. Of the 55 coronavirus genomes, 32 representative genomes from all categories were used to construct a phylogenetic tree in MEGA X (Kumar et al., 2018). The closest relative was found to be SARS CoV. The spike glycoprotein regions of SARS CoV and 2019-nCoV were aligned and visualized using MultAlin (Corpet, 1988). The identified amino acid and nucleotide sequences were aligned against the whole viral genome database using BLASTp and BLASTn. The conservation of the nucleotide and amino acid motifs across the 28 clinical variants of the 2019-nCoV genome was assessed by multiple sequence alignment in MEGA X. The three-dimensional structure of the 2019-nCoV spike glycoprotein was generated using the SWISS-MODEL online server (Biasini et al., 2014), and the structure was annotated and visualized using PyMOL (DeLano, 2002).
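For readers unfamiliar with the neighbour-joining method mentioned above, the following is a minimal Python sketch of the standard textbook algorithm. It is not MEGA X's implementation. The function name `neighbour_joining`, the tie-breaking rule (first minimal pair in input order) and the five-taxon distance matrix are illustrative assumptions, and the matrix is a small additive toy example rather than coronavirus data. In practice, the input would be pairwise distances (for example, p-distances) computed from the multiple sequence alignment.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Textbook neighbour-joining on a symmetric distance matrix.

    names: taxon labels.
    dist:  list of lists, dist[i][j] = distance between names[i] and names[j].
    Returns (newick, joins); joins lists every merge as
    (left_label, right_label, left_branch, right_branch).
    """
    # Distances keyed by (label, label); subtrees are labelled by their Newick text.
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: summed distance from a to every other active node.
        r = {a: sum(d[(a, b)] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j;
        # ties go to the first such pair in list order.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        joins.append((i, j, li, lj))
        # Distance from the new node u to every other remaining node k.
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes; the remaining branch length is written on the first.
    a, b = nodes
    return f"({a}:{d[(a, b)]:g},{b});", joins


# Hypothetical five-taxon additive distance matrix (toy data, not coronavirus genomes).
names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree, joins = neighbour_joining(names, D)
print(tree)  # ((c:4,(a:2,b:3):3):2,(d:2,e:1));
```

On this additive matrix, the sketch recovers branch lengths that reproduce every input distance exactly. On non-additive distances from real alignments, neighbour joining only approximates the tree and can even produce negative branch lengths.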