Unraveling the genetic variations underlying virulence disparities among SARS-CoV-2 strains across global regions: insights from Pakistan

Over the course of the COVID-19 pandemic, several SARS-CoV-2 variants have emerged that may exhibit different etiological effects, such as enhanced transmissibility and infectivity. However, genetic variations that reduce virulence and impair viral fitness have not yet been thoroughly investigated. The present study sought to evaluate the effects of viral genetic makeup on COVID-19 epidemiology in Pakistan, where infectivity and mortality rates were comparatively lower than in other countries during the first pandemic wave. For this purpose, we focused on comparative analyses of the 7096-amino-acid-long polyprotein pp1ab. Comparative sequence analysis of 203 SARS-CoV-2 genomes sampled from Pakistan during the first wave of the pandemic revealed 179 amino acid substitutions in pp1ab. Of these, 38 substitutions were located within the Nsp3 region of the pp1ab polyprotein. Structural and biophysical analysis of proteins revealed that amino acid variations within Nsp3’s macrodomains induced conformational changes and modified protein-ligand interactions, consequently diminishing the virulence and fitness of SARS-CoV-2. Additionally, the epistatic effects resulting from evolutionary substitutions in SARS-CoV-2 proteins may have unnoticed implications for reducing disease burden. In light of these findings, further characterization of such deleterious SARS-CoV-2 mutations will not only aid in identifying potential therapeutic targets but will also provide a roadmap for maintaining vigilance against the genetic variability of diverse SARS-CoV-2 strains circulating globally. Furthermore, these insights can help us manage and respond more effectively to similar viral pandemic outbreaks in the future.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12985-024-02328-8.

This table lists the NCBI (National Center for Biotechnology Information) accession numbers of pp1ab polyproteins, their amino acid (aa) length, the name used in the literature, and taxonomic classification. Dates of submission to NCBI are given in the last column. Note: The list of corresponding homologous sequences from SARS-CoV-2 is given in Table S1.
This table shows the amino acid substitutions that occurred in the polyprotein pp1ab of SARS-CoV-2 strains circulating in Pakistan during the 1st pandemic wave (March 01, 2020, to June 30, 2020). The amino acid position numbers in the 2nd column follow the coordinates of polyprotein pp1ab of SARS-CoV-2. The 6th column indicates the amino acid substitutions in SARS-CoV-2 sampled from the Pakistani population in relation to the reference sequence for the Wuhan strain and closely related bat-CoVs. The 7th column shows the nonstructural protein in which the particular substitution resides. Please note: the data presented in this table are derived from all available completely sequenced genomes of SARS-CoV-2 sampled from the Pakistani population during the first wave of the pandemic (March 01, 2020, to June 30, 2020).
Macrodomains of SARS-CoV-2 sampled from Pakistan during the 1st pandemic wave have experienced 10 substitutions. The amino acid positions given in the 2nd column follow the coordinates of polyprotein pp1ab. The 6th column indicates the amino acid substitutions in Pakistan's isolates in relation to the reference sequence for the Wuhan strain (YP_009724389.1) and closely related bat-CoVs.
This table shows the effect of amino acid substitutions on the backbone torsion angles and secondary structure elements of Mac-1 and Mac-2 of SARS-CoV-2 sampled from Pakistan during the first pandemic wave (March 01, 2020, to June 30, 2020). Protein structural effects were estimated in relation to the reference sequences/structures (6W02 for Mac-1 and YP_009724389.1 for Mac-2).
The amino acid substitutions in pp1ab of SARS-CoV-2 (Pakistani isolates) in relation to the reference sequence for the Wuhan strain (YP_009724389.1) and closely related bat-CoVs.
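As an illustration of how substitutions relative to a reference sequence can be listed, the following is a minimal sketch, not the pipeline used in the study. It assumes the reference and sample sequences have already been aligned to equal length; the function name `call_substitutions` and the toy sequences are hypothetical.

```python
def call_substitutions(reference: str, sample: str) -> list[str]:
    """List substitutions (e.g. 'L4I') between two pre-aligned sequences.

    Positions are numbered along the reference, so gap characters ('-')
    in the reference do not advance the counter. Gaps and ambiguous
    residues ('X') are skipped rather than reported as substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    ref_pos = 0
    for ref_aa, sample_aa in zip(reference, sample):
        if ref_aa != "-":
            ref_pos += 1
        if "-" in (ref_aa, sample_aa) or "X" in (ref_aa, sample_aa):
            continue
        if ref_aa != sample_aa:
            substitutions.append(f"{ref_aa}{ref_pos}{sample_aa}")
    return substitutions


# Toy sequences for illustration only (not real pp1ab data).
print(call_substitutions("MESLV", "MESIV"))
```

Numbering along the reference keeps reported positions comparable across isolates, which matches the convention of reporting positions in pp1ab coordinates.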

Table S1. The pp1ab polyprotein sequences derived from the genomes of SARS-CoV-2 sampled from different locations of Pakistan during the first pandemic wave (March 01, 2020, to June 30, 2020).
This table provides the accession numbers of pp1ab polyproteins, their respective amino acid (aa) length, sample collection date, Pangolin lineage, and source locality. The clinical symptoms for a subset of samples are also given. Please note: the data presented in this table are derived from all available completely sequenced genomes of SARS-CoV-2 sampled from the Pakistani population during the first wave of the pandemic (March 01, 2020, to June 30, 2020).

Table S5. Estimation of the physicochemical impact of amino acid replacements in the macrodomains.
In total, ten amino acid replacements were detected in the macrodomains of SARS-CoV-2 sampled from the Pakistani population during the first pandemic wave (March 01, 2020, to June 30, 2020). The amino acid positions given in the 2nd column follow the coordinates of the Nsp3 protein. The macrodomain in which each substitution is located is given in the 3rd column. The 4th column gives the putative physicochemical impact of each replacement on protein structure/function; the numbers within brackets are the log odds associated with changing the amino acid. Positive numbers imply a preferred change, zero implies a neutral change, and negative numbers imply an un-preferred change. The 5th column gives the putative impact of each replacement on the stability of the protein structure. The number within brackets is the change in protein stability free energy (ΔΔG) upon a single amino acid substitution. ΔΔG (kcal/mol) is calculated using structure information. Positive values imply an increase in the stability of the protein structure; conversely, negative values imply a decrease.
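The sign conventions described for the 4th and 5th columns can be summarized in a short sketch. The function names and example values are hypothetical, and the handling of ΔΔG equal to exactly zero is an assumption, since the legend does not define that case.

```python
def interpret_log_odds(score: float) -> str:
    """Classify a substitution log-odds score by its sign."""
    if score > 0:
        return "preferred"
    if score < 0:
        return "un-preferred"
    return "neutral"


def interpret_ddg(ddg_kcal_per_mol: float) -> str:
    """Classify a stability change (ddG, kcal/mol) by its sign.

    Positive means increased stability, negative means decreased
    stability. Treating exactly zero as "no change" is an assumption.
    """
    if ddg_kcal_per_mol > 0:
        return "stabilizing"
    if ddg_kcal_per_mol < 0:
        return "destabilizing"
    return "no change"


# Hypothetical values, not taken from Table S5.
print(interpret_log_odds(-2), interpret_ddg(-0.8))
```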

Table S8: Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) based free energy calculations.
This table presents the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) based binding free energy calculations for the wild-type and mutant (M265I, G307C, L357I) Mac1-ADPr complexes. The 2nd, 3rd, 4th, and 5th columns give the van der Waals energy, the electrostatic energy, the polar solvation energy from the Generalized Born model, and the nonpolar solvation energy, respectively. The 6th column gives the total binding energy of each complex. All energies are in kcal/mol.
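To make the relationship between the columns concrete, here is a minimal sketch. It assumes the total binding energy is the plain sum of the four listed components, which is the usual MM/GBSA decomposition when no entropy term is included; the legend does not state this explicitly. The class name and the numerical values are hypothetical and are not taken from Table S8.

```python
from dataclasses import dataclass


@dataclass
class MMGBSAComponents:
    """Energy components of one complex, all in kcal/mol."""
    van_der_waals: float
    electrostatic: float
    polar_solvation: float      # Generalized Born term
    nonpolar_solvation: float   # surface-area term

    def total_binding_energy(self) -> float:
        # Assumption: total = sum of the four components (no entropy term).
        return (self.van_der_waals + self.electrostatic
                + self.polar_solvation + self.nonpolar_solvation)


# Hypothetical values for illustration only.
example = MMGBSAComponents(-40.0, -20.0, 35.0, -5.0)
print(example.total_binding_energy())
```

Under this convention, a more negative total indicates more favorable predicted binding, so wild-type and mutant complexes can be compared by their totals.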

Table S9: The non-structural protein 3 sequences derived from the genomes of subsequent SARS-CoV-2 variants that emerged in Pakistan after the first pandemic wave, spanning July 1, 2020 to February 2024.
This table provides the accession numbers of non-structural protein 3 (Nsp3), variant name, sample collection date, and source locality. Please note: the data presented in this table were obtained from GISAID and belong to genomes of SARS-CoV-2 sampled from Pakistan after the first pandemic wave (July 01, 2020, to date).

Table S10: The amino acid substitutions in the macrodomains of SARS-CoV-2 variants that emerged after the first pandemic wave in Pakistan, spanning July 1, 2020 to February 2024.
This table lists the nine amino acid substitutions in the macrodomains of SARS-CoV-2 variants that emerged after the first pandemic wave. The amino acid positions given in the 2nd column follow the coordinates of the Nsp3 protein. The macrodomain in which each substitution is located is given in the 3rd column. The 4th column gives the accession IDs of the sequences in which the corresponding mutation resides.