Evolution and heterogeneity of multiple serotypes of Dengue virus in Pakistan, 2006–2011

Background Even though dengue has been recognized as one of the major public health threats in Pakistan, the understanding of its molecular epidemiology is still limited. The genotypic diversity of Dengue virus (DENV) serotypes involved in dengue outbreaks since 2005 in Pakistan is not well studied. Here, we investigated the origin, diversity, genetic relationships and geographic distribution of DENV to understand virus evolution during the recent expansion of dengue in Pakistan. Methods The study included 200 sera obtained from dengue-suspected patients from 2006 to 2011. DENV infection was confirmed in 94 (47%) sera by a polymerase chain reaction assay. These included 36 (38.3%) DENV-2, 57 DENV-3 (60.6%) and 1 DENV-4 (1.1%) cases. Sequences of 13 whole genomes (6 DENV-2, 6 DENV-3 and 1 DENV-4) and 49 envelope genes (26 DENV-2, 22 DENV-3 and 1 DENV-4) were analysed to determine the origin, phylogeny, diversity and selection pressure during virus evolution. Results DENV-2, DENV-3 and DENV-4 in Pakistan from 2006 to 2011 shared 98.5-99.6% nucleotide and 99.3-99.9% amino acid similarity with those circulated in the Indian subcontinent during the last decade. Nevertheless, Pakistan DENV-2 and DENV-3 strains formed distinct clades characterized by amino acid signatures of NS2A-I116T + NS5-K861R and NS3-K590R + NS5-S895L respectively. Each clade consisted of a heterogenous virus population that circulated in Southern (2006–2009) and Northern Pakistan (2011). Conclusions DENV-2, DENV-3 and DENV-4 that circulated during 2006–2011 are likely to have first introduced via the southern route of Pakistan. Both DENV-2 and DENV-3 have undergone in-situ evolution to generate heterogenous populations, possibly driven by sustained local DENV transmission during 2006–2011 periods. While both DENV-2 and DENV-3 continued to circulate in Southern Pakistan until 2009, DENV-2 has spread in a Northern direction to establish in Punjab Province, which experienced a massive dengue outbreak in 2011.


Background
Dengue is an acute febrile illness with a clinical spectrum of mild flu-like illness (dengue fever or DF) to fatal complications of severe dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). The disease is caused by the Dengue virus (DENV) complex that consists of 4 genetically and immunogenically distinct serotypes. DENV is a single stranded positive sense RNA virus with a 11.8 kb genome flanked by 5' and 3' untranslated regions and a single coding region for three structural proteins: capsid (C), the precursor of membrane (prM) and envelope (E); and seven nonstructural proteins: NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5 [1].
DENV is transmitted to humans through the bites of infected female mosquitoes of Aedes aegypti and Ae. albopictus. The transmission of DENV has increased in recent years in urban and semi-urban endemic settings, especially in Americas, South and South-east Asia and the Western Pacific. The magnitude, distribution, and clinical severity of dengue outbreaks have been alarmingly high in countries such as India [2], Sri Lanka [3], Nepal [4] Bangladesh [5] and Pakistan [6] in the Indian subcontinent during the last decade.
Pakistan is endemic to all four serotypes of DENV circulating throughout the year with a peak incidence during the post monsoon period between October and December [7,8]. Factors such as crowded cities, unsafe drinking water, inadequate sanitation and large number of refugees facilitate the spread of dengue in different parts of the country, resulting in increased morbidity and mortality. It is believed that DENV was first introduced into Pakistan through the importation of tyres containing eggs of infected mosquitoes at Karachi sea port [9]. Although serological evidence of DENV infections in Pakistan dates back to 1968 [10], the first confirmed outbreak associated with DHF occurred in the southern Pakistan city of Karachi in 1994 [11]. Serological studies confirmed the circulation of both DENV-1 and DENV-2 during the 1994 outbreak [12]. Since then, the disease has emerged as a major public health problem in the country [13]. DENV-3 was first reported during the 2005-2006 outbreak in Karachi [14,15]. By 2007, dengue started to emerge in the Northern Pakistan. DENV-2 and DENV-3 have been the dominant serotypes in Lahore from 2007 to 2009 [16]. Dengue showed a resurgence in November 2010, especially in Sindh, Punjab and Khyber Pakhtunkhwa regions subsequent to massive floods in July same year [6]. The outbreak escalated in Lahore, Punjab, in 2011 with 22,562 cases and 363 deaths due to severe DHF [6].
So far, there have been a few, comprehensive molecular epidemiological studies that describe DENV circulating in Pakistan. There is only one genome-wide analysis that describes Pakistani DENV-2 circulated during the 2011 outbreak [17]. The studies of molecular epidemiology and evolutionary genetics are important to predict the origin and spread of viruses, to strengthen our understanding on the pathogenesis of disease, the cause of epidemics and the genetic basis of virulence. RNA viruses such as DENV evolve rapidly [18] and on rare occasions, certain mutations lead to phenotypic changes in the viruses that alter their potential to cause outbreaks associated with severe disease [19]. Genotypic characterization has been a useful tool in determining the evolutionary origin of the DENV, identifying the circulating virus strains in an endemic area and detecting the introduction of new genotypes. In recent years, envelope (E) gene sequencing has been widely used to assess the phylogenetic relationships among DENV isolates [20]. In Pakistan, sequencing of NS3 gene [21] and C-prM gene [14,16] junction has mainly been used for the genetic analyses of DENV. Khan and co-workers recently performed a more comprehensive analysis of DENV2 strains associated with 2011 outbreak in Lahore [17]. Nevertheless, the use of relatively short (up to 400 bp) and small number of sequences in available studies have limited the resolution of phylogenetic analyses aimed at determining the origin and evolution of DENV in Pakistan during the last decade.
Therefore, in the present study, we conducted a detailed molecular characterization of Pakistan DENV strains using the complete E gene and whole genome analyses. DENV isolates obtained from different parts of Pakistan between 2006 and 2011 were studied, in order to obtain a temporal evolutionary perspective of DENV in the country over a 6-year period. We describe the likely origin, evolution and geographic distribution of DENV that explain the recent changes of dengue epidemiology in Pakistan.
We detected only DENV-3 among sera collected during the 2006-2007 periods, whereas DENV-2, DENV-3 and DENV-4 were observed in sera collected after 2008. This is in agreement with previous findings that DENV-2 became dominant during the 2007-2009 periods [16]. The absence of DENV-1 in our cohort was not surprising as it has not been documented in Pakistan between 2006 and 2009 [9,16] and was a minor serotype in 2010 in Northern parts of the country [22]. During the 2006-2009 periods, 87.7% (n=71) of analysed sera were collected in Karachi, the capital of Sindh province in Southern Pakistan. On the other hand, 69.2% (n=9) of cases in 2011 were from Lahore in Punjab province of the Northern Pakistan where a major dengue outbreak was reported in the same year. This case distribution pattern reflects the dengue epidemiology in Pakistan, where the disease was initially restricted to the southern part of the country, particularly the port city of Karachi, and subsequently expanded to the north, causing major epidemics. The spread of dengue to other parts of the country is believed to be either due to the movement of people carrying the virus or mosquito eggs carrying vertically-transmitted DENV from dengue-affected areas [6].
In order to understand the origin and evolution of different DENV serotypes during their spread in Pakistan, we performed a detailed analysis of 13 full genomes (6 DENV-2, 6 DENV-3 and 1 DENV-4) and 49 envelope (E) gene sequences detected in Southern and Northern parts during the 2006-2011 period. All E gene sequences were obtained directly from PCR-positive sera and comprised 26 DENV-2, 22 DENV-3 and 1 DENV-4 strains. Full genome sequencing was performed on viruses isolated from sera.

Characteristics of DENV-2 in Pakistan
Phylogenetic analysis of 26 DENV-2 E gene sequences obtained between 2008 and 2011 revealed that all viruses belonged to Indian subcontinent lineage of the cosmopolitan genotype. Although there were more DHF compared to DF among the samples collected, there was no unique signature in the E gene sequences that could differentiate between DF and DHF/DSS samples. With the exception of 1 isolate (D2-PS51) that was closely related to a previous strain from Saudi Arabia, the remaining 25 isolates formed a distinct monophyletic clade with 78% bootstrap support ( Figure 1). In DENV-2 whole genome analysis, the same clade achieved 100% bootstrap support, strengthening the credibility of the clade further (data not shown). This monophyletic clade included DENV-2 isolates obtained from 14 sera Indian/Sri Lankan strains. Furthermore, whole genome analysis revealed a unique combination of 8 amino acid substitutions that was shared between the monophyletic clade of Pakistan DENV-2 and Indian/Sri Lankan isolates (Table 2). These findings indicated the close genetic relationship among DENV-2 strains circulated in the Indian subcontinent during the last decade. Even though the earliest DENV-2 isolate of our study cohort was in 2008, estimation of time to the most recent common ancestor (TMRCA) dated the origin of Pakistan DENV-2 clade to early 2006 [95% highest probability density (HPD) 2004. 5 -2007.4]. Given the fact that genetically similar strains existed in the region even in 2003, it is likely that the 2008-2011 clade of Pakistan DENV-2 viruses descended from an ancestral strain that was previously circulating in other parts of the Indian subcontinent region. Further evolution in subsequent years has deviated Pakistan DENV-2 from its common ancestor to form a monophyletic clade by 2008. During this process, Pakistan DENV-2 strains have acquired 2 additional fixed amino acid substitutions (NS2A-I116T and NS5-K861R), of which NS5-K861R is unique to Pakistan strains ( Table 2). In DENV-2, residue NS5-861 is a strictly conserved Lysine. NS5-861 resides in the "thumb" domain of the Cterminal end of RNA dependent RNA polymerase (RdRp) of DENV [23]. This residue sits outside structural elements of thumb domain that form the RNA template tunnel (residues 740-747 and 782-809) and a zinc binding site (residues 712, 714, 728 and 847). Therefore, even though the functional implications of the Lysine to Arginine substitution in our isolates are not clear, major structural changes are not expected as the K861R substitution does not alter the amino acid properties at the residue. Furthermore, our findings indicated an in-situ spatiotemporal evolutionary pattern of Pakistan DENV-2 during the period from 2008 to 2011. This was evident by the step-ladder pattern topology of the 2008-2011 Pakistan clade (Figure 1), in which the 2009 viruses in Southern Pakistan were ancestral to the distinct subclade (99% bootstrap support) formed by 2011 viruses in the North. This 2011 sub-clade possessed a unique substitution (prM-S70A) that was shown to be under the diversifying selection (p=0.05, Table 2) by the mixed effect model of evolution (MEME) method. In contrast, single likelihhod ancestor counting (SLAC), fixed effects likelihood (FEL) [27] and internal fixed effects likelihood (IFEL) [28] methods showed the same residue to be under purifying selection. MEME is one of the methods available in Hyphy package to perform the codon-based selection pressure analysis [29]. A simulation study using MEME has shown that MEME is preferable to other methods as it is capable of identifying not only pervasive positive selection at the level of an individual site, but also episodic (transient) positive selection affecting only a subset of lineages [30]. Episodic selection is often missed by other methods that rely on the mean nonsynonymous to synonymous ratio (dN/dS) over a period of time [31]. Sequences from another recent study [17] have also shown prM-S70A to be fixed and unique to isolates from Lahore in 2011. Therefore, we postulate that amino acid residue prM-70 is likely to have been selected for during the DENV-2 outbreak in Northern Pakistan in 2011. As the support for diversifying selection in MEME analysis is marginally significant (p=0.05) and we obtained contrasting outputs from SLAC, FEL and IFEL methods, it is difficult to conclude whether the fixation of prM-S70A among Pakistani DENV-2 strains is either due to positive selection or genetic drift. Nevertheless, codon 70 constitutes the variable, middle amino acid residue of the N-glycosylation site sequon (N-S-T) of DENV-2 prM protein [1]. N-glycosylation of prM protein is believed to play an important role in virion assembly and transport before release [1]. The prM-70 residue is a strictly-conserved Serine in human DENV-2 isolates, except for two sylvatic viruses (N-A-T) reported from Malaysia in 2008 [24] and 1970 [25]. Therefore, whether this substitution from a hydrophilic Serine to a hydrophobic Alanine at prM-70 residue alters the conformation and thereby the functionality of the Nglycosylation site providing a selective advantage to DENV is of interest and necessitates further investigation. As illustrated in Figure 1, virus populations from Southern and Northern areas are heterogenous, indicating the possibility of sustained virus transmission in both areas driving the viral diversity. While the majority of 2011 viruses belonged to a distinct clade, some remained within a 2009 virus cluster (Figure 1). Based on these observations, it is likely that DENV-2 first expanded in Southern Pakistan and subsequently spread to the Northern region where certain sub-population/s evolved further to become a distinct viral population in 2011. Therefore, the overall epidemiological understanding of South to North expansion of dengue in Pakistan is supported by our genetic analysis of DENV-2.

Characteristics of DENV-3 in Pakistan
The emergence of DENV-3 in Pakistan was first reported during the 2005 DHF outbreak in Karachi [14]. Previous studies have revealed that DENV-3 sequences from the 2005 and 2006 outbreaks were most closely related to those circulating in India [14,32] and belonged to genotype III [16]. In support, all 22 DENV-3 E gene sequences from our study also belonged to the genotype III ( Figure 2). Furthermore, all of our study isolates that were  Table 3) further supported the previous notion that Pakistan DENV-3 isolates are genetically related to those circulating in India in preceding years. Of these, NS1-L178M was found to be a positively selected site (p: <0.01. Table 3) in the MEME-based selection pressure analysis, suggesting a lineage-specific selective advantage of DENV-3 that circulated in the Indian subcontinent before 2009. SLAC, FEL and IFEL-based slection pressure analyses predicted the same site to be neutral. The residue 178 sits in domain II of flavivirus NS1 protein [34], adjacent to one of the 12 strictly conserved Cysteine residues (DENV-3-Cys179) that form disulfide linkages [1]. Disulfide bridges are believed to play an important role in NS1 dimerization [35]. Interestingly, residue 178 appears to follow a lineagespecific variability as DENV-3 genotype III lineage in South-east Asia (Vietnam, Cambodia, Thailand) shows a fixed NS1-L178S substitution. This observation, together with our data, indicates the likelihood of episodic diversifying selection at DENV-3 NS1-178 residue, as predicted by the MEME analysis for positive selection pressure.
Overall, these findings suggested an Indian subcontin-   (Figure 2). In DENV-3 whole genome phylogeny, the bootstrap support of the same clade improved to 100% (data not shown). Our whole genome analysis revealed that this genetic deviation of Pakistan strains from Indian sub continent isolates was driven by two additional fixed amino acid substitutions (NS3-K590R and NS5-S895L). Of these, NS3-K590R is unique to Pakistan DENV-3 isolates. NS3-K590R falls within a loop structure between alpha helices 9 and 10 of flavivirus helicase domain 3 [36]. Even though domain 3 does not house major functional elements of helicase activity, it contributes to structural stability and the formation of RNA binding cleft [36][37][38][39]. Domain 3 is also believed to act as the RNA-dependent RNA polymerase binding site [36]. As the residue 590 does not carry any known structural significance, the effect of NS3-K590R substitution on the domain 3 functionality of our study isolates is unclear. In summary, our analysis indicates that 2006 DENV-3 strains emerged from an ancestral strain that introduced into Southern Pakistan, possibly

Characteristics of DENV-4 in Pakistan
DENV-4 was first reported in Lahore during the 2008 dengue outbreak in Punjab, Pakistan [40]. Phylogenetic analysis showed that DENV-4 from the Indian subcontinent formed a distinct group within genotype I (Figure 3). A DENV-4 isolate obtained from Karachi in 2009 during our study clustered with another Pakistan sequence reported previously in the same year [41]. These Pakistan isolates shared the highest nucleotide (98.5%) and amino acid (99.3%) with a DENV-4 strain (GenBank: HM237349) from Andra Pradesh in Southren India in 2007 [42]. Therefore, it is evident that DENV-4 strains closely related to those in Pakistan in 2009 have previously been circulating in the Indian subcontinent. Nevertheless, more sequence data from a wider geographical scale in the Indian subcontinent region is needed to confirm the spread and evolutionary history of DENV-4 in Pakistan.   evolution to emerge as distinct, heterogenous virus populations during the same period. While both DENV-2 and DENV-3 continued to circulate in Southern Pakistan until 2009, DENV-2 has subsequently spread in a Northern direction, a move that caused a massive outbreak in Punjab province in 2011. The study, therefore, highlights the implication of human migration and virus evolution in the expansion of dengue to areas where the disease was not previously a major public health threat, and the emergence of new strains that may be associated with severe outbreaks.

Sample collection
Aga Khan University Hospital (AKUH) is a 550 bed tertiary care hospital located in Karachi, Pakistan. The hospital is equipped with a state of the art clinical laboratory that caters not only for the in-patients but also reaches out to the community. It has a unique network of 200 sample collection units spread throughout the country. The laboratory receives samples from inpatients and outpatients of AKUH and from other clinics and hospitals in all major cities of the country. Karachi witnessed its first ever severe dengue outbreak in 2005-06 during which the clinical laboratory of AKUH was at the forefront of dengue diagnosis. The patients presenting to the clinics and in-patients of the AKUH Karachi, with the history of fever, headache, joint pain, vomiting and diarrhoea were recruited based on presumptive clinical diagnosis of DENV infection. In addition, samples obtained from out patients at collection units were also included. After taking informed consent, a complete Performa about patient's demographics and clinical presentation were filled. Serum samples of each patient were stored in three aliquots at −80°C. Two hundred dengue-suspected sera collected from 2006 to 2011 were retrieved from the serum bank at AKUH and were shipped to Environmental Health Institute, National Environment Agency, Singapore for the molecular epidemiological analysis. The work presented here was approved by the Ethical Review Committee, Aga Khan University (AKU), Pakistan.

Virus isolation
DENV was isolated from sera using the Ae. albopictus clone C6/36 mosquito cell line (ATCC CRL-1660). Briefly, the monolayer of C6/36 was inoculated with serum in L-15 maintenance medium containing 3 % FCS at 33°C to allow for virus adsorption and replication. The infected fluid was harvested after 5-10 days. The presence of DENV in cell supernatants was confirmed by an immunofluoresent assay (IFA) during which cells were reacted with either DENV group-specific or serotype-specific monoclonal antibodies derived from hybridoma cultures (ATCC HB-46, HB-47, HB-48, and HB-49). The fluorescein isothiocyanate-conjugated goat anti-mouse antibody was used as the detector. A maximum of three passages was done for each sample based on the original virus load.
RNA extraction and conventional PCR for detection and serotyping of Dengue virus Viral RNA was extracted either from sera or culture supernatants using the QIAGEN QIAamp viral RNA minikit (QIAGEN, Hilden, Germany) according to the manufacturer's recommendations.
Conventional semi-nested PCR was performed using a modified procedure described by Lanciotti and colleagues [43]. A one-step RT-PCR was performed using the AccessQuickTM RT-PCR System (Promega, Madison, WI, USA) in a 50 μl reaction volume containing 1X AccessQuick TM Master Mix, 2.5 units of AMV Reverse Transcriptase and 0.25 μM (each) of Primers D1 and D2 [43]. The reactions were carried out in Tgradient PCR Thermocycler (Biometra, Gottingen, Germany) using the following programme; reverse transcription at 45°C for 30 min, inactivation at 94°C for 2 min, and PCR amplification of 40 cycles under the following conditions: 94°C for 30 sec, 55°C for 30 sec, 72°C for 1 min, and a final extension at 72°C for 5 min. Five microlitres of the 1:100 dilution of the first-round PCR product was further amplified in a 50 μl reaction mixture containing 1X Green GoTaq® Flexi Buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 1.25 units of GoTaq® DNA Polymerase (Promega, Madison, WI, USA) and 0.5 μM of each primer D1, TS1, TS2, TS3 and TS4 [43]. After denaturation at 95°C for 2 min, the second round amplification was performed for 30 cycles (95°C for 30 sec, 55°C for 30 sec, 72°C for 1 min, and a final extension at 72°C for 10 min). Amplified products were visualized in 2% agarose gels stained with GelRed (Biotium Inc., USA).

PCR amplification of envelope gene for sequencing
The synthesis of complementary DNA (cDNA) from RNA extracted from sera was carried out using Superscript™ III First-strand Synthesis System (Invitrogen, Carlsbad, USA). The complete envelope (E) gene (~1.5 kb) was amplified by PCR using 0.5 μM of DENV serotype-specific primers (Additional file 2) and 1X Phusion™ Flash High-Fidelity PCR Master Mix (Finnzymes, Lafayette, CO). The amplification protocol is as follows: initial denaturation at 98°C for 5 sec, 35 cycles of denaturation at 98°C for 5 sec, annealing at 60°C for 8 sec, extension at 72°C for 25 sec and final extension at 72°C for 1 min. Semi-nested PCR (~1.4 kb, Additional file 2) was performed as above if no desired product yielded in the first-round amplification. Amplified products were visualized in 2% agarose gels stained with GelRed (Biotium Inc., USA).

Whole genome sequencing
A total of 13 DENV isolated from DENV-2 (n=6), DENV-3 (n=6) and DENV-4 (n=1) positive sera were selected for full genome sequencing. Isolates were selected to represent different groups of viruses based on E gene phylogeny. Amplification and sequencing primers were designed to cover the entire genome of different serotypes in overlapping fragments (Additional files 3, 4 and 5). Additional two sets of primers (Additional file 6) were used to capture 5' and 3' untranslated regions of complete genome for all four DENV serotypes. The PCR reaction was performed using the Phusion™ Flash High-Fidelity PCR Master Mix (Finnzymes, Lafayette, CO) as described for E gene amplification. Amplified products were visualized in 2% agarose gels stained with GelRed (Biotium Inc., USA).
Purification and sequencing of PCR products PCR products were purified using the Qiaquick PCR Purification kit (Qiagen, Hilden, Germany) according to manufacturer's instructions. Sequencing of purified PCR products was perfromed by a commercial sequencing company (AITbiotech Pte Ltd, Singapore) according to the BigDye Terminator Cycle Sequencing kit (Applied Biosystems, USA) protocol. Seqeunces were deposited in Genbank database under the accession numbers KF041212 -KF041260.

Phylogenetic and mutation analyses of envelope genes and complete genomes
Nucleotide sequences were assembled using the Lasergene package version 8.0 (DNASTAR Inc., Madison, WI, USA). Contiguous sequences were aligned using ClustalX program [44] and compared with published sequences of DENV isolates in Genbank database. Phylogenetic analysis of E gene sequences was performed in MEGA5 program [45] using the maximum-likelihood method based on the general time reversible (GTR) model with gamma distribution and invariant sites. The robustness of the original tree was tested with 1000 bootstrap replications. Comparison of amino acid sequences deduced from nucleotide alignments was performed by using the BioEdit v7.0.5 software to identify unique mutations in study isolates.
Estimation of the time of viral emergence for DENV-2 and DENV-3