Phylogenetic relationship of dengue virus type 3 isolated in Brazil and Paraguay and global evolutionary divergence dynamics

Background Dengue is the most important mosquito-borne viral disease worldwide. Dengue virus comprises four antigenically related viruses named dengue virus type 1 to 4 (DENV-1 to DENV-4). DENV-3 was re-introduced into the Americas in 1994, causing outbreaks in Nicaragua and Panama. DENV-3 was introduced into Brazil in 2000 and then spread to most of the Brazilian States, reaching the neighboring country, Paraguay, in 2002. In this study, we have analyzed the phylogenetic relationship of DENV-3 isolated in Brazil and Paraguay with viruses isolated worldwide. We have also analyzed the evolutionary divergence dynamics of DENV-3 viruses. Results The entire open reading frame (ORF) of thirteen DENV-3 isolates from Brazil (n = 9) and Paraguay (n = 4) was sequenced for phylogenetic analysis. DENV-3 grouped into three main genotypes (I, II and III). Several internal clades, which we called lineages and sub-lineages, were found within each genotype. Viruses included in this study belong to genotype III and grouped together with viruses isolated in the Americas within lineage III. The Brazilian viruses were further segregated into two different sub-lineages, A and B, and the Paraguayan viruses into sub-lineage B. All three genotypes showed internal grouping. The nucleotide divergence was on average 6.7% for genotypes, 2.7% for lineages and 1.5% for sub-lineages. Phylogenetic trees constructed with any of the protein gene sequences showed the same segregation of DENV-3 into three genotypes. Conclusion Our results showed that two groups of DENV-3 genotype III circulated in Brazil during 2002–2009, suggesting different events of introduction of the virus through different regions of the country. In Paraguay, only one group of DENV-3 genotype III is circulating, which is very closely related to the Brazilian viruses of sub-lineage B. Different degrees of grouping can be observed for DENV-3, and each group showed a characteristic evolutionary divergence. Finally, we have observed that any protein gene sequence can be used to identify the virus genotype.


Background
Dengue is the most important mosquito-borne viral disease in tropical and subtropical regions. An estimated 50 million dengue infections occur annually and approximately 2.5 billion people live in dengue endemic countries [1]. Dengue virus (DENV) infection can be asymptomatic or lead to a wide spectrum of clinical manifestations, ranging from an undifferentiated fever or the self-limiting non-severe dengue fever (DF) to the severe dengue haemorrhagic fever (DHF), sometimes with fatal outcomes.
DENV-3 was re-introduced into the Americas in 1994, specifically in Nicaragua and Panama, and then spread to other Central American countries, Mexico, the Caribbean countries, and finally South America [4][5][6][7][8][9][10]. In 2001/2002, a large outbreak of DENV-3 occurred in Rio de Janeiro [9,11]. DENV-3 demonstrated its greatest epidemic potential, spreading into most of the Brazilian States and, by March 2002, into the neighboring country, Paraguay [9,12]. Our previous phylogenetic study based on the E protein gene and the 3′UTR suggested that DENV-3 was introduced into Brazil through Rio de Janeiro as well as through the Northern Region, on at least three different occasions, and subsequently spread to Paraguay [12]. Several phylogenetic studies using partial genomic sequences of DENV were carried out to analyze its molecular epidemiology [13][14][15][16]. However, it is believed that sequencing the entire viral genome could provide a better picture of the dynamics of viral populations. Recently, complete genome analyses have been performed to study DENV phylodynamics in Singapore and India [17,18]. In the present study, we have analyzed the phylogenetic relationship of 13 DENV-3 isolates from Brazil and Paraguay with viruses isolated worldwide, using their entire RNA genome sequences.

Genome sequencing and phylogenetic analysis of viruses isolated in Brazil and Paraguay
Six fragments with overlapping regions amplified by RT-PCR were subjected to direct nucleotide sequencing to obtain the full-length genome sequence of the 13 viruses included in this study (Table 1). The assembled sequences showed that most of the viruses have a genome of 10,707 nucleotides, with the exception of the D3BR/MR9/2003 isolate, which showed a deletion of 8 nucleotides between positions 10,276 and 10,284 in the 3′UTR of the viral genome. The same deletion was also observed in sequences retrieved from GenBank for 7 viruses isolated in Brazil, 1 in Puerto Rico, and 25 in Vietnam (Additional file 1).
For phylogenetic analysis, we retrieved from GenBank the DENV-3 sequences annotated as complete. However, several sequences lacked the 5′ and 3′ ends or had UTRs that were not validated. Therefore, in the alignment we included only the open reading frame sequences (10,168 nucleotides long). Thus, the alignment included our 13 sequences and 527 sequences of DENV-3 deposited in GenBank. Based on this alignment, a phylogenetic tree was constructed (Figure 1), showing that DENV-3 comprises three genetic groups or genotypes (I, II and III). The 13 isolates described in this study were grouped within genotype III, together with viruses isolated in the Americas.

Phylogenetic relationship of genotype III viruses
To perform a more accurate analysis of the phylogenetic relationships of viruses isolated in Brazil and Paraguay, ORF sequences of genotype III viruses (n = 347) were used to construct additional phylogenetic trees using distance and Bayesian methods; these phylogenetic trees showed similar topology (Figure 2). In addition, the evolutionary divergence among sequences and the presence of amino acid motifs were analyzed. Based on the topology of the tree, the frequency distribution profile of the divergence among sequences (Figure 3) and the presence of characteristic amino acid motifs (Additional file 2), genotype III viruses were clustered into three lineages (I, II and III) (Figure 2). Viruses of lineage III were further clustered into four monophyletic groups (sub-lineages A, B, C, and D) (Figure 2). The mean divergence among lineages ranged from 2.9 to 3.4% for nucleotide sequences and from 0.6 to 1.0% for amino acid sequences (Table 2), while the mean divergence among sub-lineages ranged from 1.0 to 1.6% for nucleotide sequences and from 0.5 to 0.6% for amino acid sequences (Table 2). All the mean values coincided with the highest peaks in the frequency distribution profile (Figure 3). The viruses described in this study grouped together with viruses isolated in the Americas within lineage III, distributed into two different groups, sub-lineages A and B. The Brazilian viruses isolated in the Northern Region, BR_BV4_02 from Boa Vista, Roraima, and BR_BR8_04 from Belem, Para, grouped with other Brazilian viruses isolated in the same region, with viruses isolated in the Caribbean islands (Martinique, Trinidad and Tobago, St. Lucia, Anguilla and Puerto Rico) and in the north of South America (French Guiana and Venezuela) (Figure 2, sub-lineage A).
The other Brazilian isolates (BR_PV1_03, BR_CU6_02, BR_MR9_03, BR_SL3_02, BR_ACN_07, BR_AL95_09, and BR_RP1_2003) and the Paraguayan isolates (PY_AS10_03, PY_AS12_02, PY_SUS_03, PY_PJ4_03) grouped with viruses isolated in other regions of Brazil within sub-lineage B (Figure 2, sub-lineage B). Sub-lineage C includes viruses isolated in Nicaragua, Puerto Rico, Venezuela, Peru and Ecuador. Most of the viruses isolated in Venezuela and Puerto Rico belong to sub-lineage D, which also includes viruses isolated in Colombia. Interestingly, BR_V2386_03 is the only Brazilian virus that grouped with viruses of sub-lineage D. Finally, lineage I is composed only of old isolates from Sri Lanka (1983–1989) and lineage II of viruses isolated in Asia (1993–2005). In addition, viruses from Asia (two viruses from Sri Lanka isolated in 1989 and 1997, one virus from China isolated in 2009 and one virus from India isolated in 2003) and East Africa (one virus from Mozambique isolated in 1985) are located basally in the branch containing the American isolates (lineage III) and do not form a monophyletic group.

Phylogenetic relationship among viruses from genotypes I and II
Considering that genotype III viruses clustered into separate monophyletic groups, which we called lineages and sub-lineages, we analyzed whether a similar clustering could be observed for genotype I and II viruses. The phylogenetic relationship analysis was carried out as mentioned above for genotype III viruses. Genotype I viruses were clustered into two lineages (I and II); viruses of lineage II were segregated into two sub-lineages (I and II), and sub-lineage II includes three internal monophyletic groups (A, B and C) (Figure 4, Additional file 3). The mean nucleotide divergence between lineages I and II was 4.9%, while the mean amino acid divergence was 1.8% (Table 3). The mean divergence between sub-lineages I and II was 3.6% for nucleotide sequences and 1.2% for amino acid sequences (Table 3). Finally, the divergence among the internal groups (A, B and C) of sub-lineage II ranged from 2.3 to 2.8% for nucleotide sequences and from 0.6 to 0.9% for amino acid sequences (Table 3). Once again, all the mean values coincided with the highest peaks in the frequency distribution profile (Figure 5).
Genotype II viruses segregated into four lineages (Figures 6 and 7, Additional file 4). Lineage IV includes four sub-lineages (A, B, C and D). The mean nucleotide divergence among lineages ranged from 2.4 to 3.2%, while the mean amino acid divergence ranged from 0.8 to 1.5% (Table 4). The mean divergence among sub-lineages ranged from 1.2 to 1.9% for nucleotide sequences and from 0.4 to 0.7% for amino acid sequences (Table 4).

Evolutionary divergence among genotypes
We have shown above the evolutionary divergence among the different monophyletic groups within each genotype. In this section, we analyzed the evolutionary divergence among genotypes using the entire ORF sequence and, individually, each protein gene sequence (Table 5 and Figure 8). The mean divergence among genotypes ranged from 6.6 to 6.8% for nucleotide sequences and from 3.1 to 3.4% for amino acid sequences when the entire ORF was analyzed (Figure 8). Analyzing each viral protein, the mean divergence among genotypes varied from 4.1 to 8.8% for nucleotide sequences and from 0.5 to 5.3% for amino acid sequences (Table 5). When the nucleotide sequences were analyzed, the lowest divergence was observed for the C protein (4.1 to 4.9%) and the highest for the NS2a (8.4 to 8.8%), NS4a (7.7 to 8.5%) and E (7.0 to 7.6%) proteins (Table 5). When the amino acid sequences were analyzed, the NS2b (0.5 to 1.0%), NS4b (0.8 to 1.1%) and NS3 (1.2 to 1.5%) proteins showed the lowest divergence and the NS2a (3.0 to 3.4%), NS4a (2.8 to 3.6%) and E (2.3 to 3.0%) proteins the highest (Table 5). The mean p-distances of each of the genomic regions coincide with the highest peaks of the frequency distribution profile (Additional files 5 to 14).
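The comparison between mean divergences and the peaks of the frequency distribution profile can be illustrated with a minimal sketch. It assumes that the profile is a simple histogram of pairwise distances expressed in percent; the function names and the 0.5% bin width are illustrative assumptions and are not taken from the procedure cited in the Methods.

```python
from collections import Counter


def distance_profile(distances_percent, bin_width=0.5):
    """Count pairwise distances (%) per bin; keys are the lower bounds of the bins.

    bin_width is an assumed value chosen for illustration only.
    """
    counts = Counter()
    for d in distances_percent:
        bin_start = int(d // bin_width) * bin_width
        counts[round(bin_start, 6)] += 1
    return dict(sorted(counts.items()))


def modal_bin(profile):
    """Return the lower bound of the most populated bin (the highest peak)."""
    return max(profile, key=profile.get)
```

Under these assumptions, a mean divergence "coincides with a peak" when it falls inside, or close to, the bin returned by `modal_bin` for the corresponding set of pairwise comparisons.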

Genotypes identification
To identify which genomic region is responsible for the segregation of DENV-3 into different genotypes, several phylogenetic trees were constructed using, individually, the sequences coding for each viral protein (Additional files 15 to 24). All trees showed the same segregation of the viruses into three genotypes as observed when the entire ORF was used (Figure 1), except for the prM sequences, for which genotype I did not form a monophyletic group. The tree based on the NS4A protein coding sequence, however, showed that CH_80_2, a DENV-3 isolated from a patient in China in 1980, and, therefore, other viruses with identical sequences (M93130 and BR/RO1/02, not included in this tree), grouped with viruses from genotype II and not with those from genotype I, as observed in all other phylogenetic trees.

DENV-3 was introduced into the Americas via Panama and Nicaragua in 1994 [4], spreading rapidly to neighboring countries and reaching Brazil through Rio de Janeiro in December 2000 [9]. Interestingly, DENV-1 and DENV-2 were also introduced into Brazil through Rio de Janeiro [19]. Thus, it seems that Rio de Janeiro has been the main route of entry for DENV into Brazil. Analyzing the E gene and the 3′UTR sequences, we previously found that isolates BR/BV4/02, isolated in Boa Vista, Roraima, and BR/BR8/04, isolated in Belem, Para, in the Northern Region of the country, were phylogenetically more closely related to viruses isolated in the Caribbean islands than to those isolated in Rio de Janeiro, suggesting that DENV-3 was also introduced into Brazil through the Northern Region [12]. In the present study, now analyzing the entire ORF, the isolates BR/BV4/02 and BR/BR8/04, as well as new viruses isolated between 2006 and 2007 by other groups, were phylogenetically more closely related to viruses isolated in the Caribbean islands than to viruses circulating in Brazil, supporting the hypothesis that DENV-3 was introduced through the Northern Region of the country in addition to the well documented introduction through Rio de Janeiro. According to our phylogenetic analysis, at least two groups of DENV-3 genotype III are circulating in Brazil (Figure 2, sub-lineages A and B). A single virus (BR_V2387_03, FJ850079) isolated in the Northern Region of Brazil was more closely related to viruses isolated in Venezuela, suggesting that it was an imported case. In addition to DENV-3 genotype III, recent studies have shown that genotype I viruses are also circulating in Brazil [20][21][22]. The Paraguayan isolates were closely related to viruses isolated in Brazil within sub-lineage B of lineage III, suggesting that these viruses were introduced into Paraguay from Brazil as previously described [12].
A previous phylogenetic analysis of DENV-3 genotype III isolated in Sri Lanka, based on a 966 nt fragment spanning part of the capsid, prM/M and part of the E genes, identified the emergence, after 1989, of a new sub-type, which was correlated with severe disease epidemics that spread to Africa and then to the Americas [15]. Our phylogenetic analysis showed a similar distribution of DENV-3 genotype III isolates (Figure 2). Thus, our lineage I corresponds to DENV-3 genotype III circulating in Sri Lanka before 1989, which was called sub-lineage A by Messer and colleagues (2003). On the other hand, the LK_V2411_89 and LK_V2409_97 viruses, located basally in the branch of lineage III, could correspond to the new sub-type described by Messer and colleagues (2003), which they called Group B. The topology of our phylogenetic tree suggests that this last sub-type migrated to East Africa and the Indian subcontinent and later to the Americas, in agreement with observations made by Messer and colleagues (2003). Interestingly, phylogenetic analysis of genotype III also showed that the NI/V2420/1994 virus isolated in Nicaragua is located at the base of the branch containing the American isolates (lineage III), suggesting that the introduction of genotype III into the Americas occurred through Nicaragua in 1994, in agreement with the epidemiological data [4,23].
Previous studies analyzing the C, PrM, E and NS3 genes have shown that DENV-3 segregated into four genotypes [13,14,24,25]. In this study, we have found that DENV-3 segregated in the same genotypes I, II and III as shown in the previous studies mentioned above, analyzing either the entire ORF or each protein gene sequence. Thus, DENV-3 genotype can be determined by sequencing any part of the ORF. The genotype IV was not observed in our analysis because no complete genome sequence of any of these viruses is available in the GenBank. Klungthong and colleagues (2008) have also suggested that any genomic region can be sequenced to determine the genotype [26]; however, these authors used the entire genomic sequences of only 12 isolates, while our results were supported by the analysis of more than 500 isolates.
Similar to other RNA viruses, DENV exhibits a high degree of genetic variation due to the non-proofreading activity of its RNA polymerase, the high rates of mutation, the immense population size, and the immunological pressure, leading to the emergence of new sub-types of DENV [27]. Recently, we have described the existence of various taxa or viral sub-types within DENV-3 genotypes by analyzing the E protein gene [28]. In this study, we have found similar, and even new, internal groups within each genotype. The segregation of the viruses into genotypes, lineages, sub-lineages and groups suggested in this study was supported by high posterior probability, by nucleotide divergence, and by the presence of characteristic amino acid motifs.
The first studies that identified different genotypes within each DENV serotype were based on the topology of the trees, supported by bootstrap values [13,24,[29][30][31][32]. The nucleotide divergence among the genotypes of DENV-1 and DENV-2 was on average 6% when the genomic sequence corresponding to the E/NS1 junction (240 bp) was analyzed [13], and 7% when the E protein gene (1,485 bp) was analyzed [31,32]. In this study, we carried out a more detailed analysis of the nucleotide divergence among the different genetic groups of DENV-3. We observed that nucleotide divergence was on average 6.7% for genotypes, 2.7% for lineages and 1.5% for sub-lineages when the complete ORF of genotypes II and III was analyzed. For genotype I, a higher nucleotide divergence was observed among lineages and sub-lineages.
In addition, our analysis showed that the nucleotide divergence among genotypes varied depending on the genomic region, ranging from 4.1% for the C protein gene to 8.8% for the NS2a protein gene. These comparisons also showed that the C protein gene sequence is the most conserved, and that the NS2a, NS4a and E protein genes are the most variable. However, when the amino acid divergence was analyzed, a different picture was observed: NS2b, NS4b and NS3 were the most conserved proteins, while NS2a, NS4a and E were the most variable. Knowledge of the rates of divergence among the different taxonomic levels is an important tool for the detection of new viral groups, and information about the variability of each gene among different viral groups could be used to select targets for the design of diagnostic probes, antiviral therapies and the construction of candidate vaccines.
Recently, Wittke and colleagues suggested the existence of an additional genotype (genotype V) within DENV-3 [33]. Our previous phylogenetic analysis based on the E protein, however, suggested that genotype V corresponds to a lineage within genotype I [28]. In this study, the nucleotide divergence between lineage I (called genotype V by Wittke and colleagues) and lineage II of genotype I was 4.9% on average, lower than the 6% observed among genotypes. Therefore, we suggest maintaining the classification of DENV-3 into four genotypes as previously proposed [14,24].
In this work, we performed phylogenetic analysis and evolutionary divergence dynamics of DENV-3 and provided data related to the processes that control the viral evolution. These data will be useful to better characterize the DENV-3 epidemics in future and might even be used for selection of vaccine candidates.

RNA extraction
Viral RNA was extracted from 140 μl of supernatant of infected C6/36 cells using the QIAamp Viral RNA kit (Qiagen, Germany), following the manufacturer's recommendations. The RNA was eluted with 80 μl of DNase/RNase-free water.
The reaction mixture was heated at 94°C for 2 min, followed by 45 amplification cycles (94°C for 10 s, 56°C for 1 min, 68°C for 5 min) and a final extension at 68°C for 7 min. The PCR products were subjected to electrophoresis in a 1% agarose gel and visualized under UV light after staining with ethidium bromide. Bands of DNA were purified from agarose gels using a QIAquick Gel Extraction Kit (Qiagen, Germany) following the manufacturer's specifications.

Nucleotide sequencing
The gel-purified DNA fragments were sequenced using an Applied Biosystems BigDye ddNTP capillary sequencer ABI 3130 (Applied Biosystems, USA) following the manufacturer's specifications. Both strands of each DNA fragment were sequenced at least three times using walking primers (Additional file 25B). Viral genome sequences generated in this study were deposited in the   (JF808128), PY_AS10_02 (JF808129) and D3BR_RP1_2003 (EF643017). The electropherograms were analyzed using the MEGA 5.0 program and the consensus assembly was carried out using the BioEdit v.7.0.9 program [36,37]. The 30 nucleotides at the 5′ and 3′ ends of each fragment were deleted to avoid the influence of the primers used for amplification. However, the sequences at the 5′ and 3′ ends of the viral genome represent the sequences of the primers used to amplify these regions.

Phylogenetic and evolutionary analysis
Database of DENV-3 sequences
A database containing complete genome sequences of DENV-3 retrieved from GenBank was prepared for phylogenetic analysis. Representative sequences of DENV-1 (AB074760), DENV-2 (M20558) and DENV-4 (AF326573) were also included. The database contained the following information: GenBank accession number, isolate name, country and year of isolation. A total of 563 DENV-3 sequences indicated as complete genomes were retrieved from GenBank up to March 27, 2010 (Additional file 26). However, several of these sequences did not contain the first 39 and last 135 nucleotides of the viral genome, or did not have validated UTR sequences. Therefore, in order to use the largest possible number of sequences, we included only the open reading frame (ORF). The sequences were analyzed using the program DAMBE 5.2.6 in order to identify identical sequences, which were excluded from the analysis (Additional file 27) [38]. In addition, mutants and clones were also excluded, resulting in a total of 527 sequences of DENV-3 available for phylogenetic analysis (Additional file 26).
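The exclusion of identical sequences can be sketched as follows. This is a minimal illustration that assumes "identical" means an exact, case-insensitive string match over the ORF; it is not a description of how DAMBE performs the comparison, and the function name is hypothetical.

```python
def drop_identical(records):
    """Keep the first record for each distinct sequence.

    records: iterable of (name, sequence) pairs.
    Returns (unique_records, duplicates), where duplicates maps each
    retained name to the names dropped as identical to it.
    Assumption: identity is an exact match after upper-casing.
    """
    first_seen = {}
    unique = []
    duplicates = {}
    for name, seq in records:
        key = seq.upper()
        if key in first_seen:
            duplicates.setdefault(first_seen[key], []).append(name)
        else:
            first_seen[key] = name
            unique.append((name, seq))
    return unique, duplicates
```

Keeping the mapping of dropped names makes it possible to report which viruses share a sequence with a retained representative.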

Phylogenetic analysis
DENV-3 sequences were aligned using the program CLUSTAL X 5.2 [39]. The distance-based Neighbor-joining (NJ) and/or the Bayesian inference (BI) methods were used to construct the phylogenetic trees. For the NJ method, the sequences were first analyzed using the Modeltest 3.7.MacX program to identify the best nucleotide substitution model [40]. The best nucleotide substitution model was selected under the hierarchical likelihood ratio test (hLRT) criterion. Phylogenetic trees were constructed using the NJ method as implemented in the PAUP* 4.0b10 program and statistically supported by the bootstrap method using 1,000 replicates [41]. For the BI method, the aligned sequences were first analyzed using the MrModeltest 2.3 for Mac OS X 10.4.11 program to identify the best model of nucleotide substitution [42].
The best nucleotide substitution model was selected under the hierarchical likelihood ratio test (hLRT) criterion. Five runs of 4 chains each (one cold and three heated, temperature = 0.20), generating a total of 1.5 × 10^6 generations (10% removed as burn-in), were done to ensure statistical convergence. The phylogenetic trees were inferred with the MrBayes program and statistically supported by calculating the posterior probability, which was expressed as a percentage [43].
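The burn-in removal and the posterior probability expressed as a percentage can be illustrated with a short sketch. It assumes that each sampled tree is represented as a set of clades and each clade as a frozenset of taxon names; this representation and the function names are illustrative assumptions and do not reflect the internals of MrBayes.

```python
def discard_burn_in(samples, burn_in_percent=10):
    """Drop the first burn_in_percent of an ordered list of MCMC samples."""
    if not 0 <= burn_in_percent < 100:
        raise ValueError("burn_in_percent must be in [0, 100)")
    n_discard = len(samples) * burn_in_percent // 100
    return samples[n_discard:]


def clade_support_percent(trees, clade):
    """Percentage of sampled trees that contain the given clade.

    trees: list of sets of frozensets (assumed representation).
    clade: iterable of taxon names.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in trees if clade in tree)
    return 100.0 * hits / len(trees)
```

With this representation, the support of a group is simply the share of post-burn-in trees in which that group appears as a clade.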

Analysis of evolutionary divergence between genetic groups
The nucleotide and amino acid divergences were calculated using p-distance model with 1,000 replicates using the MEGA 5 program [36]. The frequency distribution profiles of divergences between populations were calculated as described previously [44]. The mean (%) p-distances between sequences were also calculated.
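For reference, the p-distance is the proportion of aligned sites at which two sequences differ. The sketch below shows the calculation for a pair of sequences and the mean (%) between two groups; the pairwise deletion of gapped sites and the function names are assumptions made for illustration, not settings reported for the MEGA analysis.

```python
from itertools import product

GAP_CHARS = {"-", "?"}  # assumption: symbols treated as missing data


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion of gapped sites (assumed)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_group_distance(group_a, group_b):
    """Mean p-distance (%) over all pairs drawn from two groups."""
    dists = [p_distance(a, b) for a, b in product(group_a, group_b)]
    return 100.0 * sum(dists) / len(dists)
```

The same functions apply to nucleotide and amino acid alignments, since only site-by-site identity is evaluated.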

Identification of amino acid motif for each genetic group
The polyprotein sequences of each genetic group were aligned for the identification of the amino acid motifs, which were defined as amino acid substitutions present in at least 90% of the sequences within each genetic group.
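The 90% criterion can be sketched as follows. The code assumes that a motif position is one where the target group has a residue present in at least 90% of its sequences and that residue is not the conserved residue of any other group; the comparison against other groups and the function names are assumptions made for illustration.

```python
from collections import Counter


def group_consensus(seqs, min_fraction=0.9):
    """Per-column residue present in at least min_fraction of sequences, else None."""
    consensus = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / len(seqs) >= min_fraction else None)
    return consensus


def group_motif(target, others, min_fraction=0.9):
    """Map 1-based positions to residues conserved in the target group
    that are not the conserved residue of any other group (assumed rule)."""
    target_cons = group_consensus(target, min_fraction)
    other_cons = [group_consensus(g, min_fraction) for g in others]
    motif = {}
    for pos, residue in enumerate(target_cons, start=1):
        if residue is None:
            continue
        if all(cons[pos - 1] != residue for cons in other_cons):
            motif[pos] = residue
    return motif
```

All sequences passed to these functions are expected to come from the same polyprotein alignment, so that column positions are comparable across groups.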