Phylogenetic and recombination analysis of Tobacco bushy top virus in China

Background During the past decade, tobacco bushy top disease, which is mainly caused by a combination of Tobacco bushy top virus (TBTV) and Tobacco vein-distorting virus (TVDV), underwent a sudden appearance, extreme virulence and degeneration of the epidemic in the Yunnan province of China. In addition to integrative control of its aphid vector, it is of interest to examine diversity and evolution among different TBTV isolates. Methods 5’ and 3’ RACE, combined with one step full-length RT-PCR, were used to clone the full-length genome of three new isolates of TBTV that exhibited mild pathogenicity in Chinese fields. Nucleotide and amino acid sequences for the TBTV isolates were analyzed by DNAMAN. MEGA 5.0 was used to construct phylogenetic trees. RDP4 was used to detect recombination events during evolution of these isolates. Results The genomes of three isolates, termed TBTV-JC, TBTV-MD-I and TBTV-MD-II, were 4152 nt in length and included one distinctive difference from previously reported TBTV isolates: the first nucleotide of the genome was a guanylate instead of an adenylate. Diversity and phylogenetic analyses among these three new TBTV isolates and five other available isolates suggest that ORFs and 3’UTRs of TBTV may have evolved separately. Moreover, the RdRp-coding region was the most variable. Recombination analysis detected a total of 29 recombination events in the 8 TBTV isolates, in which 24 events are highly likely and 5 events have low-level likelihood based on their correlation with the phylogenetic trees. The three new TBTV isolates have individual recombination patterns with subtle divergences in parents and locations. Conclusions The genome sizes of TBTV isolates were constant while different ORF-coding regions and 3’UTRs may have evolved separately. The RdRp-coding region was the most variable. Frequent recombination occurred among TBTV isolates. Three new TBTV isolates have individual recombination patterns and may have different progenitors. Electronic supplementary material The online version of this article (doi:10.1186/s12985-015-0340-2) contains supplementary material, which is available to authorized users.


Introduction
Tobacco bushy top virus (TBTV) is a member of the Umbravirus genus, which requires the presence of Tobacco vein-distorting virus (TVDV) for infectivity in the field [1]. TBTV is encapsidated by the coat protein encoded by TVDV, which is needed for transmission via aphids [1,2]. However, mechanical inoculation of sap from diseased tobacco onto healthy plants leads to the loss of TVDV, implying the independent pathogenicity of TBTV [1]. Together, these viruses caused severe stunting and destructive bushy-top disease in tobacco in sub-Saharan Africa in the 1960s [3] and in Asia including China in the 1990s [1]. During 1993 to 2001, there were 51,300 hm 2 of tobacco bushy top-diseased fields including 8,700 hm 2 of total field failure in Yunnan Province of China [4]. In addition to TBTV and TVDV, two components including tobacco bushy top diseaseassociated RNA (TBTDaRNA) and satellite RNA of TBTV were also identified in tobacco with bushy-top disease [5,6], although all four components were not always present together [7].
During the past decade, tobacco bushy-top disease was infrequent with only sporadic cases exhibiting mild symptom in Yunnan province, which may be due to interruption of the natural epidemic cycle through the integrative control of its aphid vectors [7,8]. Sudden appearance, extreme virulence and degeneration of the epidemic of tobacco bushy-top disease was a pattern similar to that of other destructive diseases [9][10][11][12], whose lethal pathogens underwent quick attenuation of pathogenicity. Therefore, it is of interest to determine whether the new TBTV isolates produced mild pathogenicity and how they were evolved. For single-stranded RNA viruses, recombination is a major evolutionary event allowing isolates to adapt to new environmental conditions and hosts [13], and frequent recombination events have been detected for various RNA viruses such as Soybean mosaic virus and potyvirus isolates [14][15][16].
The TBTV genome contains a positive-sense singlestranded RNA of 4152 nt, which encodes four ORFs, and contains a short 5' UTR of 10 nt and a 3' UTR of 645 nt [17]. Based on comparisons with other Umbraviruses, p35 and its frameshift product p98 are responsible for genome replication [18][19][20]. p98 contains the ubiquitous RdRp GDD motif of positive-strand RNA virus and is presumably the RNA-dependent RNA polymerase (RdRp) [18,21]. Based on studies conducted with Groundnut rosette umbravirus, p26 is a long-distance movement-associated protein and is also responsible for stabilization of viral RNAs and nuclear shuttling [22,23], and p27 is likely a cell-to-cell movement protein [22].
In this study, tobacco plants with suspected mild tobacco bushy-top disease were collected in three locations in Yunnan province. Full length genomes of the three new TBTV isolates from Jiangchuan (termed JC) and Midu (termed MD) were cloned and sequenced, revealing a distinctive difference from previously reported TBTV isolates: The first nucleotide of TBTV-JC, TBTV-MD-I and TBTV-MD-II is a guanylate compared with an adenylate reported for the other TBTV sequences. In addition, we compared these three new TBTV isolates with the five available TBTV sequences to study molecular diversity and recombination events among the isolates.

Results
Detection of TBTV, TVDV and TBTD-associated RNA in different sources of tobacco with tobacco bushy top disease Total RNA was extracted from leaves of tobacco with suspected tobacco bushy top disease collected from three locations (JiangChuan county, MiDu county and BaoShan city) in China's Yunnan Province. RT-PCR and subsequent sequencing revealed that TBTV was present only in the samples from JiangChuan and MiDu. None of the samples contained the newly reported TBTDaRNA (Additional file 1: Table S1).

5'-RACE and 3'-RACE of TBTV from JiangChuan and MiDu
To determine the full-length sequences of TBTV from JiangChuan (termed JC) and MiDu (termed MD), 5'-RACE and 3'-RACE were first performed to determine 5' terminal and 3' terminal sequences. The size of the 5'-RACE PCR product with either poly (C) or poly (G) at the 5' end was approximately 500 bp (data not shown). Comparison of the sequencing results revealed that the first nucleotide of the JC isolate is a guanylate, with the sequence beginning with 5'-GGGUUACGAUAUGGAGUU CAUCAAC-3' (Fig. 1a). The MD isolate also has the same sequences at the 5' end of its genome. The first nucleotide of all previously reported TBTV isolates is an adenylate. Most Umbraviruses, as well as Necroviruses and Carmoviruses in the family Tombusviridae, also have 5' terminal guanylates.
The size of the 3'-RACE PCR product of the JC and MD isolates was approximately 950 bp (data not shown). The 3' terminal sequence of the JC and MD isolates is 5'-GGGAGAUGAGCACUCUCUCUCGCGCCC-OH-3' (Fig. 1b). The underlined cytidylate differs from previous TBTV sequences, which contain an urydilate at this position. This substitution of U to C seems to deform the loop of the 3' proximal stem-loop in TBTV (data not shown).

Comparison of the sequences of the three new TBTV isolates and five previously published sequences
The full-length genomes of TBTV-JC (KM016224), TBTV-MD-I (KM016225) and TBTV-MD-II (KM067277) are 4152 nt in length, as previously reported for the other TBTV isolates. Comparison of the in vitro expression level of ORF1 (p35) and ORF1-frameshift protein (p98) for these isolates using wheat germ extracts (WGE) showed that that the translated levels of p35 and p98 for TBTV-JC, TBTV-MD-I and TBTV-MD-II (all mild isolates) was~40 % of the TBTV-Ch level, which was cloned from a sample showing typical tobacco bushy top disease (Wang and Yuan, unpublished data). This suggests that attenuated expression of replicase components (p35 and p98) may be in part responsible for the mild pathogenicity of new TBTV isolates in the field. Nucleotide sequence identities among TBTV-JC, TBTV-MD-I and TBTV-MD-II were 94.8 % to 97.3 % ( Table 1). The highest nucleotide sequence identity among the 8 TBTV isolates was 98.9 % between TBTV-Ch and TBTV-YWSh, while the lowest was 89.0 % between TBTV-MD-II and TBTV-YWDu. The nucleotide sequence of TBTV-YWDu was the most different from the other 7 isolates, with identities ranging from 89.0 % to 90.9 %. Correspondingly, the nucleotide sequence identities among the other 7 isolates were 94.6 % to 98.9 % (Table 1).
TBTV contains a 5' UTR of 10 nt and 3' UTR of 645 nt. The 5'UTR of the three new isolates only diverges in the 5' ultimate nucleotide from the previously reported isolates. In the 3'UTR, identical residues ranged from 94.6 % to 99.8 % (Table 1), which was higher than values found for the full-length genome. In particular, TBTV-YWDu, whose full-length genome differed the most from the other isolates (sharing 89.0 % to 90.9 % identity), had 3'UTR sequences sharing 94.6 % to 99.8 % identity with the other isolates ( Table 1).

Phylogenetic relationship among all TBTV isolates
Phylogenetic trees were constructed based on the fulllength genome, the 3' UTR or ORFs-coding regions of TBTV. The distances of groups in phylogenetic trees of the full-length genome, ORF1 or p98-C were bigger than values of other regions (Fig. 2). It is suggested that RdRp-coding region was the most variable and mainly determined the divergent of TBTV isolates. In the tree of the full-length genome, the 8 isolates were divided into three groups, with the first group only containing TVDV-YWDu (Fig. 2a). The second group contained TBTV-MD-II while the third group contained three sub-groups, one of which included one isolate of TBTV-YYXi; the other two sub-groups included 2 and 3 isolates respectively (Fig. 2a).
The phylogenetic trees based on ORFs or 3' UTR have distinctive patterns. The tree based on ORF1 has the same grouping as the full-length genome ( Fig. 2a and c). The other four trees have different pattern from the full-length genome. It is further confirmed that the divergence of ORF1-coding region is primarily correlated with the divergence of the full-length genome in TBTV.
In addition to the pattern of trees, there are some same clusters in different phylogenetic tree, i.e. the cluster containing TBTV-MD-I, TBTV-YLLi and TBTV-JC in trees of the full-length genome, ORF1, p98-C, ORF3 or ORF4-coding regions (Fig. 2a, c, d, e and f ); the cluster containing TBTV-Ch and TBTV-YWSh in trees of the full-length genome, ORF1 or p98-C-coding regions (Fig. 2a, c and d ); and the cluster containing TBTV-Ch and TBTV-YWDu in trees of the 3'UTR, ORF3 or ORF4coding regions (Fig. 2b, e and f). It is suggested that TBTV-MD-I, TBTV-YLLi and TBTV-JC have the nearest similarity except the 3'UTR, while TBTV-Ch and TBTV-YWSh have the nearest similarity in RdRp-coding region and TBTV-Ch and TBTV-YWDu have the nearest relationship in the 3' half of genome including ORF3 and ORF4-coding regions and 3'UTR. All data of phylogenetic trees and molecular diversity assay suggested that different ORFs-coding regions and 3' UTR were evolved separately.

Recombination analysis of the TBTV isolates
To find potential recombination signals in the TBTV isolates, recombination analysis was performed using RDP4 program. Using six algorithms, 29 recombination events were detected in all 8 isolates ( Fig. 3b and Table 3).TBTV-YYXi had 6 potential recombination signals, while TBTV-JC only had 1 potential recombination signal (Table 3).
In all 29 potential recombination events, three recombination events (events 27, 28 and 29) detected in TBTV-YWDu had remarkable high degree of certainty with P-value of at least three algorithms <1 × 10 −6 ( Table 3). There are the other nine recombination events with a high degree of certainty due to recombinant score > 0. 6 (Table 3). In addition, remaining 17 recombination events have a fairly likelihood since recombinant score is between 0.4-0.6 ( Table 3).
For all 8 TBTV isolates, there are three types based on the location of recombination events. The first type contained TBTV-YWDu, which only had recombinations at 3' half of the genome; the second type contained TBTV-YLLi and TBTV-JC, which had recombinations at 5' half of thegenome; and the third type contained 5 isolates of TBTV-MD-I, TBTV-MD-II, TBTV-Ch, TBTV-YWSh and TBTV-YYXi, which had recombinations throughout the genome (Fig. 3b).
In addition, there are same type of recombinant events with same locations and parents in different TBTV isolates including recombination located at 616-965 with parents TBTV-YYXi/TBTV-MD-II in TBTV-JC (event 1), TBTV-MD-I(event 3) and TBTV-YLLi (event 23), four recombinant events located at 253-385 with parents TBTV-MD-II/ TBTV-YLLi and 652-3207 with parents TBTV-YYXi/ TBTV-JC in TBTV-Ch (events 9 and 10) and TBTV-YWSh (events 13 and 15) ( Fig. 3b and Table 3). Table 1 Nucleotide sequence identities (%) for TBTV-JC, TBTV-MD-I, TBTV-MD-II and previously reported TBTV isolates based on the  full-length genome and 3'UTR sequence   TBTV-JC  TBTV-MD-I  TBTV-MD-II  TBTV-Ch  TBTV-YWSh  TBTV-YYXi  TBTV  the outbreak of tobacco bushy top disease as well as phylogenetic relationship and possible recombination. The three new isolates of TBTV (TBTV-JC, TBTV-MD-I and TBTV-MD-II) had a remarkable difference at the first nucleotide of G from five previously reported TBTV with the first nucleotide of A, while sizes of the genome and ORFs were same in all TBTV isolates. For ORFs encoded by the TBTV isolates, ORF3 and ORF4 was relative stable while ORF1 was the most variable based on the identities of ORFs-coding sequences among 8 TBTV isolates. This strong genetic variability was also identified in the RdRp-coding region of several plant viruses [26][27][28][29].
In all of 8 TBTV isolates, TBTV-YWDu had the most remarkable divergence from the other 7 isolates based on the identities of the full-length genome, ORF1or p98-C-coding sequences. TBTV-MD-II also showed difference from the other 7 TBTV isolates based on the identities of the 3' UTR ( Table 2). The remarkable divergence of TBTV-YWDu (Fig. 2a, c and d) and TBTV-MD-II (Fig. 2b) from the other TBTV isolates were also indicated by the phylogenetic trees. Two phylogenetic trees based on ORF3 and ORF4-coding sequences showed similar branches, which suggested that ORF3 of TBTV isolates underwent similar evolution as ORF4. Based on all data of molecular diversity and phylogentic tree, it is suggested that different ORFs-coding regions and the 3' UTR in TBTV underwent separate evolution, and the diversity of ORF1-coding region mainly determined the diversity of full-length genome.
During evolution of the single strand RNA viruses, recombination is a major evolutionary way for an isolate to adapt to new environmental conditions and hosts [13][14][15][16]. In this study, recombination events were also analyzed among all 8 TBTV isolates. Total 29 potential recombination events were detected, in which three recombination events (events 27, 28 and 29) in TBTV-YWDu showed high reliability with P-value of at least three methods <1 × 10 −6 ( Fig. 3b and Table 3). These three recombination events in TBTV-YWDu were located within 3' half and have same parents of TBTV-Ch/possible TBTV-MD-II, which is supported by the phylogenetic analysis. In phylogenetic trees based on ORF3, ORF4 and 3'UTR in 3' half of TBTV genome, TBTV-YWDu and TBTV-Ch (the minor parent) formed into a cluster, while TBTV-MD-II (the possible major parent) belonged the other different groups (Fig. 2b, e, and f ). However, TBTV-YWDu formed a separate group in phylogentic trees based on ORF1 and p98-C in 5' half of genome ( Fig. 2a and c), which implied why there is no recombination events in 5' half of TBTV-YWDu. The other 9 recombination events (events 4, 10, 15, 19, 20, 22, 24, 25 and 26) seem to have a high degree of certainty due to recombination score > 0.6 ( Table 3). Three recombination events (events 4, 10 and 15) with similar breakpoints (629-3207 in TBTV-MD-I, 652-3207 in TBTV-Ch and  Table 3 TBTV-YWSh) were supported by phylogenetic tree (Table 3 and Fig. 4a), in which TBTV-Ch and TBTV-YWSh formed into a branch with their minor parent TBTV-YYXi of events 10/15 and TBTV-MD-I formed into a branch with its minor parent TBTV-JC of event 4. Recombination event 22 was also supported by phylogenetic tree (Table 3 and Fig. 4b), in which TBTV-YLLi was formed into a branch with its minor parent TBTV-JC. Meanwhile, TBTV-JC was also the minor parent in recombination event 2 in TBTV-MD-I (recombination score is 0.469), which was also supported by phylogentic tree based on the fragment of 200-1550 (Fig. 4b). Three recombination events (events 24, 25 and 26) in TBTV-YLLi also showed some correlation with phylogenetic tree based on the fragment of 1500-2480 (Table 3 and Fig. 4h), in which TBTV-YLLi formed into a branch with its minor parent of event 24 and has near relationship with the minor parent TBTV-YYXi (event 25) or TBTV-JC (event 26). For two recombination events (events 19 and 20) in TBTV-YYXi, there is no correlation between  recombination assay and phylogenetic tree based on ORF3 or ORF4-coding region ( Fig. 2e and f ), in which TBTV-YYXi formed into a separate group. It is suggested that events 19 and 20 are possible with uncertain minor parents (Table 3) . 2b), in which TBTV-Ch and TBTV-YYXi along with TBTV-YWSh formed into a branch. In addition, clue of the event 17 in TBTV-YYXi can be found in the phylogenetic tree based on p98-C-coding sequences (Fig. 2d). Only three events (events 5, 6 and 18) were not supported by the corresponding phylogenetic trees ( Fig. 4g and e). It is suggested that events 5, 6 and 18 are possible with uncertain minor parents ( Table 3). All above data implied the inherent relationship between phylogenetic analysis and recombination assay. Based on the analysis on P-value, recombination score and the correlation with phylogenetic trees, 24 events within 29 possible recombination events are certainly true, while the other 5 events (events 5, 6, 18, 19 and 20) only have low-level likelihood.
For three new isolates of TBTV, they seemed to undergo distinctive evolution. Firstly, TBTV-JC and TBTV-MD-I had near relationship compared with TBTV-MD-II based on the sequence diversity, the phylogenetic analysis and the recombination assay. However, TBTV-MD-I have distinctive characteristic from TBTV-JC in recombination assay (Fig. 3b). TBTV-MD-II may have different ancestor since it always formed a different branch from the branch including TBTV-JC and TBTV-MD-I in all phylogenetic trees (Fig. 2). Although they seemed to undergo different evolution, they showed mild pathogenecity, which may due to following reasons. First, activity of RdRp was altered since RdRp-coding region is the most variable for TBTV. Second, the expression level of replicase components was lower than that of severe pathotype, which is partly confirmed by primary data of in vitro translation of p35 and p98 (Wang and Yuan, unpublished data). In addition, mutagenesis of TVDV may also partially cause the mild symptom of tobacco bushy top diseases, since TVDV could not support TBTV in some samples of tobacco bushy top disease such as sample from Baoshan city (Additional file 1: Table S1) and the other related data [7]. Further attention and effort are necessary to figure out the detailed mechanism on the lower expression of replicase components encoded by TBTV and the absence of interaction between TBTV and TVDV in nature.

Conclusion
Three new TBTV isolates from tobacco bushy top samples with mild symptom were cloned. The first nucleotide of them is a guanylate instead of an adenylte reported in the other TBTV isolates. Identities and phylogenetic analyses indicated that variants ORFs and the 3' UTR in TBTV were evolved separately. RdRp-coding region was the most variable among TBTV isolates and the divergence of ORF1 is mainly correlated with the divergence of the fulllength genome. Frequent recombinations were detected among TBTV isolates. For three new TBTV isolates, they have different recombination pattern and may have different ancestors. RT-PCR detection of TBTV, TVDV and TBTD-assocaited RNA Total RNA was extracted from tobacco leaves using Trizol reagent (TransGen), and reverse-transcribed using M-MLV reverse transcriptase and oligonucleotides corresponding to TBTV, TVDV or TBTD-associated RNA (Additional file 1: Table S2). PCR amplification was performed using Taq DNA polymerase and pairs of oligonucleotides (Additional file 1: Table S2). For the detection of TBTV, two pairs of oligonucleotides, TB-2263-F/TB-3263-R and TB-667-F/TB-1630-R, were designed for RT-PCR assay. Accordingly, two pairs of oligonucleotides (TV-2728-F/TV-3458-R and TV-3454-F/TV-4166-R) for TVDV and one pair of oligonucleotides (TBTD-1507-F/TBTD-2016-R) for TBTD-associated RNA were designed for RT-PCR detection respectively (Additional file 1: Table S2). The PCR products were cloned into pMD18-T vector (TaKaRa) and sequenced by using M13 primers.

5'-RACE, 3'-RACE and full-length RT-PCR of TBTV
For 5'-RACE, the total RNA was reverse-transcribed by M-MLV reverse transcriptase using oligonucleotide TB-943-R and treated with a mixture of RNaseH and RNase A. After purification using a cDNA purification kit (TransGen), the cDNAs were extended using dCTP or dGTP and terminal deoxynucleotidyl transferase (TaKaRa) and then subjected to PCR amplification using Oligo(dG)-anchor primer/TB-943-R or Oligo(dC)-anchor primer/TB-943-R. The first-round PCR products were amplified using Anchor primer/TB-510-R. The final PCR products were cloned into vector pMD18-T and sequenced using M13 primers. At least three RT-PCR clones were sequenced to make sure the reliabilityof 5'-RACE result.
For 3'-RACE, total RNA and Oligo (dA)-linker were ligated with T4 RNA ligase (NEB). The oligo(dA)-linked RNAs were reverse-transcribed using Oligo (dT)-anti linker and then followed by PCR amplification using Oligo (dT)anti linker/TB-3206-F. The final products were cloned into pMD18-T and sequenced using M13 primers. At least three RT-PCR clones were sequenced to make sure the reliabilityof 3'-RACE result.
For full-length RT-PCR of TBTV, total RNAs were reverse-transcribed by PrimeScript reverse transcriptase (Takara) using TBTV-3'-R. The cDNA were then subjected to PCR amplification using LA Taq (Takara) and oligonucleotides TBTV-5'-F and TBTV-3'-R. PCR products corresponding to full-length TBTV genomic RNA were cloned into pMD18-T and sequenced using M13 primers and TBTV-specific primers (detailed information not shown).
All primers used for 5'-RACE, 3'-RACE and full-length RT-PCR of TBTV were shown detailedly in Additional file 1: Table S2.
Sequence assembly and alignment, construction of phylogenetic trees and Recombination analysis Sequence assembly was accomplished by DNAMAN, which is also used to analyze the identities of nucleotide sequences or amino acids among TBTV isolates.
Phylogenetic tree was constructed using MEGA 5.0 software package [30] based on the neighbor-joining method and Kimura 2-parameter method. 1000 replicates of Bootstrap resampling was used to ensure the reliability of individual nodes in phylogenetic tree.
Recombination analysis was achieved using RDP4 [31]. Six methods including RDP, GENECONV, Bootscan, Maxchi, Chimaera and SiScan implemented in RDP4 was used to detect recombination events, likely parental isolates and recombination break points under default settings. If recombination event was supported by at least three methods with P-value <10 −6 or the recombination score is above 0.6, this recombination event is certainly true. If the recombination score is between 0.4-0.6, this recombination event has a fair likelihood.