Detection and frequency of recombination in tomato-infecting begomoviruses of South and Southeast Asia

Background: Tomato-infecting begomoviruses are widely distributed across the world and cause diseases of high economic impact on a wide range of agriculturally important crops. Although recombination plays a pivotal role in the diversification and evolution of these viruses, it is currently unknown whether the number and quality of recombination events differ amongst tomato-infecting begomovirus species. To examine this, we sought to characterize the recombination events, estimate the frequency of recombination, and map recombination hotspots in tomato-infecting begomoviruses of South and Southeast Asia.
Results: The different methods used for recombination breakpoint analysis provided strong evidence for the presence of recombination events in the majority of the sequences analyzed. However, there was clear evidence for the absence or low frequency of recombination events in viruses reported from North India. In addition, we provide evidence for a non-random distribution of recombination events, with the highest frequency of recombination mapped to the N-terminal portion of Rep.
Conclusion: The variable recombination observed in these viruses indicates that not all begomoviruses are equally prone to recombination. The distribution of recombination hotspots was found to depend on the relatedness of the genomic regions involved in the exchange. Overall, the frequency of phylogenetic violations and the number of recombination events decreased with increasing parental sequence diversity. These findings provide valuable new information for understanding the diversity and evolution of tomato-infecting begomoviruses in Asia.

Begomovirus genomes are composed of either one (monopartite) or two (bipartite) single-stranded DNA molecules, each ranging in size between 2500 and 2800 nucleotides [8]. Most TYLCV of the old world and almost all known new world begomoviruses are bipartite, with genomes comprising DNA-A and DNA-B molecules. Monopartite old world begomoviruses, which are now believed to be the predominant begomovirus form, have only a DNA-A-like genome component. The virion-sense strand of DNA-A encodes the viral coat protein (AV1, V1 or cp) and, in old world begomoviruses [9], an AV2 or V2 gene that is necessary for virus accumulation and symptom development [10]. The complementary-sense strand of DNA-A encodes genes responsible for viral replication (AC1, C1 or rep), replication enhancement (AC3, C3 or ren) and regulation of gene expression (AC2, C2 or trap), as well as AC4 or C4, which is involved in host range determination, symptom determination and severity, and virus movement [11][12][13]. The DNA-B of bipartite begomoviruses encodes two proteins, BV1 (a nuclear shuttle protein, NS) and BC1 (a movement protein, MP), which are involved in intra- and intercellular movement within the plant [14].
Begomoviruses exhibit a great deal of geography-dependent but host-independent genomic variation [15][16][17]. Recombination, especially interspecific homologous recombination, is a key contributor to the genomic diversification and evolution of begomoviruses [17]. To date, many natural begomovirus recombinants have been reported [17][18][19][20]. Although the biological significance of begomovirus recombination is not clearly understood, epidemics associated with the emergence of recombinant begomoviruses have been reported in many parts of the world. These include the devastating cassava mosaic disease epidemic caused by recombinant East African cassava mosaic viruses in Uganda and neighbouring countries [18,21], the currently emerging pathogenic recombinant tomato yellow leaf curl Malaga virus in Spain [22], and the cotton leaf curl disease epidemic in Pakistan caused by a species complex including a variety of mostly recombinant begomovirus species [23]. Besides the apparent importance of recombination in begomovirus evolution, the marks that it has left on currently sampled begomovirus genome sequences also have major implications when these sequences are used to infer the evolutionary histories of begomoviruses [24,25]. Consequently, a detailed characterization of recombination amongst tomato-infecting begomoviruses is a prerequisite for understanding how these important pathogens are evolving.
Although a few specific recombination events have been described so far in tomato-infecting begomoviruses [26][27][28][29], a full accounting of recombinants, recombination breakpoints and recombination hotspots in tomato begomovirus species and strains is lacking. For example, it is currently unknown whether the number and quality of recombination events differ amongst tomato-infecting begomovirus species. It is also unknown whether sequences in particular parts of begomovirus genomes are more or less exchangeable between species than sequences in other parts of these genomes. Such variations in recombination frequencies and patterns have been clearly observed in RNA viruses [30]. In this study we employ a variety of recombination analysis methods to characterize recombination in South and Southeast Asian tomato-infecting begomoviruses. We map recombination hotspots and provide evidence that not all tomato-infecting begomoviruses are equally prone to recombination, and that specific characteristics of particular recombination events depend on both the relatedness of the recombining viruses and the genomic region involved in sequence exchange.

Results and discussion
In this study, we sought to characterise recombination in South and Southeast Asian viruses using an approach that differs from those used previously: (1) by studying a different set of viruses; (2) by using a combination of recombination analysis methods that are both powerful and have low false-positive rates; and (3) by mapping and estimating the frequency of recombination events in begomoviruses.
The Neighbor-Net analysis revealed clear evidence of phylogenetic conflicts within the analysed sequences (Fig. 1). Notably, every sequence represented within the network was implicated as a potential recipient of horizontally acquired sequences at some time in its evolutionary past. Unsurprisingly, the PHI test strongly supported the presence of recombination in these sequences (p < 0.0001).
The different methods used for recombination breakpoint analysis also provided strong evidence for the presence of past recombination events in most of the sequences analysed. For each of the 32 potential recombinant sequences identified, possible breakpoint positions, sequence fragments and parental genotypes are listed in Table 1. In addition, viruses from geographically well-separated regions appeared to have recombined at some time in the past. For example, tomato leaf curl virus strains from Bangalore and Gujarat in India contained sequences closely resembling those found in a ToLCTWV isolate from Taiwan. Similarly, Chinese viruses contained sequence fragments closely resembling those found in viruses sampled in Thailand, Taiwan, Bangladesh and South India. We further used the TreeOrderScan method [31] to investigate the phylogenetic evidence for recombination in the sequence alignment. This analysis revealed major deviations in the branching order of sequences within trees constructed from different portions of the multiple sequence alignment (Fig. 2). Frequent tree order changes were observed in the rep and AC4 regions. Importantly, most of the viruses detected as recombinants in the breakpoint analysis exhibited deviations in their branching order, indicating that they were most likely correctly identified as recombinants. The TreeOrderScan analysis also provided evidence for gene flow amongst viruses in geographically separated regions. For instance, between alignment positions 2335 and 2652, sequences found in southern Indian viruses grouped with those found in Thai and Bangladeshi viruses. Thai viruses contained sequences resembling those of Chinese viruses between positions 300–490 and 590–2372, but resembling those of Indian viruses between positions 2472 and 2743. The recombination observed between geographically separated species/strains probably represents older events, as these presumably occurred before the viruses' present separation [19].
Movement of vectors and/or infected plant materials may also have contributed to the gene flow observed between these widely separated locations [32]. Alternatively, current sampling of Asian begomovirus diversity may be so sparse that we do not yet fully appreciate the geographical range of many of the species studied here. Fig. 2). In addition, there was no phylogenetic support for the inter-group recombination event reported for ToLCNDV-[Luc]. Thus, there appear to be few or no recombination events in viruses reported from North India, suggesting that certain begomovirus species may not recombine as readily as others.
Figure 1. Neighbor-Net generated for the tomato-infecting begomoviruses of South and Southeast Asia. Evidence for reticulate evolution is reported on pairwise Hamming distances using only parsimonious sites. Networked relationships among the viral species (boxes), rather than a bifurcating evolutionary tree, indicate the presence of recombination.
There are a number of prerequisites for recombination between begomoviruses. These include shared host ranges (possibly influenced by the emergence of the B biotype of the whitefly vector), the ability to co-infect the same cells [33][34][35], high levels of viral replication [36], and overlapping geographical ranges. If all of these prerequisites were met for the tomato-infecting begomoviruses in South and Southeast Asia, one would expect frequent and uniform recombination amongst all of these viruses. However, fitness disadvantages may be associated with some sequence exchanges, leading to the selective elimination of many newly produced recombinants.
Recombination sites were distributed non-randomly along the genome. Recombination breakpoints were detected in all six open reading frames of the south Indian viruses and of viruses from eastern and western India. The breakpoints in the Chinese and Thai viruses were located in AV1, AV2, AC1 and AC4, whereas ORFs AV1 and AV2 were identified as cold spots in the Bangladeshi viruses.
The frequency and locations of recombination events, measured as topological differences between trees constructed from different parts of the alignment, were visualised as a half-diagonal compatibility matrix (Fig. 3). Each cell at coordinates (X, Y) in the matrix is a rough estimate of the number of topological modifications needed to convert the tree constructed from sequences at position X into that constructed from sequences at position Y [31,37]. This matrix suggests that recombination events are probably not randomly distributed throughout begomovirus genomes. compared with those constructed using sequences between alignment positions 2451 and 2951. This analysis also indicated the probable absence, in certain regions of begomovirus genomes, of recombination events with any substantial phylogenetic effect. For example, phylogenetic trees constructed using coat protein gene sequences were all in good agreement with one another, indicating a relative absence of recombination breakpoints within the CP gene.
We examined phylogeny violations and the number of recombination events in our data set from the perspective of parental sequence relatedness. In general, phylogeny violations clustered around a genetic distance of 0.30. The observed frequency of phylogeny violations was inversely correlated (r = -0.36, p < 0.05) with the pairwise distances of the fragments involved in exchange (Fig. 4A). The number of recombination events was also inversely correlated (r = -0.35, p < 0.05) with the diversity between the exchanged fragments (Fig. 4B). Only identified parental sequences were used to estimate the genetic distance between horizontally transferred fragments and the sequences that they replaced. Overall, the frequency of phylogenetic violations and the number of recombination events decreased with increasing parental sequence diversity. In a study with artificial and natural geminivirus recombinants, Martin and co-workers [38] demonstrated that the degree of similarity between a horizontally inherited sequence and the sequence it replaces is an important determinant of recombinant fitness. Rather than being due to higher recombination rates in some genome regions than in others [39], the non-random distribution of breakpoints observed here seems to have been created by natural selection allowing only recombinants with high fitness to survive. In the more diverse genome regions where recombination events are not detected, it is possible that transferred sequences would not function properly in foreign genetic backgrounds.
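The correlation analysis above rests on two simple quantities: a genetic distance between an exchanged fragment and the parental sequence it replaced, and a correlation coefficient between those distances and violation or event counts. The sketch below assumes uncorrected p-distance (proportion of differing sites, gap columns skipped) and the Pearson coefficient; the text does not state which distance or correlation measure was used, so both are illustrative assumptions, and the function names are hypothetical.

```python
from math import sqrt


def p_distance(frag_a: str, frag_b: str) -> float:
    """Proportion of differing sites between two aligned fragments,
    ignoring columns where either sequence has a gap ('-')."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must be aligned to the same length")
    compared = [(x, y) for x, y in zip(frag_a, frag_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free sites to compare")
    return sum(x != y for x, y in compared) / len(compared)


def pearson_r(xs, ys) -> float:
    """Pearson product-moment correlation coefficient of two series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)
```

In use, each recombination event would contribute one (distance, count) pair, with the fragment and parent taken from the breakpoint analysis; a negative `pearson_r` corresponds to the inverse relationship reported above. No study data are reproduced here.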

Conclusion
Finally, the variable recombination and diversity-dependent distribution of recombination hotspots in tomato-infecting begomoviruses is valuable new information that has emerged from this study. This is perhaps the first report of variable recombination among tomato-infecting begomoviruses found in the same region. Further, the recombinant forms, recombination hotspots and frequency of recombination documented in this study provide new information for understanding the diversity and evolution of tomato-infecting begomoviruses in Asia. In addition to evolutionary considerations, understanding the implications of the recombination observed in these viruses for efforts to develop resistant tomatoes through conventional breeding and genetic engineering is important, and attention should be focused on these issues when developing effective disease management strategies. Given that the N-terminal portion of rep is highly recombinogenic, it is perhaps worrying that so many virus-derived transgenic resistance strategies focus on this portion of the geminivirus genome [40][41][42][43]. It may be wiser to develop virus-derived resistance strategies using genome regions that are less recombinogenic, as this would make it more difficult for viruses to overcome resistance by simply replacing targeted genome regions with variants that are not targeted.
Figure 3. Phylogenetic compatibility matrix of tomato-infecting begomovirus sequences, exhibiting frequencies of phylogeny violations for each pairwise comparison of sequence fragments.

Sequence data
The study sequences comprised 35 publicly available (as of June 2006) complete DNA-A and DNA-A-like components of tomato-infecting begomoviruses from India, Pakistan, China, Bangladesh, Sri Lanka, Malaysia, Thailand, the Philippines and Taiwan (Table 2). These sequences were aligned with CLUSTAL W [44] using gap opening and gap extension penalties of 10.

Phylogenetic network and pairwise homoplasy test
Phylogenetic evidence for recombination was detected with SplitsTree version 4.3 [45] using the Neighbor-Net method [46]. Neighbor-Net depicts conflicting phylogenetic signals in the data, such as those caused by recombination, as reticulations (boxes) in an otherwise tree-like unrooted network. Although we report evidence for reticulate evolution in phylogenetic networks obtained using parsimonious sites, pairwise Hamming distances and no gaps, we obtained similar results with other distance measures and settings.
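The distance input described above can be sketched as follows. This assumes that "parsimonious sites" means parsimony-informative sites (at least two character states each present in at least two sequences) and that "no gaps" means dropping any column containing a gap. It is an illustration only, not SplitsTree's code, and it stops at the distance matrix; constructing the Neighbor-Net itself is not shown. Function names are hypothetical.

```python
from itertools import combinations


def parsimony_informative_columns(seqs):
    """Indices of gap-free alignment columns in which at least two different
    states each occur in at least two sequences."""
    informative = []
    for i, column in enumerate(zip(*seqs)):
        if "-" in column:
            continue  # 'no gaps' setting: drop any column containing a gap
        counts = {}
        for state in column:
            counts[state] = counts.get(state, 0) + 1
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(i)
    return informative


def hamming_matrix(names, seqs):
    """Symmetric pairwise Hamming distances restricted to informative columns."""
    cols = parsimony_informative_columns(seqs)
    dist = {}
    for (a, sa), (b, sb) in combinations(zip(names, seqs), 2):
        d = sum(sa[i] != sb[i] for i in cols)
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

Restricting distances to informative columns discards singletons, which cannot support any split shared by two or more sequences.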
We statistically verified the presence of recombination identified visually in the phylogenetic networks using the pairwise homoplasy index (PHI) test implemented in SplitsTree 4.3. PHI has been shown to identify the presence or absence of recombination powerfully within a wide range of sequence samples, with a low false-positive rate [47].
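The idea behind PHI can be illustrated with a simplified analogue: under recombination, sites close together along the alignment tend to be more compatible than distant ones, so the mean incompatibility of nearby site pairs is lower than after the site order is shuffled. The sketch below is an assumption-laden stand-in, not the PHI implementation in SplitsTree: it scores only biallelic informative sites with the plain four-gamete test instead of the refined incompatibility score of the original method, and the window size, permutation count and names are illustrative.

```python
import random
from itertools import combinations


def biallelic_informative_columns(seqs):
    """Gap-free columns with exactly two states, each present at least twice."""
    cols = []
    for column in zip(*seqs):
        if "-" in column:
            continue
        counts = {}
        for state in column:
            counts[state] = counts.get(state, 0) + 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            cols.append(column)
    return cols


def incompatible(site_a, site_b):
    """Four-gamete test: two biallelic sites are incompatible when all four
    combinations of their states are observed across the sequences."""
    return len(set(zip(site_a, site_b))) == 4


def mean_incompatibility(cols, window):
    """Mean incompatibility over pairs of sites at most `window` sites apart."""
    scores = [incompatible(cols[i], cols[j])
              for i, j in combinations(range(len(cols)), 2) if j - i <= window]
    return sum(scores) / len(scores) if scores else 0.0


def phi_like_test(seqs, window=100, n_perm=1000, seed=1):
    """Return (observed statistic, permutation p-value). A small p-value means
    nearby sites are more compatible than expected under random site order."""
    cols = biallelic_informative_columns(seqs)
    observed = mean_incompatibility(cols, window)
    rng = random.Random(seed)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_incompatibility(shuffled, window) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the null distribution is built by permuting site order, the test needs no model of substitution, which is one reason PHI is robust to false positives.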

Detection of recombination breakpoints
Recombination breakpoint analysis was carried out using the Recombination Detection Program (RDP) [48], GENECONV [19] and MAXIMUM CHI SQUARE (MaxChi) [49], selected following the conclusions of studies evaluating different recombination detection methods [50,51]. All these methods are implemented in RDP2 [52,53]. Default RDP2 settings were used throughout (P-value cut-off = 0.05, with standard Bonferroni correction), except that sequences were treated as circular, consensus daughters were found and breakpoints were polished. We relied principally on events inferred by more than one method, since evaluations of these methods using simulated and empirical data indicated that one should not rely too heavily on the results of any single method (Posada, 2002). In the RDP analysis, the window length was set to 10 variable sites and the step size to one nucleotide. P values were estimated by randomizing the alignment 1,000 times. For the GENECONV analysis, the g-scale parameter was set to 1 and the number of permutations to 10,000.
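As an illustration of how one of these methods locates a breakpoint, the sketch below follows the basic maximum chi-square idea: for a pair of sequences, each candidate breakpoint between variable alignment columns splits the sites into a left and a right segment, a 2x2 table of matching versus mismatching sites is scored with a chi-square statistic, and the position with the largest value is reported. This is a simplified whole-pair scan under stated assumptions; it omits the sliding window, permutation-based P values and multiple-testing correction that RDP2 applies, and the function names are hypothetical.

```python
def variable_positions(seqs):
    """Alignment columns that are not identical across all sequences."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi(seq1, seq2, positions):
    """Scan candidate breakpoints between consecutive variable positions and
    return (max chi-square, first alignment position of the right segment)."""
    matches = [seq1[i] == seq2[i] for i in positions]
    total, total_match = len(matches), sum(matches)
    best_chi, best_pos = 0.0, None
    left_match = 0
    for k in range(1, total):
        left_match += matches[k - 1]
        left_mis = k - left_match
        right_match = total_match - left_match
        right_mis = (total - k) - right_match
        chi = chi_square_2x2(left_match, left_mis, right_match, right_mis)
        if chi > best_chi:
            best_chi, best_pos = chi, positions[k]
    return best_chi, best_pos
```

A sharp switch from matching to mismatching sites yields a large statistic at the switch point; in a full analysis the significance of that maximum would still have to be assessed, for example by permutation.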

Phylogenetic congruence
To examine phylogenetic support for each recombination event identified in the breakpoint analysis, we used the retained sequence position version of the TreeOrderScan method [31] implemented in Simmonics2005. In the present analysis, we recorded changes in the phylogenetic relationships of clades supported by bootstrap values of at least 70 per cent for successive 300-nucleotide sequence fragments at 100-nucleotide intervals.
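The fragmenting scheme described above can be made concrete with a short helper. It assumes that "300-nucleotide fragments at 100-nucleotide intervals" means window starts spaced 100 alignment columns apart, so that consecutive fragments overlap by 200 columns, and that a trailing stretch too short to fill a whole window is left out; the program's own handling of the alignment end may differ. The function name is hypothetical.

```python
def sliding_windows(alignment_length, size=300, step=100):
    """Return (start, end) half-open coordinates of successive fragments of
    `size` columns whose starts are `step` columns apart. Trailing positions
    that cannot fill a whole window are not covered."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [(start, start + size)
            for start in range(0, alignment_length - size + 1, step)]
```

Each (start, end) pair would then be used to slice every aligned sequence before building one tree per fragment.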

Frequency and mapping of recombination
Estimation of the frequency and mapping of the locations of recombination events was achieved by phylogeny compatibility analysis using the TreeOrderScan method. First, the TreeOrderScan program produces optimally ordered neighbor-joining trees for fragments of fixed length along an alignment. Next, a pairwise comparison is made between the trees constructed from each sequence fragment along the alignment. A phylogenetic compatibility value is then computed as the number of times the phylogeny of one tree has to be violated to match the tree order observed in other trees constructed along the length of the alignment. In our case, we assigned sequences to predefined groups based on their geographical origin, and a bootstrap value of 70 per cent was used as the threshold for scoring phylogeny violations. All pairwise compatibility values were calculated using trees constructed for 300-nucleotide sequence fragments separated by 100 nucleotides across the length of the analysed alignment. These compatibility values were then plotted as a phylogenetic compatibility matrix.
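The shape of this computation can be sketched as follows, as a simplified stand-in rather than TreeOrderScan's actual algorithm. Each fragment's tree is represented only by its clades with at least 70 per cent bootstrap support, given as sets of sequence names (tree building and bootstrapping are not shown). A "violation" between two fragments is counted as a supported clade in the first tree that conflicts with some supported clade in the second. Clades are treated as rooted clusters, so two clades conflict when they overlap without one containing the other. The representation and names are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Clade = FrozenSet[str]


def clades_compatible(a: Clade, b: Clade) -> bool:
    """Two clades can coexist in one rooted tree if disjoint or nested."""
    return not (a & b) or a <= b or b <= a


def violations(tree_x: List[Clade], tree_y: List[Clade]) -> int:
    """Count supported clades of tree_x contradicted by any supported clade of tree_y."""
    return sum(1 for cx in tree_x
               if any(not clades_compatible(cx, cy) for cy in tree_y))


def compatibility_matrix(windows: List[List[Clade]]) -> Dict[Tuple[int, int], int]:
    """Half-diagonal matrix: one violation count per pair of fragments i < j."""
    return {(i, j): violations(windows[i], windows[j])
            for i, j in combinations(range(len(windows)), 2)}
```

Because the count as defined here is not symmetric in general, storing only pairs with i < j fixes the direction of comparison; a symmetric variant could sum or average both directions.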

Competing interests
The author(s) declare that they have no competing interests.