Frequent migration of introduced cucurbit-infecting begomoviruses among Middle Eastern countries

Background: In the early 2000s, two cucurbit-infecting begomoviruses were introduced into the eastern Mediterranean basin: the New World Squash leaf curl virus (SLCV) and the Old World Watermelon chlorotic stunt virus (WmCSV). These viruses have been emerging in parallel over the last decade in Egypt, Israel, Jordan, Lebanon and Palestine. Methods: We explored this unique situation by assessing the diversity and biogeography of the DNA-A component of SLCV and WmCSV in these five countries. Results: There was fairly low sequence variation in both begomovirus species (SLCV π = 0.0077; WmCSV π = 0.0066). Both viruses may have been introduced only once into the eastern Mediterranean basin, but once established, these viruses readily moved across country boundaries. Based on the absence of monophyletic clades, SLCV has been introduced at least twice into each of the five countries. Similarly, WmCSV has been introduced multiple times into Jordan, Israel and Palestine. Conclusions: We predict that uncontrolled movement of whiteflies among countries in this region will continue to cause SLCV and WmCSV migration, preventing strong genetic differentiation of these viruses among these countries.


Background
The frequent movement of seeds and vegetative plant material across national borders is known to contribute to the spread of plant diseases [1]. Plant diseases that are vectored by arthropods can also move where their vectors move, from field to field, and between adjacent countries. Despite the economic importance of maintaining phytosecurity, the movement of plant pathogens between countries is poorly studied [2]. We usually do not know whether the isolates of emerging plant pathogens within a region are part of a single, cohesive population on a genetic level, or if they reflect geographic structuring, nor do we know if the pathogen was introduced once or multiple times into the region (e.g., [3]).
Since the turn of the millennium, two new viruses have emerged in cucurbits in the eastern Mediterranean basin. Both are bipartite begomoviruses: single-stranded DNA viruses transmitted by members of the Bemisia tabaci species complex. Squash leaf curl virus (SLCV) was first observed in California in the late 1970s, and has subsequently been well-characterized [4]. Its arrival in Israel marked the first time a New World begomovirus was found causing disease in the Old World [5,6]. Watermelon chlorotic stunt virus (WmCSV) is an Old World virus, originally isolated in Yemen in the Arabian peninsula [7], but not observed prior to 2002 in Egypt, Israel, Jordan, Lebanon and the portions of Palestine governed by the Palestinian Authority (in the West Bank). Both of these migrants rapidly spread among the countries of the Middle East. SLCV was first noticed in Israel in 2002 and became epidemic there in 2003 [5,6]. By 2005 it had spread to Egypt [8], Jordan [9], and then was isolated from Lebanon [10] and Palestine [11,12] in 2008. The WmCSV epidemic ballooned from a single symptomatic watermelon plot in southern Israel in 2002 [6] to affect cucurbit production in Lebanon in 2009 [13], Palestine in 2010 [14] and Jordan in 2011 [15]. These viruses produce more severe symptoms when they co-infect the same plant [16], making their spread even more significant for agriculture in the Middle East.
These contemporaneous emergences afford the unique opportunity to compare the population dynamics of two begomoviruses that use some of the same hosts and vectors in the same region at the same time. SLCV has established itself thousands of miles away from its relatives, while WmCSV moved a much shorter distance, within the Old World. Despite their different journeys, our results show that both viruses may have entered the region only once, are similarly diverse, and show similar gene flow among nations in the Mediterranean basin.

Results
Sequences of SLCV were obtained from all five countries in this study (Table 1, Figure 1), but sequences of WmCSV were only obtained from three (Table 2, Figure 1). Although WmCSV has been found previously in Lebanon [13], no symptomatic watermelons were observed there during the sampling times. WmCSV has not yet been reported in Egypt. Sampling sites were located at least 9 km apart, with one exception: as WmCSV was only found in one location in Jordan, its three sampling sites were less than half a kilometer apart.
A total of 149 SLCV (136 haplotypes) and 106 WmCSV (93 haplotypes) DNA-A genome sequences were analyzed in separate datasets (Table 3). Neither viral dataset was particularly diverse (nucleotide diversity, π = 0.0077 for SLCV, π = 0.0066 for WmCSV, Table 3). Very few putative recombinants were identified in the SLCV dataset and none in the WmCSV dataset (Table 4).
Within each country, viruses were isolated from multiple locations. This facilitates studying geographic structuring within each species. SLCV showed very low levels of genetic differentiation between sample sites within each country, indicating frequent migration of viruses between sites within a single country (Table 3). This was especially true within Jordan, where no differentiation by site of isolation was observed. The barriers to SLCV migration are apparently larger between countries, as evidenced by a higher fixation index (Table 3). WmCSV showed a much more geographically structured distribution than SLCV. Within each country there was more evidence for viruses at the same site being more similar to each other than to those at other sites. However, the barriers to migration among the neighboring countries of Israel, Jordan and Palestine are low, and overall there was more evidence for migration of WmCSV across borders than for SLCV (SLCV FST = 0.253 compared to WmCSV FST = 0.151, Table 3). The homogeneity of WmCSV is all the more surprising, given the substantial distance between the WmCSV sampling sites in Israel and Palestine and sampling sites in southern Jordan (Figure 1).
The biogeography of SLCV sequences is visible in their maximum likelihood (ML) phylogenetic relationships (Figure 2). SLCV sequences show some clustering by country, most notably the Egyptian isolates forming a well-supported clade. Isolates from Palestine largely grouped together, but without the same bootstrap support. Isolates from Israel, Jordan and Lebanon were thoroughly intermingled. There was no pattern of clustering based on site of isolation, consistent with the low FST values in Table 3. The longest branch on the tree belonged to Israeli isolate IL1-1, which has a large deletion (in Rep: starting around residue 209, and ending just before the overlapping reading frame for AC2, encompassing most of the nuclease domain) compared to the other sequences (2356 nt, instead of the more typical >2600 for the DNA-A segments of bipartite begomoviruses). Looking past national borders, some of the sampling sites in different countries were fairly close together (Figure 1). There was an overall trend of SLCV sequences being more genetically similar when sampled closer together (r = 0.41, p = 0.0001). However, this correlation was driven by the monophyletic, divergent Egyptian sequences, which were sampled at least 428 km away from sites in other countries. When only the sequences from Israel, Jordan, Lebanon and Palestine were considered there was a dramatic drop in the relationship between geographic and genetic distance, though it was still statistically significant (r = 0.15, p = 0.043). This result quantifies the qualitative intermingling of isolates from these nations in the phylogenetic tree (Figure 2).
The biogeography of the WmCSV sequences (Figure 3) showed a clustering pattern similar to that predicted by the results in Table 3. Some of the well-supported groupings contained sequences from only a single isolation site; most notably, most of the sequences from site 1 in Jordan formed a clade. The sequences were again fairly well mixed among the countries of isolation, consistent with migration among these nearby nations (Figure 3). The relationship between sampling location and genetic distance was again significant (r = 0.24, p = 0.0001), but it was not as strong as it was for the full SLCV data set.
We then combined our datasets with whole DNA-A sequences available in GenBank. ML phylogenies including both isolates from this study and those isolated previously, or in other countries, are shown in Figures 4 and 5. SLCV in the Middle East is reciprocally monophyletic with isolates from the New World, supporting a single introduction into the Old World (Figure 4). The additional GenBank sequences showed that more migration events have occurred than were found in our survey; for instance, a Jordanian isolate (EF532620) now nests within the previously all-Egyptian clade. For another example, an Egyptian sequence from GenBank (KC895398) did not group with the other Egyptian isolates. A Shimodaira-Hasegawa (SH) test indicated that this likely reflects two separate migrations of SLCV into Egypt: the ML tree in Figure 4 is a significantly better fit to the dataset than one that forces all the Egyptian isolates to be in a single clade, p = 0.013. We employed an identical test to see if the two groupings of SLCV from Palestine could be explained by one single migration event. The SH test results strongly rejected that in favor of the two introductions implied by Figure 4 (p = 0.009). Our analysis supports a single introduction of SLCV into the eastern Mediterranean basin from the New World, but none of the five countries studied had SLCV isolates originating from a single introduction of the pathogen.
WmCSV was introduced into these five countries from a source much closer than that of SLCV. WmCSV is an Old World bipartite begomovirus that has been isolated in Iran, Sudan, Yemen and Oman [17,18]. Isolates in our study were found to be most closely related to a WmCSV sequence from Sudan, although without strong bootstrap support (Figure 5). The isolates from the studied countries were monophyletic with respect to these other countries, but as was the case with SLCV among Israel, Jordan, Lebanon and Palestine, the sequences isolated in these four countries showed no particular geographic structure.

Discussion
While pathogen movement between the Old and New World is on the rise [19], there have been very few instances where a begomovirus from one part of the world becomes a successful 'invasive species' in another. To date, SLCV is the only case of a New World virus establishing in the Old World [8], and Tomato yellow leaf curl virus would be the only example of the reverse [20]. In this survey, we compared the emergence of the New World SLCV with the Old World WmCSV. Our results showed remarkable similarity between these two viruses, indicating that a New World virus does not face any particular hurdles when emerging in the Old World.
The low sequence variation observed is consistent with the low levels of recombination detected in the data set. Interspecific recombination increases a population's average pairwise nucleotide differences (π) [21]. The nucleotide diversity values are similar to some previous studies that quantified intraspecific sequence variation of Tomato severe rugose virus and Tomato yellow vein streak virus within Brazil, over regions comparable to the size sampled in this study [21,22]. However, the values are lower than those measured for many plant virus species [23,24], and these two Brazilian begomoviruses also showed low levels of detectable recombination. The Brazilian begomoviruses are themselves newly recognized and emergent after a host shift into tomato crops, which could mean that emergent begomoviruses are less diverse than their more well-established counterparts. There are no appropriate datasets of SLCV variation in the Americas or WmCSV in the eastern Middle East with which to compare and evaluate whether emergent viruses have smaller amounts of sequence variation. Arguing against this possibility is the fact that begomoviruses are able to diversify rapidly [25]. Nonetheless, this remains an intriguing possibility, and calls for population diversity surveys over time in the same locations.
Some of the isolates sequenced had premature stop codons or alternative start codons for some of the overlapping reading frame genes. While there has been some characterization of aberrant isolates [6], overall we are not certain whether some of these genomes would be capable of replicating and causing infection on their own. These sequences may not be mere sequencing errors since complementation can occur during begomovirus infection [26]. For instance, most Egyptian SLCV isolates shared a common premature stop codon at residue 122 in Rep (nine of 15 total isolates, split between the two isolation sites), which indicates that this mutation is circulating at high frequency in Egyptian fields.
The major difference between the two viruses was in population substructure. SLCV and WmCSV have been in the region for roughly the same length of time, but SLCV isolates are homogeneous over sites within the same country. Different sites were used for the SLCV and WmCSV samplings, and increased differentiation among WmCSV sampling sites would imply increased distance between them. However, the WmCSV sampling sites within each country were actually closer together than the SLCV sites were (Tables 1 and 2), and WmCSV had an overall low correlation between genetic and geographic distance. It is worth noting that the three Jordanian sites were very close together (less than half a kilometer apart), and yet Jordanian WmCSV sequences showed more within-site genetic variation compared to the more distantly located SLCV sites.
Movement of these viruses over large distances is likely occurring through movement of infected whiteflies and not infected plant material. With the possible exception of exchange between Israel and Palestine, there are strict limitations on the movement of seedlings among these countries. Consequently, differences in migration rates may be due to biological factors affecting whitefly movements within each country, perhaps related to the different time of year when the watermelon crops are planted. However, since both pathogens share the same vector and can coinfect the same cucurbit hosts, it is difficult to imagine the additional hurdles WmCSV faces when dispersing compared to SLCV.

Conclusions
Our study of emergent cucurbit-infecting begomovirus diversity shows that these pathogens frequently migrate between Middle Eastern nations. This bolsters observations in the field, where skipping a planting season does not diminish the presence of the virus in the next year: the pathogens merely move back in whenever the crop becomes available [13]. Without a greater understanding of the factors that lead to the reduced biogeography of WmCSV, we predict that WmCSV will expand its range to Egypt in the coming years.

Methods
Symptomatic watermelons (Citrullus lanatus) were sampled for WmCSV in June 2011 and squash (Cucurbita pepo) were sampled for SLCV in mid-September 2011. These times were chosen to correspond to previously observed symptoms in each crop. In each case, approximately 0.1 g of tissue was collected: from the tip of symptomatic watermelon vines and from the 4th leaf from the top of symptomatic squash plants.
Sampling sites were in Egypt (EG), Israel (IL), Jordan (JO), Lebanon (LB) and the regions of Palestine governed by the Palestinian Authority (PA). In each country, sampling was attempted in up to four separated sites, up to 20 samples per site. Not all of these samples yielded a fully sequenced DNA-A component of a begomoviral genome. Additional collections from 2010 (June for WmCSV, and September for SLCV) were processed for IL and JO.
Figure 4 Maximum likelihood midpoint-rooted phylogeny of SLCV DNA-A both isolated in this study and from GenBank, created in PAUP* using a Tamura-Nei nucleotide substitution model with a gamma distribution of site heterogeneity. The locations from which various GenBank sequences were isolated, where available, were obtained from the GenBank file or from the associated publications. Sequences from Egypt (EG) are shown in red, from Israel (IL) in blue, Jordan (JO) in green, Lebanon (LB) in yellow and the regions of Palestine governed by the Palestinian Authority (PA) in purple. Sequences from all other countries are shown in black, with their two-letter country code preceding the accession number. Bootstrap support of ≥85% from both PAUP* and RAxML trees is shown by a solid circle; an open circle denotes ≥85% support from only RAxML. The closely related Rhynchosia golden mosaic Sinaloa virus (RhMSV) is also included, and serves as an outgroup.
Agricultural practices differ among the sampled countries: some crops were grown from seed and some from seedlings, and watermelons are grown by grafting in Israel and sometimes by grafting in the other countries. Sampled fields within a country did not always have the same cropping practices.

Sample preparation and sequencing
Total nucleic acids were extracted from leaf samples according to Dellaporta et al. [27]. Samples were tested initially for the presence of SLCV by PCR using primer pair SLCVPL (5′-CCAGGAGGTGTCCTCTCAAC-3′, nucleotides 53 to 72) and SLCVPR (5′-AGAGCGTGAGACCTTTGAGG-3′, nucleotides 444 to 425), which amplifies a 391 bp fragment, and for WmCSV using the primer pair WmCSVPL (5′-TTTCGATACATGGGCCTGTT-3′, nucleotides 49 to 69) and WmCSVPR (5′-TAGCTGGAAATGGGGTTTTG-3′, nucleotides 399 to 379), which amplifies a 350 bp fragment. All amplicons included the common region of their respective DNA-A components. Amplicons were sequenced and compared with DNA-A sequences of SLCV and WmCSV. Plants yielding amplicons with 95% or greater nucleic acid sequence identity to SLCV were considered infected with SLCV, and those yielding amplicons with 95% or greater nucleic acid sequence identity to WmCSV were considered infected with WmCSV.
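The 95% identity criterion can be illustrated with a minimal Python sketch. It assumes the amplicon and each reference have already been aligned to equal length and counts identity only over columns without gaps. The function names and the gap handling are hypothetical, since the text does not state how identity was computed, so this is an illustration rather than the procedure used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns (an assumption of this sketch)
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def classify(amplicon: str, references: dict, threshold: float = 95.0) -> list:
    """Names of the reference viruses whose identity to the amplicon meets the threshold."""
    return [
        name
        for name, ref in references.items()
        if percent_identity(amplicon, ref) >= threshold
    ]
```

With a 20-column toy alignment, one mismatch gives exactly 95% identity and therefore still counts as infected under a "95% or greater" rule, while two mismatches (90%) do not.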
Total nucleic acid extracts of SLCV-infected plants were amplified with two primer pairs, each amplifying approximately half of the viral DNA-A to obtain the full sequence of SLCV DNA-A. Primer pair Xho-SLCV-A-F (5′-CATGATTCTCGAGTACATAATTTAC-3′) and SLCVA2314R (5′-CTGCCTCATTCAATTATCTG-3′) was used to amplify a 1300 bp fragment of SLCV DNA-A, and primer pair SLCVA2295F (5′-CAGATAATTGAATGAGGCAG-3′) and Xho-SLCV-A-R (5′-TGTACTCGAGAATCATGAAATAAAATTC-3′) was used to amplify a 1500 bp fragment of SLCV DNA-A.
Figure 5 Maximum likelihood midpoint-rooted phylogeny of WmCSV DNA-A both isolated in this study and from GenBank, created in PAUP* using a Kimura two-parameter nucleotide substitution model with a gamma distribution of site heterogeneity and a parameter for the proportion of invariant sites. The locations from which various GenBank sequences were isolated, where available, were obtained from the GenBank file or from the associated publications. Sequences from Israel (IL) are shown in blue, Jordan (JO) in green, Lebanon (LB) in yellow and the regions of Palestine governed by the Palestinian Authority (PA) in purple. Sequences from all other countries are shown in black, with their two-letter country code preceding the accession number. Bootstrap support of ≥85% from both PAUP* and RAxML is shown by a solid circle.
All amplified viral DNAs were cloned into pTZ57R plasmid (Thermo Fisher Scientific, Waltham, MA), and sequenced. The two amplified halves from each infected plant were combined to make a single sequence. This method can create artificial chimeric sequences, but we did not see any evidence of recombination corresponding to these halves in our results (Table 4). Sequences were deposited into GenBank with details about their isolation (SLCV: KM595091-KM595239; WmCSV: KM820183-KM820288).
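How the two half-genome amplicons can be joined follows from the primer design, which the short Python sketch below checks. The primer sequences are those listed for the SLCV DNA-A halves, with extraction whitespace removed; the Xho-SLCV-A-R sequence is read across a figure-caption break in the text, so its joined form is an assumption. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences (5' to 3') as given in the text, whitespace removed.
XHO_F = "CATGATTCTCGAGTACATAATTTAC"       # Xho-SLCV-A-F
SLCVA2314R = "CTGCCTCATTCAATTATCTG"
SLCVA2295F = "CAGATAATTGAATGAGGCAG"
XHO_R = "TGTACTCGAGAATCATGAAATAAAATTC"    # Xho-SLCV-A-R (joined form assumed)
XHOI_SITE = "CTCGAG"


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# The inner primers anneal to the same 20 nt on opposite strands,
# so the two amplicons share that stretch at one junction.
inner_primers_complementary = revcomp(SLCVA2295F) == SLCVA2314R

# The outer primers overlap around an XhoI recognition site at the other junction.
outer_overlap = suffix_prefix_overlap(revcomp(XHO_R), XHO_F)
both_carry_xhoi = XHOI_SITE in XHO_F and XHOI_SITE in XHO_R
```

Under this reading, the two halves overlap at both ends, which is what allows them to be merged into one circular DNA-A sequence. It is also why a chimera of two co-infecting variants would show breakpoints near these primer sites.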

Sequence analysis
Treating each species as a separate dataset, whole genome (DNA-A) sequences were aligned manually in Se-Al v2.0a11, using the nick site in the invariant nonanucleotide origin of replication as the site of linearization. Sequences with unique indels that caused frameshift mutations in the CP and Rep were conservatively eliminated from analysis as presumptive amplification and/or sequencing errors. Similarly, sequences with unique premature stop codons in CP and Rep were usually excluded on the same grounds. There were cases where the premature stop codon was fairly close to the end of the full-length protein, and where more than one sequence from the same country contained the identical premature stop codons. In these instances, the sequences remained in the dataset.
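Linearizing a circular genome at the nick site amounts to rotating the sequence, as in the minimal Python sketch below. It assumes the conserved geminivirus nonanucleotide TAATATTAC with the nick between the seventh and eighth bases (TAATATT/AC), and it assumes the motif occurs on the strand supplied. The function name and the toy genomes are hypothetical; the alignments in this study were made manually in Se-Al, not with code like this.

```python
NONANUCLEOTIDE = "TAATATTAC"
NICK_OFFSET = 7  # assumed nick position: TAATATT | AC


def linearize_at_nick(genome: str, motif: str = NONANUCLEOTIDE,
                      nick_offset: int = NICK_OFFSET) -> str:
    """Rotate a circular sequence so that it starts just after the nick site."""
    seq = genome.upper()
    # Append the first len(motif)-1 bases so a motif spanning the
    # arbitrary linear end of the circular record is still found.
    doubled = seq + seq[: len(motif) - 1]
    start = doubled.find(motif)
    if start == -1:
        raise ValueError("nonanucleotide not found on this strand")
    cut = (start + nick_offset) % len(seq)
    return seq[cut:] + seq[:cut]
```

After rotation every sequence begins with the AC that follows the nick and ends with TAATATT, so homologous positions line up regardless of where each GenBank record or clone happened to start.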
The resulting datasets were analyzed for recombination with RDP 3.44a using default settings, except that Kimura 2-parameter nucleotide substitution models were used instead of Jukes-Cantor where possible [28]. Sequences were considered recombinant if at least three algorithms (of seven: RDP, GENECONV, Bootscan, MaxChi, Chimaera, SiScan, 3Seq) showed statistical support using a Bonferroni-corrected p-value.
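The decision rule above (support from at least three of seven methods at a Bonferroni-corrected threshold) can be written as a short Python sketch. The significance level of 0.05, the number of tests and the p-values are hypothetical inputs chosen for illustration; RDP applies its own correction internally, so this shows the logic only, not the program's implementation.

```python
METHODS = ("RDP", "GENECONV", "Bootscan", "MaxChi", "Chimaera", "SiScan", "3Seq")


def supporting_methods(p_values: dict, n_tests: int, alpha: float = 0.05) -> list:
    """Methods whose uncorrected p-value passes the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return [
        m for m in METHODS
        if p_values.get(m) is not None and p_values[m] <= threshold
    ]


def is_recombinant(p_values: dict, n_tests: int, alpha: float = 0.05,
                   min_methods: int = 3) -> bool:
    """True if at least `min_methods` methods support the recombination event."""
    return len(supporting_methods(p_values, n_tests, alpha)) >= min_methods
```

For example, with 1000 tests the threshold is 0.00005, so an event with p-values of 1e-6 (RDP), 1e-5 (GENECONV), 0.01 (MaxChi) and 2e-7 (3Seq) is supported by three methods and would be accepted.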
Each species' dataset was analyzed for pairwise nucleotide diversity (π) and divergence among populations (Dxy, Nei 1987; Wright's FST [29]) using DnaSP [30]. We examined the phylogenetic relationship of the SLCV and WmCSV isolates with and without previously characterized sequences in GenBank. Appropriate nucleotide substitution models for each dataset were selected using the hierarchical likelihood ratio test in Modeltest [31]. These were used to create Maximum Likelihood (ML) trees using a tree-bisection-reconnection approach in PAUP* [32]. ML trees were bootstrapped 1000 times using nearest neighbor interchange. ML trees were also constructed and rapid-bootstrapped in RAxML 7.2 [33,34] on the CIPRES server (www.phylo.org) assuming the general-time-reversible nucleotide model and a gamma distribution of multiple substitutions. Trees were visualized and edited using FigTree (tree.bio.ed.ac.uk/software/figtree) and Adobe Illustrator. Hypothesis testing on these trees using the Shimodaira-Hasegawa test was conducted using RAxML and CONSEL [35].
Pairwise distance matrices of each viral dataset were generated in MEGA 5.22 [36] assuming a Kimura 2-parameter model. Geographic distances between sampling sites were calculated using the great circle distance method as implemented by a National Oceanic and Atmospheric Administration applet (http://www.nhc.noaa.gov/gccalc.shtml). Perl scripts (available upon request) generated a symmetric matrix of distances between isolation sites to match each genetic distance matrix. The relationship between genetic and geographic distance was assessed with a Mantel test in PASSaGE v2 [37], using 999 permutations (α = 0.05).
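The two calculations in this paragraph can be sketched in Python as follows. The great-circle distance uses the haversine formula with a mean Earth radius of 6371 km, and the Mantel test is a simple one-tailed permutation test on the Pearson correlation of the two matrices. Both are assumptions made for illustration: the NOAA applet and PASSaGE may differ in detail, and this is not a reproduction of the authors' Perl scripts.

```python
import math
import random


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, p): matrix correlation and a one-tailed permutation p-value."""
    n = len(genetic)
    x = upper_triangle(genetic)
    observed = pearson(x, upper_triangle(geographic))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of one matrix together.
        rng.shuffle(order)
        permuted = [[geographic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent: each isolate contributes to a whole row and column, so shuffling individual cells would overstate significance.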