Molecular biodiversity of cassava begomoviruses in Tanzania: evolution of cassava geminiviruses in Africa and evidence for East Africa being a center of diversity of cassava geminiviruses

Cassava is infected by numerous geminiviruses in Africa and India that cause devastating losses to poor farmers. We here describe the molecular diversity of seven representative cassava mosaic geminiviruses (CMGs) infecting cassava from multiple locations in Tanzania. We report for the first time the presence of two isolates in East Africa: (EACMCV-[TZ1] and EACMCV-[TZ7]) of the species East African cassava mosaic Cameroon virus, originally described in West Africa. The complete nucleotide sequence of EACMCV-[TZ1] DNA-A and DNA-B components shared a high overall sequence identity to EACMCV-[CM] components (92% and 84%). The EACMCV-[TZ1] and -[TZ7] genomic components have recombinations in the same genome regions reported in EACMCV-[CM], but they also have additional recombinations in both components. Evidence from sequence analysis suggests that the two strains have the same ancient origin and are not recent introductions. EACMCV-[TZ1] occurred widely in the southern part of the country. Four other CMG isolates were identified: two were close to the EACMV-Kenya strain (named EACMV-[KE/TZT] and EACMV-[KE/TZM] with 96% sequence identity); one isolate, TZ10, had 98% homology to EACMV-UG2Svr and was named EACMV-UG2 [TZ10]; and finally one isolate was 95% identical to EACMV-[TZ] and named EACMV-[TZ/YV]. One isolate of African cassava mosaic virus with 97% sequence identity with other isolates of ACMV was named ACMV-[TZ]. It represents the first ACMV isolate from Tanzania to be sequenced. The molecular variability of CMGs was also evaluated using partial B component nucleotide sequences of 13 EACMV isolates from Tanzania. Using the sequences of all CMGs currently available, we have shown the presence of a number of putative recombination fragments that are more prominent in all components of EACMV than in ACMV. This new knowledge about the molecular CMG diversity in East Africa, and in Tanzania in particular, has led us to hypothesize about the probable importance of this part of Africa as a source of diversity and evolutionary change both during the early stages of the relationship between CMGs and cassava and in more recent times. The existence of multiple CMG isolates with high DNA genome diversity in Tanzania and the molecular forces behind this diversity pose a threat to cassava production throughout the African continent.


Background
Geminiviruses are a large family of plant viruses with circular, single-stranded DNA (ssDNA) genomes packaged within geminate particles. The family Geminiviridae is divided into four genera (Mastrevirus, Curtovirus, Topocuvirus, and Begomovirus) according to their genome organizations and biological properties [1,2]. Members of the genus Begomovirus have caused significant yield losses in many crops worldwide [3] and are transmitted by whiteflies (Bemisia tabaci) to dicotyledonous plants. The genome of cassava mosaic geminiviruses (CMGs) in the genus Begomovirus consists of two DNA molecules, DNA-A and DNA-B, each of about 2.8 kbp [1], which are responsible for different functions in the infection process. DNA-A encodes genes responsible for viral replication [AC1 (Rep), and AC3 (Ren)], regulation of gene expression [AC2 (Trap)] and particle encapsidation [AV1 (CP)]. DNA-B encodes for two proteins, BC1 (MP) and BV1 (NSP) involved in cell-to-cell movement within the plant, host range and symptom modulation [1]. CMGs have been reported from many cassava-growing countries in Africa and the cassava mosaic disease (CMD) induced by them constitutes a formidable threat to cassava production [4].
Representatives of six distinct CMG species have been found to infect cassava in Africa: African cassava mosaic virus (ACMV), East African cassava mosaic virus (EACMV), East African cassava mosaic Cameroon virus (EACMCV), East African cassava mosaic Malawi virus (EACMMV), East African cassava mosaic Zanzibar virus (EACMZV) and South African cassava mosaic virus (SACMV) [5]. Recent studies have uncovered much variation in CMGs including evidence that certain CMGs, when present in mixtures, employ pseudo-recombination or reassortment strategies and recombination at certain hot spots such as the origin of replication [6][7][8][9][10] resulting in the emergence of 'new' viruses with altered virulence. For instance, an ACMV-EACMV recombinant component A, designated EACMV-UG2, and a pseudo-recombinant component B, designated EACMV-UG3 [10], have been implicated in the pandemic of severe CMD currently devastating cassava in much of east and central Africa [4]. In 1997, only ACMV and EACMV were known to occur in Tanzania with the former occurring only in the western part of the country [11]. The discovery of EACMZV on the island of Zanzibar [12] together with the recent spread into Tanzania of the EACMV-UG2 associated pandemic of severe CMD [4,13] has aggravated the CMD situation. Consequently, there is much to be learned about the identity, distribution, molecular variability, and the threat that these emerging geminiviruses pose to cassava production in Tanzania and more generally in Africa.
In 1997, the first recombination between two species of geminiviruses was recorded [7,8]. This mechanism is now known to be widely used by all geminiviruses and is probably the most important molecular mechanism for generating genetic changes that allow novel geminiviruses to exploit new ecological niches [2,14]. This paper describes the results of a molecular study of the sequences of CMGs collected from the major cassavagrowing areas of Tanzania in an effort towards identifying, determining molecular variability and mapping the distribution of CMGs. In addition, because East Africa seems to be unusually rich in virus biodiversity and because the most recent cassava pandemic was first reported in East Africa, we investigated the extent of inter-CMG recombinations and examined their role in the evolution of CMGs in Africa.

Assessment of CMD symptoms
Over 80% of the cassava plants in the fields showed severe CMD symptoms with cassava in the Lake Victoria basin expressing the most severe symptoms followed by that from the southern regions. Symptoms of infected cassava samples collected in the field were reproduced in controlled conditions to examine symptom variability. From a total of 35 selected cuttings planted, 25 (71%) were successfully established in the growth chamber. In all cases, regardless of the cultivar, symptoms expressed in the field, whether moderate or severe, were reproduced in the growth chamber and plants did not recover from the disease even 12 months after planting (Fig. 2). Likewise, plants that displayed moderate symptoms in the field showed a similar symptom in the growth chamber as was the case for plants singly-infected with ACMV-[TZ] (Fig.  2).

Detection of viral genomic components
PCR amplification products (2.7-2.8 kbp) were observed for all the CMG isolates tested using primer UNIF/UNIR (Table 1) designed to amplify near-full-length DNA-A of CMGs. Bands were not observed with the negative control (nucleic acid preparation from healthy cassava plants). Similarly, a specific (2.7 kbp) product was observed when using abutting primers TZ1B-F/R designed from a 560 bp DNA-B fragment initially PCR-amplified using universal primers EAB555/F and EAB555R for general detection of CMGs DNA-B. DNA-B partial fragments (544-560 kbp) were consistently amplified by PCR using primers EAB555-F and EAB555-R (Table 1) for all the CMD-diseased samples previously shown to contain EACMV isolates collected from major cassava-growing areas in Tanzania [13].

Complete nucleotide sequence characteristics of CMGs from Tanzania
The complete DNA-A sequences of seven representative CMGs from the major cassava-growing areas were determined from the representative isolates selected and grown in the growth chambers. An ACMV isolate from Tanzania (ACMV- [TZ]) was shown to be most closely related to ACMV-UGMld from Uganda with a sequence identity of 97%. Its DNA-A nucleotide (nt) sequence was established to be 2779 nts in length. It has a high overall sequence identity (> 90%) with all other published sequences of ACMV isolates (Table 2) with which it clusters in the phylogenetic tree presented in Figure 3. The DNA-A sequence organization was typical of a begomovirus, with two open reading frames (ORFs) (AV2 and AV1) in the virion-sense DNA, and four ORFs (AC1 to AC4) in the complementary sense, separated by an intergenic region (IR). Complete nt sequences of the DNA-A genomes of the different Tanzanian EACMV and ACMV isolates were compared with published sequences (Table 2) (Fig. 3). The complete nt sequence of the EACMCV-[TZ1] DNA-B component was determined to be 2726 nts long and had the highest sequence identity (85%) with EACMCV-[CM] DNA-B with which it is grouped in the phylogenetic tree (Fig. 4). It had less than 72% homology with DNA-Bs of other EACMV isolates from East Africa.
The complete DNA-A genome of CMG isolates from Yombo Vituka (YV) and Tanga (TZT) in the coastal area of Tanzania were determined to be 2800 and 2801 nts long

Determination of genetic diversity of EACMV DNA-B using partial sequences
The diversity of different CMG isolates was analyzed using a partial DNA-B genomic region spanning the N-terminal region of BC1 to the intergenic region (IR). Identities of these sequences with those of the corresponding DNA-B genomic regions of other CMGs in GenBank were determined. Generally, the EACMV isolates showed little genetic divergence amongst one another and isolates collected from the same area displayed high nt sequence identity. Isolates TZB1 and TZB7 from the southern part of Tanzania shared the highest (98%) nt sequence identity followed by TZB3 and TZB8 (94%) as well as TZB and TZB10, all from the east coast area. TZB2 was most closely related to and shared 91% sequence identity with TZB4, both collected from the coastal area. None of the isolates from the south or coastal areas shared >85% nt sequence identity with those from the Lake Victoria basin (TZB9 and TZB12).
The phylogenetic tree generated from a multiple alignment of 13 EACMV isolates with selected bipartite begomovirus sequences and EACMCV-[TZ1] B component is shown in Figure 4. All 13 Tanzanian isolates studied clustered with the reference EACMVs, with TZB6 being most closely related to Ugandan isolates (EACMV-UG3Svr, EACMV-UG3Mld and EACMV-UG1) (Fig. 4) sharing 97% nt sequence identity. Four isolates (TZB3, TZB5, TZB8 and TZB9) formed a closely related group, with TZB8 and TZB9 being the most closely related. Isolates TZMB, TZB5 and TZB11 each grouped separately. None of the EACMV isolates grouped with ICMV and SLCMV from the Indian subcontinent (Fig. 4). Phylogenetic tree (1000 boot strap replications) showing the DNA-A complete nucleotide sequence relationships between the seven Tanzanian cassava mosaic geminivirus isolates (in blue) and other cassava mosaic geminiviruses

Capsid protein (CP) gene sequence analysis and comparison with selected viruses
The CP gene sequences of the seven CMGs identified in our study were compared to published sequences (  (Table 3), suggesting it contained the recombination at the CP gene level previously reported [7,8] for EACMV-UG2.
A phylogenetic analysis of the CP of Tanzanian CMGs yielded a tree (Fig. 5) that was in agreement with the relationship predicted by pairwise sequence comparison (  (Fig. 5).

The common regions (CRs) of the Tanzanian CMGs
The conserved nonanucleotide in the hairpin-loop, TAATATTAC, that is characteristic of the members of the family Geminiviridae and the AC1 TATA box, were identified in the CR sequences of all the Tanzanian CMGs (Fig.  6a,6b). The CR of ACMV-[TZ] was 170 nts long while those for EACMV were between 152 and 157 nts in length.
When the CR sequence of ACMV-[TZ] was compared and aligned to the published CR sequences of other cassavainfecting ACMV isolates from Africa (Fig. 6a), it was apparent that ACMV- [TZ] was virtually identical to all ACMV isolates. The repeated motif upstream the TATA box for all the published ACMV isolates was AATTGGAGA (Fig. 6a). The motif for ACMV-[TZ], AATTGGAGA, was identical. Figure 6b presents the alignment of the CRs of the Tanzanian EACMVs with sequences of all published EACMVs. It was found that all the isolates contained the various features characteristic of begomoviruses. The putative Rep-binding sequences (iterons) were GGT-GGAATGGGGG for all the Tanzanian isolates except EACMV-[TZ/YV] that had different iterons (GGGGGAACGGGGG) and a total of 23 mismatches in the entire CR. It is worth noting that although the genomes of the two isolates of EACMZV are EACMVbased, their CRs are more similar to ACMV than to EACMV and the iteron is AATTGGAGA.  (Table 4).

Geographical distribution of the CMGs in Tanzania
The representative isolates sequenced here have been chosen because they represent a range of different RFLP patterns found during a large set of 485 samples collected throughout Tanzania [13]. However, the selection of isolates to sequence was based on the differences in RFLP patterns and not on their frequency of appearance in the country. Figure (Fig. 7).

Comparisons of the East African and West African isolates of EACMCV i) Comparisons of the A components of EACMCV-[TZ]
The East African cassava mosaic Cameroon virus isolates from Tanzania (EACMCV-[TZ1, TZ7]) are very typical isolates of the species East African cassava mosaic Cameroon Phylogenetic tree of the coat protein gene (CP) nucleotide sequences of the cassava mosaic geminivirus isolates from Tanzania and other cassava begomoviruses (1000 bootstrap replications) Figure 5 Phylogenetic tree of the coat protein gene (CP) nucleotide sequences of the cassava mosaic geminivirus isolates from Tanzania and other cassava begomoviruses (1000 bootstrap replications). Sequence of tomato golden mosaic virus (TGMV-YV) was used as the out-group. Abbreviations and accession numbers can be found in Figure 3.
virus. The A component was 89 to 90% identical to the isolates from Cameroon and Ivory Coast and the 300 nts that differ are scattered all along the genome. In addition, the A components from East Africa showed the typical recombination already noted in the West African isolates, i.e. a fragment of about 800 nts not of EACMV origin, covering AC2-AC3 and the C-terminus of AC1 (Fig. 8A).

ii) Comparisons of the B components of EACMCV
The EACMCV West African isolates had only a stretch of 800 nts in the BC1 region in common with EACMV isolates from Uganda, the only B component available for EACMV: the rest of the sequence was completely different. The DNA-B of the East African EACMV isolates is ± 85% homologous to the West African isolates. The pairwise profile (Fig. 8B) showed the same recombinant fragment of about 800 nts with above 90% identity with West African isolates of EACMCV and other East African isolates such as EACMV-UG3, EACMZV and SACMV. The rest of the genome showed greater relatedness to the West African isolates of EACMCV, above the "species threshold" limit. Overall, the EACMCV-[TZ1] B component can be considered a non-closely related strain of the B component of EACMCV-[CM], but much closer than the B components of other East African cassava viruses.

iii) Comparisons of the common regions (CRs) of EACMCVs from Cameroon and Tanzania
The common region of A components (CRAs) were 82% to 89% identical to those of West African isolates, which is low but not abnormal as the West African isolates were 91% identical to one another ( Table 4). The differences are mostly in the variable region between the TATA box and the TAATATTAC stem-loop, but also in the rest of the sequence. The CR of B components (CRBs) of the EAC-MCV-[TZ1] isolate was more distantly related, at between 78% and 80% homology to the CRBs of the West African isolates, while they were 97% homologous to one another. The differences were mostly in the variable region. When both (CRAs and CRBs) were compared, it was apparent that CRs of the East African isolates were more similar to the CRAs of West Africa than the CRBs of West Africa. This arises mainly from a deletion of GAAAA, and from a more similar sequence in the region between the TATA box and the stem-loop. The putative replication protein binding sequences (iterons) were GGTGG-AAT-GGGGG for all the isolates except for the Bs of West Africa where it is GGTGG-AAC-GGGGG. There is a repeat of GGGGG in the 5' end of the CRs for all the isolates (Fig.  6B).

Recombination analysis of cassava mosaic geminiviruses
The pairwise analysis performed on all African cassava viruses sequenced so far, with two Indian cassava viruses as out-groups, and including the viruses isolated in Tanzania (here described), showed a number of putative recombinant fragments for both components. Figure 9 shows a genomic map for each component and summarizes the results obtained for the A and B components.

i) Pairwise analysis of the A components African cassava mosaic virus
None of the ACMV sequences obtained so far exhibited a possible recombinant fragment. An isolate of ACMV was involved in a recombination between EACMV and ACMV to produce the EACMV-UG2 isolate, which was associated with the epidemic in Uganda in the 90s [7,8]. But it is worth noting that ACMV acted as a donor of DNA, not a receiver, in the recombination. The situation for the EACMV-like viruses is very different, as they exhibit mul-

East African cassava mosaic Zanzibar virus
Two isolates of EACMZV from Zanzibar and Kenya [12] have most of their genomes from EACMV; approximately 200 nts (2050 to 2250 nts) are similar to SACMV and the rest of the genome, covering AC1, AC4 and the CR, is unique and therefore attributed to EACMZV or an ancestor of EACMZV (Fig. 9A).

East African cassava mosaic Cameroon virus
Several EACMCV isolates from Cameroon, Ivory Coast and now Tanzania (this report) belong to the species East African cassava mosaic Cameroon virus (see paragraph 3.6; [9]); all share the same putative recombinant fragment, i.e. a fragment of 800 nts (AC3-AC2-CterAC1), that is unique and therefore attributed to EACMCV (Fig. 9A) or a common ancestor. However, the three isolates from West Africa do have a small recombinant fragment (100-250 nts) that is also unique to EACMCV, but this fragment is not present in the Tanzanian isolates.   (Fig. 9A). The fragments 550-800 and 900-1050 nts are therefore attributed to EACMMV or an ancestor. The major difference with the SACMV isolates resides in the fact that the rest of the genome is purely EACMV-like, with the exception of 100 nts in the AC1 gene (1950-2050 nts).

South African cassava mosaic virus
One virus isolate of the species South African cassava mosaic virus from South Africa (SACMV-[ZA]) [16] exhibited a putative recombination, i.e. most of the first 1000 nts (CR, AV2 and most of AV1) and then the last 800 nts (NterAC1, AC4 and CR) are unique for this virus and con-sequently attributed to SACMV, or an ancestor of SACMV. The rest of the genome, covering AC3-AC2 and the C-terminus of AC1, is typical of EACMV (Fig. 9A). Another two isolates of SACMV, one from Madagascar (SACMV-[M12]) and one from Zimbabwe (SACMV-[ZW]), although belonging to the same species as the virus from South Africa, have a different recombination pattern, i.e. the first 1050 nts are similar to EACMMV with portions that are SACMV-type and portions that are EACMMV-type (Fig. 9).
The SLCMV-[Col] and ICMV-[Mah] isolates, here used as out-groups [17], exhibited a large recombinant fragment of 1200 nts, possibly originating from ICMV [18] and encompassing NterAC1, AC4 and all the CR. Noticeably, several recombination sites are aligned among the different genomes, possibly indicating "hot spots" for recombination and possibly also delineating fragments in which variation led to selective evolutionary advantage. have not been cloned yet and therefore we have only partial information. The ACMV B sequences available did not show any recombination. The EACMCV isolates from Cameroon, Ivory Coast and Tanzania all showed the same putative recombinant fragment, i.e. between 1700 and 2300 nts, corresponding to part of the BC1 gene. Interestingly, and a contrario to the EACMCV A component, most of the B genome is unique and only the recombinant fragment originates from EACMV (Fig. 9B); the rest of the genome is therefore marked as the EACMCV-type (Fig. 9B) [17].

Quantification of the percentage of EACMV-type and SACMV-type sequences in each virus
From the recombination analysis and phylogenetic results, it is clear that all EACMV-like viruses share a portion of the EACMV backbone sequence. The recombination map was used to calculate these percentages, indicated in Figure 9 for each component. This percentage varies from 38 to 100% depending on each virus for the A components and from 22 to 100% for the B components. A similar calculation can be made for sequences that are SACMV-type and the results vary between 0 and 60% for the A components and from 0 to 16% for the B components (Fig. 9). Figure 11 shows a repartition of these percentages according to the different viruses cloned and according to a transect between Uganda and South Africa, indicating that the EACMV backbone sequence decreases towards South Africa while the SACMV-type sequence increases.

Discussion
The present study confirmed the presence of representatives of 3 species of CMGs in Tanzania: one isolate of ACMV, four isolates of EACMV, and two additional isolates of EACMCV. The complete DNA-A nucleotide sequences of these isolates were determined.

ACMV
It is apparent from the results of this study that several CMGs exist in Tanzania showing a high genetic diversity. The ACMV characterized from Tanzania was found to have very high overall DNA-A nt sequence identity to all the other isolates of ACMV sequenced so far. As there is no relation between the origin of ACMV isolates and their sequence relationship with other isolates, it is impossible to tell if the one found in Tanzania is more related to one ACMV isolate than another. As it is the first isolate to be sequenced from Tanzania Tanzania show from two to three times more sequence variability and two extra recombinant fragments, together with the fact that the parent EACMV has not been found so far in West Africa, suggests an East African origin of this virus species, and therefore a possible spread from the East to the West as indicated in Figure 10.

EACMV-TZ, -KE, -UG
The rest of the CMGs cloned in this study were closely related to those reported in the neighboring countries of Map of Africa depicting the putative inter-species recombinations of components A and B of cassava mosaic geminivirusess identified in different parts of Africa, either from this study or from GenBank accessions Figure 11 Map of Africa depicting the putative inter-species recombinations of components A and B of cassava mosaic geminivirusess identified in different parts of Africa, either from this study or from GenBank accessions. The significance of the color codes is given in the figure. Where the component B of a particular virus has not been cloned, it is indicated in letters for a different species representative or as a faded drawing for a different isolate. For simplification of the drawing, not all the ACMV isolates have been shown as they are very similar. Similarly, the EACMV-UGs associated with the CMD pandemic now present in several central African countries have not been depicted as they are of very recent introduction (less than 10 years). The solid blue arrows represent the possible "route" of evolution of the EACMCV viruses, and the green arrows represent the possible "route" of evolution of the EACMV viruses. shared very high DNA-A and CR sequence identity with EACMV-UG2Svr from Uganda. The CR also showed 100% nt sequence identity with EACMV-UG2Svr as well as high CP sequence identity to ACMV isolates because it has the same recombination as its closest relative, EACMV-UG2Svr, that was proven to involve two viruses (ACMV and EACMV) [7,8]. This CMG was localized in the northwestern part of Tanzania in the post-epidemic area. It is noticeable that this virus, which has invaded a large portion of Central Africa in just a few years [4] (Fig. 8), has not yet reached the southern and eastern part of Tanzania.

Recombination of A and B components
Using all the CMG sequences available so far, we have shown that both A and B components of most of the CMGs exhibit putative recombinant fragments from various known or unknown origins. Despite the smaller number of sequences of DNA-B components and the smaller number of putative recombinant fragments, it is interesting to note that, as for the A components, it seems that there are "hot spots" for recombination. These apparent hotspots for recombination could result from physical constraints in the virus sequences or could simply result from the functional constraints of having recombinant proteins that keep structural and biological functions. These hotspots have already been mentioned in other general studies of geminivirus recombination [14] as well as in specific studies of particular groups of geminiviruses [20].

Two categories of CMGs in Africa
Based on recombination analyses, it is apparent that there are really two different categories of CMGs. The ACMV group does not have fragments of foreign geminivirus DNA in their genomes. By contrast, all other African CMG species groups show evidence of extensive recombination. It is also significant that EACMCV isolates obtained from each side of the African continent appear to share a similar genetic make-up and recombination pattern. This suggests that these viruses had a common origin, probably in East Africa, but diverged a long time ago. Recombination events have been shown to be key factors in the development of CMD epidemics [7,8,19] and it has been suggested that recombination is a significant contributor to geminivirus evolution [14]. Recombination involving the CP sequence has been reported for EACMV-UG2 from Uganda [7,8,10], a virus that has been associated with the current CMD pandemic that has devastated cassava in eastern and central African countries [4,21], although there is currently no proof that this event has been the key factor driving the pandemic's spread.

B components of CMGs in Tanzania
The diversity of DNA-B components of EACMV from Tan EACMMV have yet to be sequenced. However, these short fragments indicated a clustering, apart from EACMV-UG and EACMCV, into 4 additional clusters that could reflect an even greater molecular diversity in the B components of CMGs in East Africa than we currently recognize.

EACMV evolution
The clustering of all the EACMV-like viruses into one species has been the topic of much scientific debate in recent years. ICTV (International Committee on Taxonomy of Viruses) finally decided to split them into 5 species (for now), mostly to comply with the ICTV guidelines for species demarcation, but clearly these viruses are closely related and had common ancestors. All EACMV-like viruses with the exception of EACMCV occur in East Africa, and mostly east of the Rift Valley. Evidence presented here and elsewhere now provides a strong case for an East African origin for the EACMVs. EACMCV is widely-distributed across West Africa, albeit at low incidence [4]. Whilst it seems likely that this is the result of an early introduction or introductions from East Africa, it is not currently clear when such an introduction(s) might have taken place. It is even possible that the spread of this virus occurred in another host, long before cassava was introduced into Africa.
Finally, the rapidly expanding EACMV-UG2 associated pandemic of severe CMD in East and Central Africa represents a contrasting, and currently probably unique, scenario in which the combination of a virulent recombinant virus, superabundant vector populations and susceptible local cassava germplasm have led to a rapid expansion in the geographic range of EACMV-UG with a concomitant devastating impact on cassava cultivation. Furthermore, it is significant that when considering the proportion of pure EACMV backbone sequences in the A components of all the EACMV-like viruses, there is a clear gradient from East Africa to South Africa, going from 100 to 38%, suggesting firstly that these viruses are highly related and secondly that the origin of the EACMVs might have been East Africa, hence the green arrows in Figures 8  and 11. Similarly, a reverse gradient for the SACMV-like sequence, going from 8 to 60% from Zanzibar to South Africa, suggests that the SACMV ancestor was located in South Africa. Because recombination can only occur when the two parent viruses are in the same plant, it is logical to expect a spatial relation between the different viruses and their genetic make up. It is, however, the first time that such gradients have been demonstrated for geminiviruses. The situation for the B components is completely different. There is no EACMV-like gradient from North to South, as most of the available sequences show a great proportion of EACMV-like sequences. However, it is evident that EACMCV has captured a B component completely different from EACMV, with only a small EACMVlike fragment. This result is concordant with the idea that B components can be recruited independently from the genetic nature of A components as already suggested for SLCMV and ICMV [17].

East Africa
East Africa has been the cradle for many biological organisms beginning with humanity. From this work, it is also apparent that Tanzania may also be a potential source of origin of the family of EACMVs. The revealed strain diversity further exemplifies the wealth of this part of Africa with respect to cassava geminiviruses. Some of these viruses have been introduced very recently, such as EACMV-UG2 [TZ10], while others, such as ACMV and the EACMVs, have clearly been present much longer. The East-African Arc Mountain is known to be the main biodiversity hot-spot in Africa, and an important refuge for plants and animals [22], therefore it is plausible that some of the geminiviruses that were invading local host plants were spread throughout Africa in their local hosts (for many millions of years), as it was suggested for Rice yellow mottle virus [23] colonizing the domesticated host in very recent history (a few hundred years). The same type of geminiviruses would have colonized cassava wherever they might be, beginning with that crop's introduction into the African continent in the XVI th century, as these viruses would have had the same potential for such colonization. This might have been the case for EACMCV for which our data presented here suggest an old East African origin for the now widely distributed EACMCV in West Africa. In addition to this scenario, it is certain that cassava geminiviruses have been exchanged throughout the movement of virus infected cassava cuttings via human intervention and by the natural vector Bemisia tabaci. The latter may account for the EACMV/SACMV gradient between East Africa and South Africa, favoured by a natural corridor along the eastern Rift Valley and created by the recombination capacity of CMGs present in the same region.
However, more sequences are required in order to compare and contrast variability within and between the virus populations and to strengthen the understanding of their evolutionary interrelationships. The rapid spread of the EACMV-UG2 associated pandemic has been driven through superabundant whitefly populations [24], but other important forces in CMG movement and evolution include movement of cassava cuttings and transmission from and into alternative weed hosts. Although cassava was brought to Africa in the XVI th century, it attained its current Africa-wide distribution as recently as the XIX th century. Most current movement occurs informally as farmers move cuttings locally. Wider distribution is less frequent but may be more significant in enabling major displacements of CMGs, such as that hypothesized for the introduction of EACMCV from East to West Africa. Although rapid spread of up to 100 km per year has been reported for the EACMV-UG associated pandemic [4], elsewhere there appears to be much less local spread of CMGs by whitefly, and physical barriers including lakes, forests and regions where cassava is not grown, appear to be effective in curtailing local spread of CMG. This would seem to account for the apparent 'island' of EACMCV in southern Tanzania as well as the absence of ACMV from coastal East Africa.

Conclusion
In conclusion, we have established the existence of different CMG isolates, strains and species in Tanzania with some isolates resembling those reported previously in East African countries and two isolates very similar to the geographically distant EACMCV from West Africa. This study demonstrates that East Africa is rich in CMGs and could be the cradle for CMG diversification in Africa. It also highlights the urgent need for more information.
Only through building a thorough understanding of these important plant pathogens and the evolutionary processes underpinning their emergence can we hope to develop effective and sustainable approaches to managing the disease they cause.

Symptom reproduction in the growth chamber
Selected cassava cuttings collected from the fields were planted in a growth chamber at 25°C with a 16 hours day length and 50% relative humidity and watered twice weekly. CMD symptoms were recorded daily on the newly formed leaves for the first three months and every three days in the subsequent months for an eight month period. Symptom severity on the top five fully-expanded leaves was scored using a scale described by Fauquet et al [25].

DNA extraction
Total DNA was extracted from the symptomatic cassava leaves collected in the field and growth chamber as described by [26].

Polymerase chain reaction, cloning, and sequencing
Full-length copies of DNA-A were amplified from total cassava plant DNA extracts using sets of primers (Table 1). UNI/F and UNI/R are degenerate primers with annealing positions in the AC1 gene designed to amplify near fulllength DNA-A of CMGs (2.7-2.8 kbp) leaving an unamplified portion of ~17 nts. From the near full-length CMG sequences, primers were designed to amplify the remaining partial DNA-A sequences including the missing 17 nts from the original samples. Partial fragments consisting of a region between the BC1 gene and intergenic region (IR) of DNA-B components of EACMV isolates from different cassava-growing areas were amplified by universal primers EAB555-F and EAB555-R (Table 1) designed to amplify PCR products of about 540-560 kbp depending on the virus isolate. In order to amplify the DNA-A and DNA-B full-length, PCR was performed with 94°C denaturation followed by 35 cycles of 1 min at 94°C, 59°C for 1 min and 2 min at 72°C. For amplification of the partial DNA-B fragment (BC1/IR), PCR conditions were 30 cycles of 94°C for 1 min, 55°C for 1 min, 72°C for 1 min and an extension cycle of 10 min at 72°C. PCR products of the expected sizes were electrophorezed in a 1% agarose gel in TAE buffer (10 mM tris-acetate, 1mM NaEDTA, pH 8.0), purified, and cloned into the pCR 2.1 vector using the TA cloning kit (Invitrogen, San Diego, CA). Clones containing putative viral sequences were identified by miniprep screening and confirmed positive for inserts by PCR amplification using their respective PCR primers, and inserts were subsequently sequenced in both directions. The complete and partial nucleotide sequences of CMGs were determined by the dideoxynucleotide chain termination method using an ABI automatic sequencer on both orientations at the Protein and Nucleic Acid Chemistry Laboratories (PNACL), Washington University School of Medicine, St. Louis, Missouri, USA (ABI377 DNA sequencer, Perkin Elmer, Foster City, CA). Sequence fragments of < 600 kbp were generated using M13 universal primers. Moreover, to obtain overlapping data from opposite strands of large or full-length fragments, single primers were constructed for genome walking. Sequences were submitted to GenBank and the accession numbers are as follows:

Computer analysis of CMG sequences
Virus sequences were edited using BioEdit Sequence Alignment Editor (Hall, 1999) and SeqEdit (DNAStar, Madison, WI) to obtain a consensus sequence for each. Reference geminiviruses for full length CP and CR sequence alignments were compiled by extracting the complete DNA-A and DNA-B sequences, the CP ORF (approximately 765-777 bp) and CR sequences (approximately 150-170 bp) from sequences available in Gen-Bank (accession # are provided in the figures). Multiple sequence alignments of the full-length DNA-A, DNA-B, capsid protein (CP) gene and common region (CR) were carried out using the Clustal Program (MegAlign, DNAStar). The phylogenetic trees were constructed from the multiple alignments by the neighbor-joining majority rule consensus. Multiple alignments were analyzed by maximum parsimony with full-length DNA-A, DNA-B and CP phylogenetic trees using Phylogenetic Analysis Using Parsimony (PAUP) [27] and a bootstrap analysis with 1000 replicates was performed. Only values above 50% were reported on the trees in the figures. Virus specific iterons in the CR of selected CMGs were identified and compared with the analogous iterons of the Tanzanian isolates of CMGs.

Recombination analysis for cassava mosaic geminiviruses
The pairwise comparison sequence analysis (PCSA) method compares the profile of a pair of sequences to that of an average profile of sequences that are selected a priori, based on knowledge that the selected sequences are related to the species or the isolate levels [20]. According to the guidelines established by the ICTV Geminiviridae Study-Group, two geminivirus sequences sharing more than 89% identity of their A component sequences are considered strains or isolates of the same species. Where homology is less than this, they are considered to be members of different species [5]. However, viruses that share between 80 and 90% sequence identity are often found to be recombinants [2], therefore, in the PCSA, we consider viruses sharing less than 80% identity as different species. PCSA profiles were carried out between sequences of different species and of different isolates and an average profile for the considered cluster of viruses was calculated for these two categories with increments of 50 nts along the genome sequence. A standard deviation value for each segment was calculated and minimum and maximum values corresponding to two standard deviation values were also calculated (Fig. 1). Each chosen pairwise analysis for putative recombinant sequences was then compared to the species average profile and the pertaining of each 50 nts fragment to this category is examined. Segments different from more than 2 standard deviation values were considered to be putative recombined fragments. For each PCSA, a putative recombination percentage for the genome is calculated and a corresponding map can be drawn. It is verified (a posteriori) that the particular representatives of species and isolates selected for the 'Species and Isolate Average Curves' are 100% non-recombinant at the time of the analysis [20]. No statistical test is applied to PCSA.