Molecular diversity of Cotton leaf curl Gezira virus isolates and their satellite DNAs associated with okra leaf curl disease in Burkina Faso

Okra leaf curl disease (OLCD) is a major constraint on okra (Abelmoschus esculentus) production and is widespread in Africa. Using a large number of samples representative of the major growing regions in Burkina Faso (BF), we show that the disease is associated with a monopartite begomovirus and satellite DNA complexes. Twenty-three complete genomic sequences of Cotton leaf curl Gezira virus (CLCuGV) isolates associated with OLCD, sharing 95 to 99% nucleotide identity, were cloned and sequenced. Six betasatellite and four alphasatellite (DNA-1) molecules were also characterized. The six betasatellite isolates associated with CLCuGV isolates correspond to Cotton leaf curl Gezira betasatellite (CLCuGB) (88 to 98% nucleotide identity). One alphasatellite isolate is a variant of Cotton leaf curl Gezira alphasatellite (CLCuGA) (89% nucleotide identity), whereas the other three appear to belong to a new alphasatellite species (52 to 60% nucleotide identity with CLCuGA, the most similar sequence), provisionally named Okra leaf curl Burkina Faso alphasatellite (OLCBFA). Recombination analysis demonstrated the interspecies recombinant origin of all CLCuGV isolates, with parents close to Hollyhock leaf crumple virus (AY036009) and Tomato leaf curl Diana virus (AM701765). Combined with the presence of satellite DNAs, these results highlight the complexity of the begomoviruses associated with OLCD.


Findings
Okra leaf curl disease (OLCD) is commonly observed in okra (Abelmoschus esculentus) crops in Burkina Faso (BF) and several other African countries [1][2][3][4][5]. Affected plants are severely stunted and show apical leaf curling (upward or downward), leaf distortion and vein thickening. In BF, okra is widely grown in both the rainy and dry seasons and is a major source of income, particularly for small-scale farmers. Viral diseases are important constraints on the production of this crop [6]. Recently, OLCD in Africa was shown to be associated with a complex of begomoviruses: Cotton leaf curl Gezira virus (CLCuGV; [7,4,5]), Okra yellow crinkle virus (OYCrV; [8]) and Hollyhock leaf crumple virus (HoLCrV; [9,10]).
Viruses of the genus Begomovirus belong to the family Geminiviridae and are transmitted to dicotyledonous plants by the whitefly vector Bemisia tabaci [11]. They have emerged as a major constraint on many vegetable and fibre crops throughout the world [12]. Begomoviruses are either bipartite, with two genomic components designated DNA-A and DNA-B, or monopartite, with a single DNA-A-like component [13]. Some monopartite begomoviruses are also associated with additional circular ssDNA molecules, betasatellites and alphasatellites (the latter previously known as DNA-1), which are approximately half the size of DNA-A. Betasatellites have been implicated in pathogenicity, whereas alphasatellites have no known function and appear not to be involved in symptom induction [14][15][16]. Alphasatellites have only been found in plants infected with monopartite begomoviruses in association with betasatellites [17].
The aim of our study was to characterize, at the molecular level, the complex of viruses involved in OLCD in BF and their relationship with other begomoviruses. We also describe the satellite DNAs found in association with the single Old World begomovirus species identified.
Between May 2008 and April 2009, 74 leaf samples exhibiting typical OLCD symptoms were collected from okra fields in the major growing regions of BF, around the localities of Tiébélé, Kampala, Pô, Kamboinsé, Bazèga and Bama (Kou valley). Total DNA was extracted using the DNeasy® Plant Mini Kit (Qiagen), and begomoviruses were detected by polymerase chain reaction (PCR) with primers specific to either DNA-A [18] or betasatellites and alphasatellites [19,20]. Full-length viral genomes were amplified from the PCR-positive samples by rolling-circle amplification (RCA) [21]. The amplified DNAs were digested with the endonuclease BamHI or PstI, and DNA fragments of the expected size (~2.8 kb for DNA-A and ~1.4 kb for satellites) were cloned into the pGEM®-3Zf(+) vector (Promega Biotech). Cloned genome components were sequenced by Macrogen Inc. (South Korea). Contigs were assembled with DNAMAN (Lynnon, Quebec, Canada) and aligned using ClustalW [22] as implemented in MEGA 4 [23]. Sequence comparisons were performed in MEGA 4 with pairwise deletion of gaps. The optimal model of sequence evolution, selected with ModelTest [24], was used for maximum likelihood (ML) phylogenetic reconstruction with PHYML v2.4.4 [25]. Branch support in the resulting trees was assessed with 1000 full ML bootstrap replicates, and the trees were visualized using FigTree v1.1.1.
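Digesting RCA products with an enzyme that cuts each circular genome exactly once releases unit-length linear copies (~2.8 kb for DNA-A, ~1.4 kb for satellites) from the concatemers. The sketch below is a minimal illustration, not the procedure used in this study: it counts recognition sites in a circular sequence. The function names and the toy sequence are hypothetical; only the BamHI (GGATCC) and PstI (CTGCAG) recognition sequences are standard.

```python
# Recognition sequences (top strand, 5'->3'). Both are palindromic, so
# scanning a single strand finds every site.
RECOGNITION_SITES = {"BamHI": "GGATCC", "PstI": "CTGCAG"}

def count_sites_circular(seq: str, site: str) -> int:
    """Count occurrences of `site` in a circular DNA sequence, including a
    site that spans the arbitrary start/end of the sequence record."""
    seq = seq.upper()
    k = len(site)
    extended = seq + seq[: k - 1]  # wrap around the circle
    return sum(1 for i in range(len(seq)) if extended[i : i + k] == site)

def single_cutters(seq: str) -> list[str]:
    """Enzymes (from RECOGNITION_SITES) that cut the circular genome exactly
    once, i.e. that would release unit-length monomers from RCA concatemers."""
    return [name for name, site in RECOGNITION_SITES.items()
            if count_sites_circular(seq, site) == 1]

# Hypothetical toy "genome": one BamHI site, no PstI site.
print(single_cutters("AATTGGATCCTTAATTAA"))  # ['BamHI']
```

An enzyme with two or more sites would instead yield sub-genomic fragments, which is why a fragment of the expected full size is the one selected for cloning.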
Recombination was analyzed using our sequences together with a set of sequences representing the whole known diversity of African begomoviruses (an alignment of 121 sequences). Detection of potential recombinant sequences, identification of likely parental sequences and localization of possible recombination breakpoints were carried out using the RDP [26], GENECONV [27], BOOTSCAN [28], MAXIMUM CHI SQUARE [29], CHIMAERA [28], SISTER SCAN [30] and 3Seq [31] recombination detection methods as implemented in RDP3 [32]. The analysis was performed with default settings for each detection method and a Bonferroni-corrected P-value cut-off of 0.05. Only events detected by three or more methods were accepted.
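The acceptance rule above (Bonferroni-corrected P < 0.05, with support from at least three methods) can be expressed as a small filter. This is a hedged sketch only: RDP3 applies the multiple-comparison correction internally, and the `n_tests` value, method labels and P-values below are hypothetical inputs, not RDP3 output or data from this study.

```python
from typing import Mapping

def bonferroni(p_raw: float, n_tests: int) -> float:
    """Bonferroni-corrected P-value (raw P times number of tests), capped at 1."""
    return min(1.0, p_raw * n_tests)

def accept_event(raw_p: Mapping[str, float], n_tests: int,
                 alpha: float = 0.05, min_methods: int = 3) -> bool:
    """Accept a candidate event when at least `min_methods` detection methods
    support it with a corrected P-value below `alpha`. Methods missing from
    `raw_p` are treated as not having detected the event."""
    supporting = [m for m, p in raw_p.items() if bonferroni(p, n_tests) < alpha]
    return len(supporting) >= min_methods

# Hypothetical raw P-values for one candidate event (illustration only).
event = {"RDP": 1e-6, "GENECONV": 1e-4, "MAXCHI": 0.004, "CHIMAERA": 0.02}
print(accept_event(event, n_tests=10))  # True: RDP, GENECONV and MAXCHI pass
```

Requiring agreement between several methods trades some sensitivity for fewer false-positive events, since each method has different blind spots.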
Despite poor preservation of the samples (extensive necrosis), 48 of the 74 samples tested positive for begomovirus by PCR with the universal primer pair VD360-CD1266, which targets the conserved CP ORF [18]. From the positive samples, 23 […] [34]. The intergenic region (IR) sequences, located between the start codons of the C1 and V2 ORFs, are 289 to 300 nt long. This region contains a typical replication origin, including an inverted repeat sequence containing the highly conserved nonanucleotide TAATATT↓AC (↓, nick site) [35,5].
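The origin-of-replication motif can be located computationally. The following sketch scans a circular genome for the nonanucleotide and reports the position immediately 3' of the TAATATT↓AC nick; allowing mismatches also picks up variant loop sequences such as those reported below for alphasatellites. Function names, the 0-based coordinates and the toy genomes are illustrative assumptions, and this simple scan is no substitute for stem-loop prediction.

```python
NONANUCLEOTIDE = "TAATATTAC"
NICK_OFFSET = 7  # Rep nicks after the 7th base: TAATATT^AC

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_origin_motifs(genome: str, motif: str = NONANUCLEOTIDE,
                       max_mismatches: int = 0) -> list[tuple[int, int, str]]:
    """Scan a circular genome (virion-sense strand) for the nonanucleotide.

    Returns (start, nick, matched) tuples in 0-based coordinates, where
    `nick` is the index of the first base 3' of the nick site. The genome
    is extended by len(motif) - 1 bases so that motifs spanning the linear
    start of the sequence record are also found.
    """
    genome = genome.upper()
    n, k = len(genome), len(motif)
    extended = genome + genome[: k - 1]
    hits = []
    for start in range(n):
        window = extended[start : start + k]
        if hamming(window, motif) <= max_mismatches:
            hits.append((start, (start + NICK_OFFSET) % n, window))
    return hits

# Hypothetical toy genome: motif starts at index 4, nick just before index 11.
print(find_origin_motifs("GGGG" + "TAATATTAC" + "CCCC"))  # [(4, 11, 'TAATATTAC')]
# Loop variants differ from the canonical motif at 1 and 2 positions.
print(hamming("TAGTATTAC", NONANUCLEOTIDE), hamming("CAGTATTAC", NONANUCLEOTIDE))  # 1 2
```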
Based on the currently applicable species demarcation threshold of 89% nucleotide identity for begomoviruses [36], we conclude that the 23 begomovirus isolates obtained from okra in BF belong to the Niger strain of the species Cotton leaf curl Gezira virus (see Table 1 for percentage identities and Table 2 for isolate descriptions and accession numbers). In addition, a maximum-likelihood phylogenetic tree constructed with PHYML under the GTR+I+G model of sequence evolution (selected with ModelTest) confirms that the okra begomoviruses reported here cluster with CLCuGV isolates (Figure 1). The okra CLCuGV isolates show a clear phylogeographic structure: West Africa (Niger strain), Central Africa (Cameroon strain), East Africa (Sudan strain) and northeast Africa (Egypt strain).
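Applying the 89% demarcation threshold amounts to computing pairwise identity over an alignment and comparing it with the cut-off. The sketch below assumes pre-aligned sequences and mimics the pairwise deletion option described in the methods (columns with a gap in either sequence are skipped). The reference names and sequences are hypothetical, and formal demarcation criteria may prescribe a different alignment procedure.

```python
from typing import Optional

def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences with pairwise deletion:
    any column with a gap in either sequence is excluded."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

def assign_species(query: str, references: dict[str, str],
                   threshold: float = 89.0) -> tuple[Optional[str], float]:
    """Return (closest reference, identity); the name is None when the best
    identity falls below the species demarcation threshold."""
    best_name, best_identity = None, -1.0
    for name, ref in references.items():
        identity = pairwise_identity(query, ref)
        if identity > best_identity:
            best_name, best_identity = name, identity
    return (best_name if best_identity >= threshold else None), best_identity

refs = {"refA": "ACGTTCGTAC", "refB": "TTTTTTTTTT"}  # hypothetical aligned references
print(assign_species("ACGTACGTAC", refs))  # ('refA', 90.0)
print(round(pairwise_identity("ACG-ACGTAC", "ACGTACGAAC"), 2))  # 88.89 (gap column skipped)
```

The same function applies to betasatellites by passing the 78% threshold cited below instead of the default.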
Betasatellites were found associated with all isolates from […] Table 2 for isolate names and acronyms), plus additional sequences from African and Asian monopartite and bipartite begomoviruses. […] 1348, 1347, 1349, 1348, 1347 and 1347 nucleotides, respectively. All betasatellites showed the typical features of betasatellites: a single ORF (βC1) in the complementary sense, an adenine-rich (A-rich) region (nt 703 to 892, with 58.4 to 58.7% A residues) and a satellite conserved region (SCR) with a predicted stem-loop structure containing the geminivirus nonanucleotide sequence (TAATATTAC) [37]. Our sequences shared 88.1 to 98.7% nucleotide identity with betasatellites from Cameroon, Egypt, Mali, Niger and Sudan. In a phylogenetic analysis based on alignments of the complete betasatellite sequences, the BF betasatellites clustered with betasatellites associated with okra begomoviruses from Africa (Figure 2). Based on the recently established species demarcation threshold for betasatellites (78% nucleotide sequence identity [38]), the betasatellites reported in this study belong to the species Cotton leaf curl Gezira betasatellite (see Table 3 for betasatellite isolate descriptions and accession numbers). Interestingly, to our knowledge, this species is the only betasatellite described in Africa from malvaceous and tomato plants. Together with the absence of betasatellites in the New World and their high diversity in Asia, this result supports the view that the centre of betasatellite diversity lies in southern Asia [39].
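The A-content of the betasatellite A-rich region can be computed from a sequence with a simple count over the stated coordinates. This sketch uses 1-based inclusive coordinates (as for nt 703 to 892) and adds a sliding-window helper for locating an A-rich stretch; the function names, window width and toy sequences are assumptions for illustration only.

```python
def percent_base(seq: str, start: int, end: int, base: str = "A") -> float:
    """Percentage of `base` in seq[start..end], using 1-based inclusive
    coordinates as in standard nucleotide numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    region = seq[start - 1 : end].upper()
    return 100.0 * region.count(base.upper()) / len(region)

def richest_window(seq: str, width: int, base: str = "A") -> tuple[int, float]:
    """Return (1-based start, percent) of the `width`-nt window with the
    highest content of `base` (linear scan, no wrap-around)."""
    seq, base = seq.upper(), base.upper()
    if not 0 < width <= len(seq):
        raise ValueError("invalid window width")
    count = seq[:width].count(base)
    best_count, best_start = count, 0
    for i in range(1, len(seq) - width + 1):
        # Slide by one: add the entering base, drop the leaving base.
        count += (seq[i + width - 1] == base) - (seq[i - 1] == base)
        if count > best_count:
            best_count, best_start = count, i
    return best_start + 1, 100.0 * best_count / width

print(percent_base("AACCGGTTAA", 1, 10))     # 40.0
print(richest_window("CCCCAAAACCCC", 4))     # (5, 100.0)
```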
The […] Figure 3) and has an arrangement typical of characterized alphasatellites [40], containing a single ORF in the virion sense, an A-rich region with 51% adenine and a hairpin structure with the loop sequence TAGTATTAC. […] [17], these alphasatellites represent isolates of a new species provisionally named Okra leaf curl Burkina Faso alphasatellite, which cluster together in the phylogenetic tree (Figure 3; see Table 3 for alphasatellite accession numbers). These alphasatellite isolates contain a single ORF in the virion sense and a predicted hairpin structure with the loop sequence CAGTATTAC.
Following the sequence description of the viral isolates, we investigated their possible recombinant origin. Three distinct recombination events (a, b and c) were detected within the full-genome sequences of CLCuGV isolates (Figure 4), using a large sequence alignment of geminiviruses [41]. The presence or absence of these recombination events defines four genetic groups of viruses (G1 to G4; Figures 1 and 4). Recombination event b, present in all CLCuGV isolates, involves a major parent related to HoLCrV described in north Africa (Egypt; [9]) and a minor parent related to ToLCDiaV described in the south-west Indian Ocean islands (Madagascar; [41]). Because event b is shared by all CLCuGV isolates, whereas events a and c are intra-strain recombination events, event b appears to be older. Recombination events a and c, specific to isolates of G1, G3 and G4, have been characterized in Burkina Faso and Niger and appear to represent a specific geographic signature. The distribution of the recombination breakpoints observed here confirms the existence of recombination hot spots over the intergenic region (IR) and the centre of the C1 ORF (Figure 4), as described by Lefeuvre et al. [41]. Recombination event c of isolates G3 and G4 covers the N terminus of the replication-associated protein (Rep), which contains the iteron-related domain (IRD) [42]. This domain determines the specificity of interaction with the iterated DNA motifs (iterons) of the geminivirus origin of replication (ori), which are essential elements for virus-specific replication. Since the IRD of G3 and G4 isolates (MAPTKKFRINSKNYFL) differs from the IRDs of G1 and G2 isolates (MPPSKRFLINA-KNYFL or MPFGTHYILSTDILER), the biological consequences of these recombination events should be investigated in future work.
Figure 4 Recombinant regions (a, b and c) detected within the African isolates of CLCuGV sequences using RDP3. Four genetic groups (G1 to G4) have been defined based on the presence or absence of recombination events. The genome at the top of the figure is a schematic representation of the sequences below. Region coordinates are nucleotide positions of detected recombination breakpoints in the multiple sequence alignment used to detect recombination. Wherever possible, parental sequences are identified. "Major" and "Minor" parents are sequences that were used, along with the indicated recombinant sequence, to identify recombination. For each identified event, the minor parent is the apparent contributor of the sequence within the indicated region, and the major parent is the apparent contributor of the rest of the sequence. Note that the identified "parental sequences" are not the actual parents but simply those sequences most similar to the actual parents in the analysed dataset. Recombinant regions and parental viruses were identified using the RDP (R), GENECONV (G), BOOTSCAN (B), MAXIMUM CHI SQUARE (M), CHIMAERA (C), SISTER SCAN (S) and 3Seq (T) methods. Upper-case letters indicate that a method detected recombination with a multiple-comparison-corrected P-value < 0.01; lower-case letters indicate a corrected P-value < 0.05 but ≥ 0.01.
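The upper/lower-case convention in the Figure 4 caption maps each method's corrected P-value to a letter. A minimal sketch of that mapping follows, assuming the per-method corrected P-values are already available; the dictionary keys are hypothetical labels chosen for this example, not RDP3's own output format.

```python
# Hypothetical method labels mapped to the letters used in the Figure 4 caption.
METHOD_LETTERS = {"RDP": "R", "GENECONV": "G", "BOOTSCAN": "B", "MAXCHI": "M",
                  "CHIMAERA": "C", "SISCAN": "S", "3SEQ": "T"}

def support_code(corrected_p: dict[str, float]) -> str:
    """Upper case: corrected P < 0.01; lower case: 0.01 <= P < 0.05.
    Methods with P >= 0.05 (or not run) are omitted."""
    letters = []
    for method, letter in METHOD_LETTERS.items():
        p = corrected_p.get(method)
        if p is None or p >= 0.05:
            continue
        letters.append(letter if p < 0.01 else letter.lower())
    return "".join(letters)

print(support_code({"RDP": 0.001, "GENECONV": 0.03, "MAXCHI": 0.2, "3SEQ": 1e-5}))  # RgT
```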
In conclusion, in Burkina Faso OLCD is mainly associated with a single begomovirus species together with a complex of betasatellite and alphasatellite species, in contrast to the situation in the neighbouring countries Mali [5] and Niger [4]. Taken together, these molecular results highlight the complex aetiology of OLCD in Africa and the need for further investigation.