Experimental evidence indicating that mastreviruses probably did not co-diverge with their hosts
© Harkins et al; licensee BioMed Central Ltd. 2009
Received: 05 May 2009
Accepted: 16 July 2009
Published: 16 July 2009
Despite the demonstration that geminiviruses, like many other single stranded DNA viruses, are evolving at rates similar to those of RNA viruses, a recent study has suggested that grass-infecting species in the genus Mastrevirus may have co-diverged with their hosts over millions of years. This "co-divergence hypothesis" requires that long-term mastrevirus substitution rates be at least 100,000-fold lower than their basal mutation rates and 10,000-fold lower than their observable short-term substitution rates. The credibility of this hypothesis, therefore, hinges on the testable claim that negative selection during mastrevirus evolution is so potent that it effectively purges 99.999% of all mutations that occur.
We have conducted long-term evolution experiments lasting between 6 and 32 years, where we have determined substitution rates of between 2 × 10⁻⁴ and 3 × 10⁻⁴ substitutions/site/year for the mastreviruses Maize streak virus (MSV) and Sugarcane streak Réunion virus (SSRV). We further show that mutation biases are similar for different geminivirus genera, suggesting that mutational processes that drive high basal mutation rates are conserved across the family. Rather than displaying signs of extremely severe negative selection as implied by the co-divergence hypothesis, our evolution experiments indicate that MSV and SSRV are predominantly evolving under neutral genetic drift.
The absence of strong negative selection signals within our evolution experiments and the uniformly high geminivirus substitution rates that we and others have reported suggest that mastreviruses cannot have co-diverged with their hosts.
It is becoming increasingly apparent that single-stranded DNA (ssDNA) viruses such as the anelloviruses [1–3], geminiviruses [4–9], parvoviruses [10–12] and microviruses [13, 14] are probably evolving as rapidly as many RNA viruses. While the inherent infidelities of RNA polymerases and reverse transcriptases drive the high rates of evolution seen in RNA viruses, all known ssDNA viruses replicate using presumably high-fidelity host DNA polymerases. It is surprising, therefore, that the basal mutation rates of ssDNA viruses are orders of magnitude higher than those of their hosts.
The best supported, non-exclusive theories that have so far been put forward to explain discrepancies between basal mutation rates of ssDNA viruses and their hosts are that: (1) when in a ssDNA state the genomes of these viruses are subject to mutagenic processes that are less frequently experienced in dsDNA; (2) geminivirus genomes, and those of some other ssDNA viruses, are not sufficiently methylated such that normal host mechanisms of mismatch repair may not function during their replication [16, 17]; and (3) when replicating, ssDNA virus genomes are only transiently double stranded such that when errors occur they are not efficiently repaired by host base-excision pathways.
Evidence is mounting that the rapid evolution of geminiviruses is, at least in part, driven by mutational processes that act specifically on ssDNA. Controlled evolution experiments involving Maize streak virus (MSV), a geminivirus in the Mastrevirus genus, have revealed a strand specific G → T mutation bias that is possibly attributable to oxidative damage to guanines . Similarly, analyses of nucleotide substitution biases in natural tomato and cassava infecting geminivirus isolates (in the Begomovirus genus) have, in addition to similar G → T mutation biases, identified overrepresentations of C → T and G → A transitions. These biases indicate that geminivirus DNA may experience elevated rates of spontaneous damage while in a single stranded state [4, 5]. Although it remains to be determined in a larger scale study whether an excess of C → T and G → A transitions have occurred during mastrevirus evolution, all these studies are consistent with the hypothesis that viral ssDNA is subjected to greater oxidative stresses (such as oxidative deamination of guanine and cytosine or oxidation of guanine to 8-oxoguanine) compared to host dsDNA.
High geminivirus basal mutation rates do not, however, necessarily imply that these viruses are also evolving rapidly. Rather than simply being the rate at which mutations occur, evolutionary rates are also influenced by (1) the rate at which deleterious mutations are purged from a population by negative, or purifying, selection, (2) the efficiency with which advantageous adaptive mutations are fixed in a population by positive, or diversifying, selection and (3) the rate at which neutral mutations (i.e. those mutations with no effect on fitness) are fixed in or lost from a population by random genetic drift. Adopting the convention of Duffy et al.  we differentiate between the biochemical or basal rate at which mutations arise (mutation rate, measured in rounds of genomic replication or units of time), and the usually slower rate at which mutations accumulate in wild populations evolving under natural selection (substitution rate, usually measured in years).
Geminiviruses have either one (monopartite, species in the Begomovirus, Mastrevirus, Topocuvirus and Curtovirus genera) or two (bipartite, species in the Begomovirus genus) ~2.7-kb genome components. These compact genomes are among the smallest of any known viruses and encode only a small number of usually multifunctional and often overlapping genes. Mastreviruses such as MSV and Wheat dwarf virus (WDV), for example, express only four distinct proteins: a movement protein (MP), a coat protein (CP), a replication associated protein (Rep) and a RepA protein, expressed from an alternative spliceform of the rep gene transcript such that it shares ~70% of its amino acid sequence with Rep. The compactness of mastrevirus genomes is further emphasised by the fact that, with the exception of MP, these proteins have multiple known functions. Given that many, if not most, mutations that occur in such compact genomes will be at least slightly deleterious and therefore subject to negative selection, it is expected that mastrevirus nucleotide substitution rates will be at least slightly lower than their basal mutation rates.
It is currently a matter of dispute as to how much lower geminivirus substitution rates are relative to their basal mutation rates. Experimental analyses of highly adaptive point mutations [19–21] and mutation frequencies in genomes sampled after 30–60 days of replication within infected plants [6, 8, 22] imply that the basal mutation rates of geminiviruses are in excess of 10⁻³ mutations per site per year (mut/site/year). Correspondence between the phylogenies of certain mastrevirus species and those of their grass hosts has, however, prompted speculation that mastreviruses may have co-diverged with grasses and that their substitution rates may therefore be as low as 10⁻⁸ substitutions per site per year (subs/site/year), i.e. one hundred thousand times lower than their basal mutation rates. It is possible that very short-term evolution experiments (<0.2 years) produce inflated estimates of long-term substitution rates, because they are measuring adaptation (positive selection) to a novel host (e.g., [6, 9]), or have not allowed sufficient time for negative selection to have effectively purged mildly deleterious mutations. However, the co-divergence hypothesis demands a long-term substitution rate four orders of magnitude lower than the approximately 2 × 10⁻⁴ to 7 × 10⁻⁴ subs/site/year rates that have been estimated in short-term (<5 years) evolution experiments [7, 9] and longer term (over tens of years) substitution rates estimated from temporally structured tomato and cassava infecting begomovirus datasets sampled from nature [4, 5].
The ten-thousand-fold discrepancy between directly-calculated geminivirus substitution rate estimates and those implied by the co-divergence hypothesis is difficult to reconcile. It has been suggested that different evolutionary forces are operating over short- (less than one year), long- (tens of years) and very long-term (thousands of years) evolutionary timescales: even though point mutations rapidly accumulate in geminiviruses over observable timescales, over the millennia mastreviruses experience an almost complete absence of positive selection and neutral genetic drift, coupled with almost unfalteringly efficient negative selection . This argument relies on the strange circumstance of mastrevirus species having had long co-evolutionary histories within their hosts, but without their having engaged in arms races with those hosts.
Here we describe a series of evolution experiments involving MSV and Sugarcane streak Réunion virus (SSRV, a mastrevirus species closely related to MSV) that lasted between 6 and 32 years. Our results provide extensive additional support for the hypothesis that, as with other geminiviruses, MSV and SSRV basal mutation rates are possibly elevated by unrepaired oxidative damage inflicted on ssDNA. We additionally show that, contrary to expectations under the co-divergence hypothesis, neutral genetic drift and not negative selection appears to be a dominant process determining the fate of new mutations.
Results and discussion
Long term mastrevirus evolution experiments
In 1971, a sugarcane plant presenting with foliar streak symptoms later attributed to SSRV was collected in Mauritius. In 1976, viruses were leafhopper transmitted from this plant to both a plant of the sugarcane variety H44-3098 and the wild grass species Coix lacryma-jobi. Both sugarcane and Coix plants were maintained in an insect free glasshouse over the next 32 years at the Mauritius Sugar Industry Research Institute. At some time between 1977 and 1986 viruses were retransmitted by leafhopper from the Coix to sugarcane, and in 1987 leaf samples from this sugarcane plant were shipped to Institut de Biologie Moleculaire et Cellulaire du CNRS in France, where total DNA was extracted and stored until 2008. In 1984, two stalks cut from the H44-3098 plant were sent to the John Innes Centre in the United Kingdom where they were planted and maintained until 1997. Total DNA was extracted from one of these plants in 1991, and symptomatic leaves from the other were cut in 1997 and stored at -80°C until DNA was extracted from them in 2007. In 1989, leaf samples from the H44-3098 plant were also shipped to the University of Cape Town in South Africa where total DNA was extracted and stored until 2008. Finally, in 2008 we obtained total leaf DNA samples from the originally infected Coix and H44-3098 plants in Mauritius.
In an unrelated experiment, two naturally-infected perennial Digitaria sp. grasses with mild streak symptoms (later attributed to the MSV strains MSV-B and MSV-F in each plant, respectively) were maintained under insect-free conditions at the John Innes Centre in the United Kingdom between 1984 and 1997. Total genomic DNA was isolated and stored from each of these plants in 1991 and again in 1997.
Table 1. Breakdown of full genome sequences sampled during three separate evolution experiments, together with the results of neutrality tests (including Fu and Li's F*), which indicate no significant deviation from neutral evolution in any of the samples. (Table body not reproduced here.)
The amount of genetic variability observed in the two six-year-long experiments involving MSV-F and MSV-B in Digitaria spanned that previously observed in a five-year experiment involving MSV-B in sugarcane. It was immediately apparent, however, that the virus population within the MSV-B infected plant was substantially less diverse over the course of the experiment than that within the MSV-F infected plant (Figure 1b).
It is important to point out that none of the three evolution experiments was initiated using cloned viruses and that we have no samples that were taken within two years of the start of the experiments. Therefore, the diverse virus populations within the infected plants could have arisen through rapid evolutionary rates, or as a result of the plants having been co-infected with divergent virus lineages – a situation that may have resulted in lineage sorting or founder effects.
However, when we compared the phylogenetic relationships of virus genomes sampled at consecutive time-points from individual plants (represented by blue and orange coloured branches on the trees in Figure 1b), we noted that samples from later time-points (orange branches in Figure 1b) were generally situated further from the presumed root-nodes than were those sampled at earlier time-points (blue branches in Figure 1b). Such a temporally-structured phylogenetic pattern indicated that, despite our knowing neither the precise genotypes of the viruses that initiated our experimental populations, nor the exact time of infection, we should still be able to accurately infer nucleotide substitution rates from our data.
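The intuition behind this temporal signal can be illustrated with a simple root-to-tip regression. This is a swapped-in, much cruder technique than the Bayesian coalescent analysis actually used in this study: if later samples sit further from the root, the slope of genetic distance against sampling year approximates the substitution rate, and the x-intercept approximates the date of the root. In the sketch below the function name is hypothetical and the distances are synthetic values generated from an assumed rate purely for illustration; they are not measurements from the experiments.

```python
def root_to_tip_rate(sampling_years, root_to_tip_distances):
    """Least-squares regression of root-to-tip distance on sampling year.

    Returns (rate, root_year): the slope in subs/site/year and the year
    at which the fitted line crosses zero distance.
    """
    n = len(sampling_years)
    mean_year = sum(sampling_years) / n
    mean_dist = sum(root_to_tip_distances) / n
    sxx = sum((y - mean_year) ** 2 for y in sampling_years)
    sxy = sum((y - mean_year) * (d - mean_dist)
              for y, d in zip(sampling_years, root_to_tip_distances))
    rate = sxy / sxx
    root_year = mean_year - mean_dist / rate
    return rate, root_year


# Sampling years follow the SSRV experiment; the distances are SYNTHETIC,
# generated from an assumed rate of 2.5e-4 subs/site/year and an assumed
# root date of 1976, only to show that the regression recovers both.
years = [1987, 1989, 1991, 1997, 2008]
distances = [2.5e-4 * (y - 1976) for y in years]

rate, root_year = root_to_tip_rate(years, distances)
print(f"rate = {rate:.2e} subs/site/year, root year = {root_year:.1f}")
```

With real data the points scatter around the line, and neither the starting genotype nor the infection date needs to be known in advance, which is the property the paragraph above relies on.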
Geminiviruses have uniformly high nucleotide substitution rates
These rates are slightly lower than those of ~7 × 10⁻⁴ subs/site/year previously estimated for MSV-A, MSV-B and MSV-C in one- to five-year long evolution experiments involving cloned virus genomes. They are, however, approximately equivalent to those estimated within a natural temporally-structured tomato infecting begomovirus dataset employing the same methodology used here (Figure 2). Our results in relation to these other studies are entirely unsurprising: it is expected that substitution rate estimates from shorter term evolution experiments will be closer to the basal mutation rate than those estimated either from longer term experiments, or from natural sequences sampled over a number of decades.
Irrespective of the demographic and clock models used, the mean estimated date of the 1984 sugarcane lineage split was within 4 years of the actual date, and the estimated mean date of the sugarcane to Coix transmission event was within 8 years of the actual date. In all cases the 95% HPD intervals included the actual dates (Figure 3). The constant size and exponential growth strict-clock models provided a significantly better fit to the data than the relaxed-clock models while the opposite pattern was observed for the Bayesian skyline plot model (see additional file 1). The exponential growth and constant population size strict molecular clock models both fitted the data equally well however, with the former recovering a marginally higher likelihood than the latter model. These models yielded more accurate estimates of the 1976 sugarcane to Coix transmission event and the 1984 sugarcane lineage split (within five and one years of the actual dates, respectively), as well as narrower 95% HPD intervals.
These fairly precise recapitulations of a known bifurcation and a known trifurcation in our experiment serve as independent confirmation that, at the very least, our substitution rate estimates for SSRV using the strict-clock model (between 2.27 × 10⁻⁴ and 2.86 × 10⁻⁴ subs/site/year) were reasonably accurate irrespective of the demographic models used.
The SSRV results are the first substitution rate estimates from a plant virus maintained in laboratory/greenhouse settings that allowed the same heterochronous sampling over the tens of years that are used to estimate rates from field-isolated viruses. The agreement between the laboratory substitution rate of a mastrevirus and the field substitution rate of begomoviruses (Figure 2) indicates that the different, potentially relaxed, selection pressures viruses face in greenhouse-maintained plants do not lead to different rates of evolution.
Specific nucleotide substitution biases are conserved across the geminiviruses
Analyses of virus genome sequences both sampled from nature and in controlled evolution experiments have indicated that higher than expected geminivirus mutation rates are at least partially attributable to the susceptibility of ssDNA to oxidative damage [4, 5, 9]. The signatures of such damage are elevated rates of C → T, G → A and G → T mutations. Whereas ssDNA is known to be more prone than dsDNA to the oxidative deamination reactions that cause C → T and G → A transitions [30–32], it is also more prone to reactions that convert guanine to 8-oxoguanine and cause G → T transversions [33–35].
To determine whether specific types of mutation occur more or less frequently during MSV and SSRV evolution than could be accounted for by chance, we collectively considered all 238 mutations observed to have occurred during our three evolution experiments using the chi square test outlined by van der Walt et al. This analysis revealed that whereas C → T, G → A and G → T mutations were indeed significantly over-represented (chi square p = 4 × 10⁻⁴, 7 × 10⁻³, and < 1 × 10⁻⁵, respectively), C → A, T → A and T → G transversions were significantly under-represented (chi square p = 7 × 10⁻³, 2 × 10⁻² and < 4 × 10⁻³, respectively; Figure 4).
All four possible transition mutations, including C → T and G → A, are generally thought to occur at higher frequencies than the eight possible transversion mutations . Indeed, our results across all the evolution experiments indicate individual transition substitutions occurred at approximately twice the frequency of individual transversion substitutions (Figure 4). Accordingly, when we restricted our chi square test to include only either transitions or transversions the frequency of G → A mutations was no longer significantly higher than that of the other transition mutations. Similarly, whereas the frequency of T → G mutations was not significantly lower than those of other transversion mutations, the frequency of A → G mutations was inferred to be significantly lower than those of other transition mutations. However, the C → T and G → T substitutions remained significantly higher than expected and the frequencies of the C → A and T → A substitutions still lower than expected.
Despite the relatively good agreement of overrepresented substitutions between begomovirus studies [4, 5] and our evolution experiments, there is not perfect concordance among substitution biases in different geminiviruses. For example, whereas both our study and a Tomato yellow leaf curl virus (TYLCV) study indicate that T → G substitutions are significantly underrepresented during the evolution of some geminiviruses, this type of substitution has been significantly over-represented during East African cassava mosaic virus evolution.
Substitution biases are strand specific
As only the virion strands of geminivirus genomes spend significant time in a single stranded state, an additional signature that would indicate that ssDNA is more prone than dsDNA to mutation should be the existence of strand specific substitution biases. While the overrepresented C → T and G → A transitions are likely occurring on the virion strand, these two transitions are complementary and cannot be used to determine strand-specificity. However, G → T substitutions occur at a higher frequency than C → A substitutions (i.e. the complement of G → T) providing clear evidence either that: (1) C → A mutations occur much more frequently on the complementary strand than they do on the virion strand; or (2) G → T mutations occur much more frequently on the virion strand than they do on the complementary strand. It is possible to choose between these two alternatives if, as is the case with geminiviruses, only one strand spends an appreciable amount of time in a single-stranded state.
We devised a likelihood ratio test to determine whether there was significant evidence of a strand-specific substitution bias in our three evolution experiments. This simply involved determining the relative likelihoods of observing our data given either (1) a six rate substitution matrix in which complementary mutations were constrained to occur at the same rate (i.e. a situation with no strand specific substitution biases) or (2) a twelve rate substitution matrix in which all substitution types were free to occur at different rates.
For both the SSRV and MSV-F experiments this test inferred the existence of significant strand specific nucleotide substitution biases (chi square p = 8.5 × 10⁻³ and 5.7 × 10⁻⁴, respectively) strongly indicative of mutational processes operating specifically on ssDNA. Possibly because of the low numbers of polymorphisms considered, the test failed to reveal any such evidence for the MSV-B dataset.
Such strand specific substitution biases taken together with increased rates of specific substitutions such as G → T, C → T and G → A amongst both mastrevirus and begomovirus datasets indicate very strongly that (1) all geminiviruses probably experience roughly equivalent mutagenic stresses and (2) high geminivirus substitution rates are, in part, driven by shared mutagenic processes independent of polymerase error, operating on ssDNA.
Negative and positive selection against a background of neutral genetic drift
The co-divergence hypothesis of Wu et al. demands that, over thousands of years, at least 99.999% of all arising mutations and 99.99% of all substitutions that appear dominant in populations over tens of years are ultimately purged from mastrevirus populations by negative selection. Although it is impossible to directly test this hypothesis by running controlled evolution experiments over such long time-periods, it is possible to test it indirectly by looking for the predicted signal of overwhelming negative selection in our evolution experiments.
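The percentages quoted above follow directly from the ratios between the rates discussed in the text. The short sketch below works through that arithmetic using order-of-magnitude values taken from this article (a basal mutation rate of about 10⁻³, a directly measured substitution rate of about 10⁻⁴, and a co-divergence rate of about 10⁻⁸ per site per year); the function and constant names are hypothetical.

```python
def fold_reduction(higher_rate, lower_rate):
    """How many times lower the second rate is than the first."""
    return higher_rate / lower_rate


def fraction_purged(higher_rate, lower_rate):
    """Fraction of changes that must be removed to get from the
    higher rate down to the lower rate."""
    return 1.0 - lower_rate / higher_rate


# Order-of-magnitude rates (per site per year) as discussed in the text.
BASAL_MUTATION_RATE = 1e-3       # lower bound on the basal mutation rate
MEASURED_SUBSTITUTION_RATE = 1e-4  # order of directly measured rates
CODIVERGENCE_RATE = 1e-8         # rate implied by co-divergence

for label, rate in [("basal mutation rate", BASAL_MUTATION_RATE),
                    ("measured substitution rate", MEASURED_SUBSTITUTION_RATE)]:
    fold = fold_reduction(rate, CODIVERGENCE_RATE)
    purged = fraction_purged(rate, CODIVERGENCE_RATE)
    print(f"{label}: {fold:,.0f}-fold reduction, {purged:.3%} purged")
```

A 100,000-fold reduction corresponds to removing 99.999% of mutations, and a 10,000-fold reduction to removing 99.99% of short-term substitutions.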
Table: Site-by-site signals of positive and negative selection acting on movement protein (mp), coat protein (cp) and replication associated protein (rep) gene codons during the SSRV evolution experiment. Motifs/domains listed for the identified sites: C-terminal boundary of hydrophobic domain; DNA binding domain (five sites); RCR motif I (FLTYPHC); Rep-Rep oligomerisation domain (ASKLFPD TVEEY). (Table body not reproduced here.)
In fact, the degree of negative selection implied by the co-divergence hypothesis would be expected to produce a situation in which all mutants would only be detectable for a short period of time after they arise – thereafter they would be expected to become extinct due to their inability to compete effectively with wild-type viruses. Under such conditions the overwhelming majority of detectable mutations should be unique to the mutant genomes that carry them. This pattern of genetic variation is generally detected using population genetic neutrality tests such as Tajima's D  or Fu and Li's F* statistics  that describe the representation in datasets of mutations that are found only in individual sequences relative to those that are found in multiple sequences. If these statistics have a significantly negative value for a group of sequences randomly sampled from a population of constant size, it implies that the accumulation of mutations within the sequences was more strongly influenced by negative selection than it was by neutral genetic drift.
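To make the logic of these neutrality tests concrete, the following is a minimal sketch of Tajima's D computed from an alignment, using the standard formula that contrasts mean pairwise diversity with the number of segregating sites. It is an illustration only and not the software used for the analyses in this study; the function name and the toy alignments are hypothetical, and gaps or ambiguous bases are not handled.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    # S: number of segregating (variable) sites.
    seg_sites = sum(1 for i in range(length)
                    if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy alignments (hypothetical): every mutation unique to one sequence,
# versus every mutation shared by half of the sequences.
singletons_only = ["TAAA", "ATAA", "AATA", "AAAT"]
shared_variants = ["TTAA", "TTAA", "AATT", "AATT"]

d_singletons = tajimas_d(singletons_only)
d_shared = tajimas_d(shared_variants)
print(d_singletons, d_shared)
```

An excess of mutations found in single sequences pushes D below zero, whereas mutations shared among many sequences push it above zero; whether a given value is significant has to be assessed separately.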
We were unable to find any significant deviation from zero for either Tajima's D or Fu and Li's F* statistics in any of the virus populations we sampled during our evolution experiments (Table 1). Although negative scores for both these statistics for most of the populations imply that sequences were subjected to some degree of negative selection, it is apparent that random genetic drift is the dominant process determining the relative frequencies of particular mutations in these populations. For example, although only one sequence differed from all the rest at 53 out of 128 variable nucleotide sites in the SSRV dataset, the remainder were sites at which mutations were present in multiple sequences and were therefore not significantly deleterious.
From our evolution experiment data it is very simple to directly infer the action of genetic drift and/or positive selection acting on mutations by tracking changes in the population-wide frequency of particular mutants over time. For example, in the SSRV experiment, we observed 8 instances where mutations that were present in <25% of sequences sampled in 1989, were present in 100% of sequences sampled from the same plant in 2008 – these mutations could only have reached fixation by 2008 through either genetic drift or positive selection. Taken collectively, all our data clearly indicate the mutations that arose during our controlled evolution experiments were not uniformly subject to anywhere near the degree of negative selection required by the co-divergence hypothesis.
Congruent phylogenies are necessary, but not sufficient, to demonstrate virus-host coevolution
As has been pointed out by the originators of the mastrevirus-host co-divergence hypothesis, it is very difficult to prove virus-host co-speciation [23, 40]. For example, it is usually impossible to confirm that phylogenetic signals superficially indicative of co-divergence are not instead caused by other epidemiological and ecological factors [see  for specific examples of how these can be confused with co-divergence]. Mismatched substitution rates between viruses and their hosts have provided evidence against some long-assumed co-divergence pairs, including hantaviruses and their rodent hosts and JC virus, whose phylogeny had been used as a proxy for early human migration patterns. Similarly, the close relationships between Human immunodeficiency virus and other closely related lentiviruses isolated from simians are also superficially indicative of co-divergence. Despite this, it is now clear that the apparent correspondence of such virus and host relationships is a result of viruses being more capable of adapting to new host species if the new host species are genetically similar to their old host species. The ability of geminiviruses to adapt rapidly to novel hosts, and the polyphagy of their insect vectors, also argue both against the hypothesis of widespread co-speciation among these viruses and in favour of the hypothesis that apparent co-speciation signals simply reflect the fact that genetically more similar viruses just happen to infect, and become specifically adapted to, genetically more similar hosts. The balance of evidence therefore still strongly favours geminiviruses having RNA-virus-like substitution rates that exclude the possibility of their having co-diverged with their hosts.
We have used long-term evolution experiments to investigate the credibility of recent suggestions that mastreviruses may have co-diverged with their host species over millions of years. We have shown that both the mutational processes and the substitution rates they drive are conserved across the geminivirus family, and are orders of magnitude higher than the rates implied by the co-divergence hypothesis. Additionally, we have provided evidence against potent negative selection as a plausible mechanism by which very-long-term mastrevirus substitution rates could be more than 10,000 fold lower than both their basal mutation rates and directly measured substitution rates. While some of the genetic variation in our three evolution experiments is under statistically significant positive selection, much of it appears nearly neutral. In short, all available evidence suggests that mastrevirus evolution is no more severely constrained by negative selection than is that of other rapidly evolving viruses .
A sugarcane plant presenting with streak symptoms was collected in 1971 from a multiplication plot at Médine, Mauritius, and was used in 1976 as a source of inoculum to infect both a sugarcane plant (variety H44-3098) and a Coix lacryma-jobi plant. These were maintained in an insect free glass house for the next 32 years at the Mauritius Sugar Industry Research Institute. Virus was retransmitted from the Coix plant to a second sugarcane plant at some time between 1977 and 1986. Samples were taken from the original H44-3098 plant in 1989 and 2008; from the second sugarcane plant in 1987; and from the Coix plant in 2008. In 1984 two separate cuttings from the H44-3098 plant were taken and maintained separately – samples were taken from one of these cuttings in 1991 and from the other in 1997.
Two Digitaria plants with mild streak-like symptoms were collected in Rwanda and Burundi by R.H. Markham (the then plant pathologist at the CAB International Institute of Biological Control, Kenya) in 1984. After transferring them to the John Innes Centre in Norwich, UK, viruses were leafhopper transmitted from these plants to Digitaria sanguinalis. These two newly infected D. sanguinalis plants were maintained under insect free conditions between 1984 and 1997 with samples being taken from each plant in both 1991 and 1997.
Isolation, cloning and sequencing of viral DNA
Total DNA was isolated from preserved sugarcane or Digitaria samples by either a modified CTAB method [43, 44] or the Extract-N-Amp™ Plant (Sigma-Aldrich) method as described by Shepherd et al. Viral DNA was amplified using phi29 DNA polymerase (TempliPhi™, GE Healthcare, USA), and the amplified concatemers were digested with SalI (sugarcane virus isolates) or BamHI (Digitaria virus isolates) to yield ~2.7-kb linearised viral genomes, which were cloned into the pGEM3Zf+ (Promega Biotech) cloning vector. Both strands of cloned genomes were commercially sequenced (Macrogen Inc., Korea) by primer walking. Sequences were assembled and edited using DNAMAN (version 5.2.9; Lynnon Biosoft) and MEGA (version 4).
Detection of recombination and phylogenetic tree construction
Sequences from all three evolution experiments were tested for evidence of recombination using LDHAT and various methods implemented in the program RDP3. These analyses failed to detect any significant evidence of recombination in our datasets. Phylogenetic trees were constructed using PHYML with best fit models automatically selected by RDP3.
Estimation of nucleotide substitution rates
A co-estimate of the nucleotide substitution model parameters, phylogeny and time to the most recent common ancestor (tMRCA) was obtained for the MSV-B, MSV-F and SSRV datasets using the Bayesian Markov chain Monte Carlo (MCMC) method implemented in BEAST v1.4.8 . Six different coalescent demographic models were employed including both parametric (constant population size, exponential population growth) and non-parametric (Bayesian skyline plot; BSP) models, with both a strict and relaxed (uncorrelated LogNormal prior) molecular clock.
For each evolutionary model, two independent runs of length 5 × 10⁷ steps in the Markov chain were performed using BEAST and checked for convergence using TRACER v1.4. The effective sample sizes for each run were almost always > 200, indicating sufficient mixing of the Markov chain and parameter sampling. When similar results were produced from independent runs of the Markov chain, the log files were combined with the program LOGCOMBINER v1.4.7 available in the BEAST package.
Demographic and clock model comparisons
Models were compared by calculating a measure known as the Bayes factor, which is the ratio of the marginal likelihoods of the two models being compared [51, 52]. Bayes factors allow the comparison of non-nested models (such as the non-parametric Bayesian skyline plot vs. the parametric constant or exponential growth demographic models) that cannot be validly compared using the mean log posterior probabilities.
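Because marginal likelihoods are normally reported on a log scale, the Bayes factor is most easily handled as a difference of log marginal likelihoods. The sketch below shows that bookkeeping for a set of candidate models; the function names are hypothetical and the log marginal likelihood values are invented placeholders, not results from this study.

```python
def log_bayes_factor(ln_ml_a, ln_ml_b):
    """ln(BF) of model A over model B from log marginal likelihoods."""
    return ln_ml_a - ln_ml_b


def rank_models(ln_marginal_likelihoods):
    """Sort models from best to worst and report 2*ln(BF) of the best
    model over each model (0 for the best model itself)."""
    ranked = sorted(ln_marginal_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    best_ln_ml = ranked[0][1]
    return [(name, 2 * log_bayes_factor(best_ln_ml, ln_ml))
            for name, ln_ml in ranked]


# PLACEHOLDER values for illustration only (not taken from the paper).
example = {
    "constant, strict clock": -5000.0,
    "exponential, strict clock": -4999.5,
    "skyline, relaxed clock": -5004.0,
}

ranking = rank_models(example)
for name, two_ln_bf in ranking:
    print(f"{name}: 2 ln BF against best = {two_ln_bf:.1f}")
```

Working with differences on the log scale avoids numerical underflow from exponentiating very negative log likelihoods, and the comparison is valid whether or not the models are nested.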
Analysis of nucleotide substitution biases
where π_j is the equilibrium frequency of nucleotide j, assumed to be in equilibrium and constant across lineages, and θ_ij is the instantaneous rate of substitution of nucleotide i with nucleotide j. These models typically assume time-reversibility such that θ_ij = θ_ji. Here we use standard model comparison techniques to compare reversible with non-reversible models of evolution as applied to mastreviruses. We implemented a standard GTR model (where forward and reverse substitutions are constrained to have the same rate; for example, C → T substitution rates must be the same as T → C substitution rates) and a different non-reversible model of evolution with six rates, in which rates are shared by complementary substitutions (e.g. C → T rates are constrained to be the same as G → A rates). Both six-rate models are nested within the non-reversible twelve-rate model (where all 12 substitutions are free to occur at different rates), and thus a likelihood ratio test with degrees of freedom equal to the difference in the number of parameters is appropriate for model comparisons between each of the six-rate models and the twelve-rate model. Phylogenetic models and statistical tests were implemented in the HYPHY batch language and are available from the authors on request.
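The final step of such a nested comparison can be sketched as follows. The example computes the likelihood ratio statistic and its chi-square p-value for a six-rate model against the twelve-rate model (six degrees of freedom), using the closed-form chi-square survival function that holds for even degrees of freedom. The function names are hypothetical and the log-likelihood values are placeholders; fitting the substitution models themselves (done in HYPHY in this study) is outside the scope of the sketch.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for EVEN df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


def likelihood_ratio_test(ln_l_null, ln_l_alt, df):
    """LRT of a constrained (null) model nested in a richer model."""
    stat = 2.0 * (ln_l_alt - ln_l_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


# PLACEHOLDER log-likelihoods for illustration only.
ln_l_six_rate = -4210.0     # complementary substitutions share a rate
ln_l_twelve_rate = -4200.0  # all twelve substitution types free
DF = 12 - 6                 # difference in number of rate parameters

stat, p_value = likelihood_ratio_test(ln_l_six_rate, ln_l_twelve_rate, DF)
print(f"LRT statistic = {stat:.1f}, p = {p_value:.4f}")
```

A small p-value indicates that letting complementary substitutions take different rates improves the fit more than expected by chance, which is how a strand-specific bias shows up in this framework.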
We reconstructed ancestors at internal nodes using maximum likelihood and a non-reversible substitution model, and counted substitutions along branches of the phylogeny using HYPHY. The relative counts of each mutation type over the 32 years of the SSRV experiment and the 6 years of the MSV-B and MSV-F experiments were compared using the 2 × 2 chi square test described by van der Walt et al. This test takes into account nucleotide composition biases but not inherent differences in the rates of transition vs. transversion mutations. We therefore also used a modified version of this test in which transitions and transversions were treated separately such that, for example, the number of times that a particular transversion mutation was estimated to have occurred was compared only to the collective number of times that the seven other transversion mutation types were estimated to have occurred.
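A generic 2 × 2 chi-square calculation of the kind referred to above might look like the sketch below. The table layout is an assumption made for illustration (focal mutation type vs. all other types in the rows, two data sets in the columns); the composition correction of van der Walt et al. is not reproduced here, and all counts are hypothetical. For the modified test, the "other types" row would be restricted to mutations of the same class, for example the seven other transversions.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: focal transversion vs. the seven other transversions,
    # tallied in two different experiments (columns).
    stat, p_value = chi_square_2x2(10, 20, 30, 40)
    print(stat, p_value)
```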
Site by site analysis of natural selection
We used three methods implemented on the DATAMONKEY webserver that examine ratios of non-synonymous (dN) and synonymous (dS) substitution rates to identify signals of positive (dN > dS) and negative (dN < dS) selection operating on individual codons within genes. Single likelihood ancestor counting (SLAC) infers selection by comparing the observed numbers of non-synonymous and synonymous mutations at each codon to those expected under a binomial distribution. Fixed effects likelihood (FEL) compares the fit of a model in which non-synonymous and synonymous rates are constrained to be equal with that of an unconstrained model. Random effects likelihood (REL) methods approximate the distribution of non-synonymous and synonymous rates across all sites using discrete rate classes, and calculate the posterior probability that each site belongs to each of the rate classes. Since these methods perform better on larger data sets, we only conducted these analyses on sequences obtained during the SSRV experiment. We tested alignments of genes for the movement protein (mp, 79 sequences, 327 nucleotides long), coat protein (cp, 78 sequences, 741 nucleotides long) and the replication-associated protein (rep, 80 sequences, 888 nucleotides long, excluding the alternate reading frame overlap between rep and repA codons 217–282). The number of sequences varied between alignments because we excluded sequences with apparent indels or premature stop codons.
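The binomial reasoning behind a counting method such as SLAC can be illustrated with a simplified sketch. This is not DATAMONKEY's implementation: the function names, the way the expected non-synonymous proportion is supplied, and the counts are all assumptions made for illustration. Given the inferred numbers of non-synonymous and synonymous changes at a codon, the two tail probabilities indicate whether non-synonymous changes are rarer or more frequent than expected under neutrality.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(0, k + 1))


def counting_test_pvalues(nonsyn, syn, p_nonsyn):
    """Tail probabilities for the observed number of non-synonymous changes.

    p_nonsyn is the (assumed) proportion of changes expected to be
    non-synonymous at this codon under neutrality.
    Returns (p_negative, p_positive).
    """
    n = nonsyn + syn
    p_negative = binomial_cdf(nonsyn, n, p_nonsyn)            # P(X <= observed)
    p_positive = 1.0 - binomial_cdf(nonsyn - 1, n, p_nonsyn)  # P(X >= observed)
    return p_negative, p_positive


if __name__ == "__main__":
    # Hypothetical codon: 0 non-synonymous and 3 synonymous changes,
    # with half of all possible changes expected to be non-synonymous.
    print(counting_test_pvalues(0, 3, 0.5))
```

A small first value would point towards negative selection at the codon (dN < dS), and a small second value towards positive selection (dN > dS).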
Tajima's D and Fu and Li's F* statistics [38, 39] were calculated and tested for significance using the program DNASP version 4.0. Between 8 and 20 full-length genomes randomly cloned from each of the six SSRV samples and the two MSV-B and MSV-F samples were tested. All the samples from each of the SSRV, MSV-B and MSV-F experiments were also analysed together. Both the D and F* statistics identify the contribution of rare variants to total genetic diversity. Significantly negative statistics are indicative of an excess of rare variants and are a signature of very strong negative selection against the survival of mutant genomes [38, 39].
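For orientation, Tajima's D [38] can be computed from an alignment as in the sketch below. It is not the DNASP implementation: the function name and toy sequences are hypothetical, gaps and ambiguous bases are not handled, and significance testing is omitted. D contrasts the mean number of pairwise differences with the number of segregating sites scaled by sample size, so an excess of rare variants drives it negative.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, D) for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are assumed here")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined when there are no segregating sites")

    # pi: mean number of pairwise differences.
    diffs = [sum(x != y for x, y in zip(s, t)) for s, t in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return pi, seg_sites, d


if __name__ == "__main__":
    # Toy alignment in which every variant is a singleton (rare variants only).
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

In the toy alignment every polymorphism is carried by a single sequence, which is the pattern of rare variants that yields a negative D.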
Abbreviations
- BSP: Bayesian skyline plot
- cp: coat protein gene
- EACMV: East African cassava mosaic virus
- GTR: general time reversible
- HPD: highest probability density
- MCMC: Markov chain Monte Carlo
- mp: movement protein gene
- MSV: Maize streak virus
- Rep: replication associated protein
- rep: replication associated protein gene
- ssDNA: single stranded DNA
- SSRV: Sugarcane streak Réunion virus
- TMRCA: time to the most recent common ancestor
- TYLCCNV: Tomato yellow leaf curl China virus
- TYLCV: Tomato yellow leaf curl virus
- WDV: Wheat dwarf virus
Acknowledgements
The authors wish to thank the South African National Research Foundation (NRF) for funding this research. AV was supported by the Carnegie Corporation of New York, DPM was supported by the NRF and the Wellcome Trust. SD was supported by US NSF DBI 0630307. GH was supported by the National Research Foundation of South Africa and the Atlantic Philanthropies Grant (number 62302). WD is supported by National Institutes of Health (AI47745) and by a University of California, San Diego Center for AIDS Research/NIAID Developmental Award.
References
1. Biagini P: Human circoviruses. Vet Microbiol 2004, 98: 95-101.
2. Gallian P, Biagini P, Attoui H, Cantaloube JF, Dussol B, Berland Y, de Micco P, de Lamballerie X: High genetic diversity revealed by the study of TLMV infection in French hemodialysis patients. J Med Virol 2002, 67: 630-635.
3. Umemura T, Tanaka Y, Kiyosawa K, Aller HJ, Shih JW: Observation of positive selection within hypervariable regions of a newly identified DNA virus (SEN virus). FEBS Lett 2002, 510: 171-174.
4. Duffy S, Holmes EC: Phylogenetic evidence for rapid rates of molecular evolution in the single-stranded DNA begomovirus tomato yellow leaf curl virus. J Virol 2008, 82: 957-965.
5. Duffy S, Holmes EC: Validation of high rates of nucleotide substitution in geminiviruses: Phylogenetic evidence from East African cassava mosaic viruses. J Gen Virol 2009, 90: 1539-1547.
6. Ge LM, Zhang JT, Zhou XP, Li HY: Genetic structure and population variability of Tomato yellow leaf curl China virus. J Virol 2007, 81: 5902-5907.
7. Isnard M, Granier M, Frutos R, Reynaud B, Peterschmitt M: Quasispecies nature of three Maize streak virus isolates obtained through different modes of selection from a population used to assess response to infection of maize cultivars. J Gen Virol 1998, 79: 3091-3099.
8. Urbino C, Thébaud G, Granier M, Blanc S, Peterschmitt M: A novel cloning strategy for isolating, genotyping and phenotyping genetic variants of geminiviruses. Virol J 2008, 5: 135.
9. van der Walt E, Martin DP, Varsani A, Polston JE, Rybicki EP: Experimental observations of rapid Maize streak virus evolution reveal a strand-specific nucleotide substitution bias. Virol J 2008, 5: 104.
10. Lopez-Bueno A, Villarreal LP, Almendral JM: Parvovirus variation for disease: a difference with RNA viruses? Curr Top Microbiol Immunol 2006, 299: 349-370.
11. Shackelton LA, Holmes EC: Phylogenetic evidence for the rapid evolution of human B19 erythrovirus. J Virol 2006, 80: 3666-3669.
12. Shackelton LA, Parrish CR, Truyen U, Holmes EC: High rate of viral evolution associated with the emergence of carnivore parvovirus. Proc Natl Acad Sci USA 2005, 102: 379-384.
13. Drake JW: A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA 1991, 88: 7160-7164.
14. Raney JL, Delongchamp RR, Valentine CR: Spontaneous mutant frequency and mutation spectrum for gene A of phi X174 grown in E. coli. Environ Mol Mutagen 2004, 44: 119-127.
15. Duffy S, Shackelton LA, Holmes EC: Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet 2008, 9: 267-276.
16. Roossinck MJ: Mechanisms of plant virus evolution. Annu Rev Phytopathol 1997, 35: 191-209.
17. Su S-S, Lahue RS, Au KG, Modrich P: Mispair specificity of methyl-directed DNA mismatch correction in vitro. J Biol Chem 1988, 263: 6829-6835.
18. Jeske H: Geminiviruses. In TT Viruses: The Still Elusive Human Pathogens. Edited by: de Villiers E-M, zur Hausen H. Berlin: Springer Verlag; 2009:185-226.
19. Arguello-Astorga G, Ascencio-Ibáñez JT, Dallas MB, Orozco BM, Hanley-Bowdoin L: High-frequency reversion of geminivirus replication protein mutants during infection. J Virol 2007, 81: 11005-11015.
20. Shepherd DN, Martin DP, Varsani A, Thomson JA, Rybicki EP, Klump HH: Restoration of native folding of single-stranded DNA sequences through reverse mutations: an indication of a new epigenetic mechanism. Arch Biochem Biophys 2006, 453: 108-122.
21. Shepherd DN, Martin DP, McGivern DR, Boulton MI, Thomson JA, Rybicki EP: A three-nucleotide mutation altering the Maize streak virus Rep pRBR-interaction motif reduces symptom severity in maize and partially reverts at high frequency without restoring pRBR-Rep binding. J Gen Virol 2005, 86: 803-813.
22. van der Walt E, Rybicki EP, Varsani A, Polston JE, Billharz R, Donaldson L, Monjane AL, Martin DP: Rapid host adaptation by extensive recombination. J Gen Virol 2009, 90: 734-746.
23. Wu B, Melcher U, Guo X, Wang X, Fan L, Zhou G: Assessment of codivergence of Mastreviruses with their plant hosts. BMC Evol Biol 2008, 8: 335.
24. Holmes EC: Patterns of intra- and interhost nonsynonymous variation reveal strong purifying selection in dengue virus. J Virol 2003, 77: 11296-11298.
25. Bigarre L, Salah M, Granier M, Frutos R, Thouvenel J, Peterschmitt M: Nucleotide sequence evidence for three distinct sugarcane streak mastreviruses. Arch Virol 1999, 144: 2331-2344.
26. Varsani A, Shepherd DN, Monjane AL, Owor BE, Erdmann JB, Rybicki EP, Peterschmitt M, Briddon RW, Markham PG, Oluwafemi S, Windram OP, Lefeuvre P, Lett JM, Martin DP: Recombination, decreased host specificity and increased mobility may have driven the emergence of maize streak virus as an agricultural pathogen. J Gen Virol 2008, 89: 2063-2074.
27. Pinner MS, Markham PG, Markham RH, Dekker L: Characterization of maize streak virus: description of strains; symptoms. Plant Path 1988, 37: 74-87.
28. Ramsell JNE, Boulton MI, Martin DP, Lindsten K, Valkonen JPT, Kvarnheden A: Studies on the host range of the barley strain of Wheat dwarf virus using an agroinfectious viral clone. Plant Path 2009, in press.
29. Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 2007, 7: 214.
30. Frederico LA, Kunkel TA, Shaw BR: A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation-energy. Biochemistry 1990, 29: 2532-2537.
31. Caulfield JL, Wishnok JS, Tannenbaum SR: Nitric oxide-induced deamination of cytosine and guanine in deoxynucleosides and oligonucleotides. J Biol Chem 1998, 273: 12689-12695.
32. Xia X, Yuen KY: Differential selection and mutation between dsDNA and ssDNA phages shape the evolution of their genomic AT percentage. BMC Genet 2005, 6: 20.
33. Kamiya H: Mutagenic potentials of damaged nucleic acids produced by reactive oxygen/nitrogen species: Approaches using synthetic oligonucleotides and nucleotides. Nucleic Acids Res 2003, 31: 517-531.
34. Kalam MA, Basu AK: Mutagenesis of 8-oxoguanine adjacent to an abasic site in simian kidney cells: Tandem mutations and enhancement of G → T transversions. Chem Res Toxicol 2005, 18: 1187-1192.
35. Klapacz J, Bhagwat AS: Transcription promotes guanine to thymine mutations in the non-transcribed strand of an Escherichia coli gene. DNA Repair 2005, 4: 806-813.
36. Kosakovsky Pond SL, Frost SDW, Muse SV: HyPhy: hypothesis testing using phylogenies. Bioinformatics 2005, 21: 676-679.
37. Kimura M: Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci USA 1981, 78: 454-458.
38. Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989, 123: 585-595.
39. Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics 1993, 133: 693-709.
40. Holmes EC: Evolutionary history and phylogeography of human viruses. Annu Rev Microbiol 2008, 62: 307-328.
41. Ramsden C, Holmes EC, Charleston MA: Hantavirus Evolution in Relation to Its Rodent and Insectivore Hosts: No Evidence for Codivergence. Mol Biol Evol 2009, 26: 143-153.
42. Shackelton LA, Rambaut A, Pybus OG, Holmes EC: JC virus evolution and its association with human populations. J Virol 2006, 80: 9928-9933.
43. Kiprop EK, Baudoin JP, Mwang'ombe AW, Kimani PM, Mergeai G, Maquet A: Characterization of Kenyan isolates of Fusarium udum from pigeonpea [Cajanus cajan (L.) Millsp.] by cultural characteristics, aggressiveness and AFLP analysis. 150th edition. 2002, 517-525.
44. Owor BE, Shepherd DN, Taylor NJ, Edema R, Monjane AL, Thomson JA, Martin DP, Varsani A: Successful application of FTA Classic Card technology and use of bacteriophage phi29 DNA polymerase for large-scale field sampling and cloning of complete maize streak virus genomes. J Virol Methods 2007, 140: 100-105.
45. Shepherd DN, Martin DP, Lefeuvre P, Monjane AL, Owor B, Rybicki EP, Varsani A: A protocol for the rapid isolation of full geminivirus genomes from dried plant tissue. J Virol Methods 2008, 149: 97-102.
46. Inoue-Nagata AK, Albuquerque LC, Rocha WB, Nagata T: A simple method for cloning the complete begomovirus genome using the bacteriophage phi29 DNA polymerase. J Virol Methods 2004, 116: 209-211.
47. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 2007, 24: 1596-1599.
48. McVean G, Awadalla P, Fearnhead P: A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 2002, 160: 1231-1241.
49. Martin DP, Williamson C, Posada D: RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 2005, 21: 260-262.
50. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52: 696-704.
51. Kass RE, Raftery AE: Bayes Factors. J Amer Stat Assoc 1995, 90: 773-795.
52. Suchard MA, Weiss RE, Sinsheimer JS: Bayesian selection of continuous-time Markov chain evolutionary models. Mol Biol Evol 2001, 18: 1001-1013.
53. Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 2001, 18: 691-699.
54. Tavaré S: Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci 1986, 17: 57-86.
55. Kosakovsky Pond SL, Frost SDW: Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 2005, 21: 2531-2533.
56. Kosakovsky Pond SL, Frost SD: Not so different after all: A comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 2005, 22: 1208-1222.
57. Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 2003, 19: 2496-2497.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.