A divergent variant of Grapevine leafroll-associated virus 3 is present in California

Background
Grapevine leafroll-associated viruses are a problem for grape production globally. Symptoms are caused by a number of distinct viral species. During a survey of Napa Valley vineyards (California, USA), we found evidence of a new variant of Grapevine leafroll-associated virus 3 (GLRaV-3). We isolated its genome from a symptomatic greenhouse-raised plant and fully sequenced it.
Findings
In a maximum likelihood analysis of representative GLRaV-3 gene sequences, the isolate grouped most closely with a recently sequenced variant from South Africa and a partial sequence from New Zealand. These highly divergent GLRaV-3 variants have predicted proteins that are more than 10% divergent from other GLRaV-3 variants, and appear to be missing an open reading frame for the p6 protein.
Conclusions
This divergent GLRaV-3 phylogroup is already present in grape-growing regions worldwide and is capable of causing symptoms of leafroll disease without the p6 protein.


Introduction
Grapevine leafroll disease (GLRD) is observed in all wine-making regions worldwide [1,2], limiting grape production by up to 40 percent [3]. Besides leaf rolling, other GLRD symptoms include abnormal pigmentation of the leaf interveinal area, disruption of the phloem and delayed grape maturation [3]. GLRD is caused by several related positive single-stranded RNA virus species in the family Closteroviridae, which contains the largest known plant RNA virus genomes [4]. All GLRD-causing viruses are phloem-limited [5] and infect Vitis hosts [6]. The mealybug-transmitted viruses are in the genus Ampelovirus, and Grapevine leafroll-associated virus 2, which has no known vector, is in the genus Closterovirus [6]. An additional GLRD-causing virus, Grapevine leafroll-associated virus 7, is still unclassified [7], although a recent proposal will place it in a new genus [8]. In fact, Closteroviridae recently underwent a taxonomic revision, and it is anticipated that the number of tentative GLRaV species will be reduced to five [8].
Grapevine leafroll-associated virus 3 (GLRaV-3) is the type species of the genus Ampelovirus. Two distinct isolates, GP18 [9] and WA-MR [10], have become representative of two major clades of GLRaV-3, but more intensive sampling revealed many genetically separated, well-supported clades, potentially leading to seven subclades within GLRaV-3 [11]. The overall genomic diversity amongst GLRaV-3 had remained fairly limited [8] until the recent publication of a South African isolate (GH11), which had ~68% nucleotide identity with other GLRaV-3 variants [12], but showed higher identity to a partial sequence of GLRaV-3 from New Zealand (NZ-1).
During a recent survey of vineyards in Napa Valley, California, USA, we found plants with divergent partial genome sequences of GLRaV-3, with close homology to NZ-1 (GLRaV-3e cluster) [11,13]. These plants were subsequently vegetatively propagated in our greenhouse at the University of California, Berkeley, and an isolate found in a symptomatic Merlot plant from Rutherford, California was selected to be fully sequenced. This plant was tested periodically for the presence of other GLRaV species by PCR of the coat protein-coding region from total nucleic acid (TNA) extractions as in [11]; no other GLRaV species was detected. Transmission experiments using the vine mealybug (Planococcus ficus, Hemiptera, Pseudococcidae) showed that this isolate is mealybug transmissible (Almeida, data not shown).

Isolation and sequencing
RNA and TNA were purified as previously described [13]. TNA was purified for GLRaV detection and for sequencing all of the genome, except for the ends. The ends were sequenced using 3' and 5' RACE kits (Invitrogen, Carlsbad, CA) on purified RNA that was treated with DNase I, as suggested by the manufacturer. These and subsequent sequencing reactions were performed at the Barker Hall Sequencing Facility located on the U.C. Berkeley campus.
Sequencing of the full genome was performed using a primer walking strategy and reverse transcription was initiated outward from the coat protein-coding region. Forward primers (Table 1) were designed by aligning all available GLRaV-3 full genome sequences, including Napa Valley survey sequences where possible [13]. Virus-specific primers for reverse transcription were designed from sequencing data obtained above and to meet the manufacturer's specifications of the Superscript II reverse transcriptase used in this study (Table 1). Four reverse transcription reactions were carried out per sample.
Primers for PCR were designed using conserved regions from the alignments above and with high melting temperatures to allow for a two-step PCR procedure using the Phusion Hot Start II Polymerase (Thermo Fisher, Waltham, MA). Reverse transcription reactions from above were used as template. An initial two-minute complete denaturation step at 98°C was followed by 35 cycles of denaturing for 8 seconds at 98°C and a combined primer annealing and extension step at 72°C for 30 seconds per kb of expected product. A final extension step for 7 minutes at 72°C was carried out to ensure complete extension of template. Amplicon sizes used to assemble the genome ranged between 3.5 kb and 8 kb; however, we were able to generate amplicons as large as 12 kb. A second round of PCR was carried out as above using the diluted first-round PCR reactions as the template, amplifying with nested primers, and reducing the extension time to 20 seconds per kb. For each first-round PCR sample, eight second-round PCRs were performed. All end products were visualized on a gel, purified and concentrated using a kit (Zymo Research, Irvine, CA), and sent for sequencing. PCR products from the initial four or more RT products were sequenced independently in both directions. The results were then manually checked and assembled using Vector NTI v.11 (Invitrogen). The assembly was then inserted into the alignment above and used to design new reverse transcription primers and reverse primers for PCR.
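As a worked illustration of the cycling conditions above, the following minimal Python sketch computes the combined annealing and extension time from the expected product size. The function name and the dictionary layout are our own illustrative choices, not part of any thermocycler software; only the temperatures, durations and per-kb rates come from the protocol described in the text.

```python
def two_step_program(expected_kb, seconds_per_kb=30, cycles=35):
    """Build a two-step PCR program as (temperature in C, seconds) pairs.

    Hypothetical helper for illustration only. Annealing and extension
    are combined in a single 72 C step whose length scales with the
    expected product size.
    """
    extension_seconds = expected_kb * seconds_per_kb
    return {
        "initial_denaturation": (98, 120),   # two minutes at 98 C
        "cycles": cycles,
        "denaturation": (98, 8),             # 8 seconds at 98 C
        "anneal_extend": (72, extension_seconds),
        "final_extension": (72, 420),        # 7 minutes at 72 C
    }


# First round: an 8 kb amplicon at 30 seconds per kb.
first_round = two_step_program(8)
# Nested second round: extension reduced to 20 seconds per kb.
second_round = two_step_program(8, seconds_per_kb=20)

print(first_round["anneal_extend"])   # (72, 240)
print(second_round["anneal_extend"])  # (72, 160)
```

For the 3.5 kb to 8 kb amplicons used in the assembly, this gives first-round extension steps of 105 to 240 seconds per cycle.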
For both genomic ends, primers were designed using the sequencing data obtained above. For the 3' end, poly-A tailing was performed prior to using the 3' RACE Kit, following a modified version of the manufacturer's instructions to partially extend the ends (Ambion, Foster City, CA). Due to the appearance of multiple secondary products resulting from the lowered PCR specificity, the final product was treated with a T4 polymerase to blunt the 3' overhangs for subsequent blunt cloning (New England Biolabs, Ipswich, MA). The product was cloned using a Zero Blunt Topo PCR cloning kit and Top10 chemically competent cells (Invitrogen). Colony PCRs and sequencing reactions were performed from 25 randomly chosen colonies using M13 primers. All colonies contained variable lengths of poly-A tailed product from the virus genome, but only those with clean reads were utilized for assembly. For the 5' end, the 5' RACE kit instructions were followed. The PCR product was purified using a DNA Clean and Concentrator kit (Zymo Research) and sequenced.

Sequence analysis
Annotation of the predicted open reading frames in the newly sequenced isolate, named CA7246 [GenBank: JQ796828], was done using MacVector (Cary, NC). ORFs were named according to sequence similarity and synteny with ORFs in GLRaV-3 [12]. Despite using an additional program (ORF Finder, http://www.ncbi.nlm.nih.gov/gorf/gorf.html), we could not find an ORF homologous to the GLRaV-3 ORF2 (encoding p6). The absence of this ORF was confirmed by sequencing that region from five additional independent isolates. While this manuscript was in review, the sequence of GH11 [GenBank: JQ655295] was released, and was added to the analysis in revision. No ORF2 was detected in GH11 or the partial NZ-1 either [12], indicating that p6 may not be an essential protein for GLRaV-3.
For a phylogenetic analysis of four important GLRaV-3 ORFs, we downloaded all available full-length GLRaV-3 RdRp, HSP70h, CP, and CPm sequences from GenBank on August 15, 2011 (GH11 was added in revision). The nucleotide sequences were manually aligned in Se-Al v2.0a11 (http://tree.bio.ed.ac.uk/software/seal/). Appropriate nucleotide substitution models were then selected by ModelTest [14] based on Akaike's Information Criterion and used to infer maximum likelihood gene trees with 1000 bootstrap replicates in PAUP* v4.0beta [15].
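To illustrate the model selection criterion, Akaike's Information Criterion is AIC = 2k - 2 ln L, where k is the number of free model parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below is not ModelTest itself, and the log-likelihood values are invented placeholders rather than results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to (log_likelihood, n_free_params).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical log-likelihoods for one gene alignment (illustration only).
# Parameter counts cover the substitution model, not branch lengths.
example_fits = {
    "JC69": (-5200.0, 0),
    "HKY85": (-5100.0, 4),
    "GTR+G": (-5090.0, 9),
}

for name, (lnl, k) in example_fits.items():
    print(name, aic(lnl, k))
print("selected:", best_model(example_fits))
```

With these placeholder values the richest model is selected because its gain in likelihood outweighs the penalty for its extra parameters; with a smaller gain the simpler model would win.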
These trees clearly show that CA7246 is more closely related to GH11 and the partial NZ-1 sequences than to other GLRaV-3 isolates (Figure 1). However, it is not known how these GLRaV-3 variants evolved to be so distinct from other GLRaV-3 strains. In order to assess whether any of the divergence of CA7246 was due to interspecific recombination, 200-base portions of the entire CA7246 genome were individually subjected to BLAST analysis to determine if any portion matched to any other taxa than GLRaV-3. The same analysis was conducted for the genome of GH11. All of these regions consistently showed homology to GLRaV-3 with no significant hits (BLAST score of ≥200) to other sequences in the non-redundant nucleotide collection in GenBank. The divergence of GH11/CA7246 from other GLRaV-3 variants appears to have arisen through mutation rather than recombination with any other characterized sequence.
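The windowing step of this recombination screen can be sketched as follows. This is a minimal Python illustration that assumes consecutive, non-overlapping 200-base windows (the text does not state whether the portions overlapped); the hit records and the `foreign_windows` helper are hypothetical stand-ins for parsed BLAST output, and no BLAST interface is called here.

```python
def split_into_windows(sequence, size=200):
    """Split a sequence into consecutive windows of `size` bases.

    Returns (start, subsequence) pairs; the final window may be shorter.
    Non-overlapping windows are an assumption made for this sketch.
    """
    return [(start, sequence[start:start + size])
            for start in range(0, len(sequence), size)]


def foreign_windows(hits, expected_taxon="GLRaV-3", min_score=200):
    """Return window starts with a significant hit outside the expected taxon.

    `hits` is a hypothetical list of (window_start, taxon, score) records.
    """
    return sorted({start for start, taxon, score in hits
                   if taxon != expected_taxon and score >= min_score})


# Toy 450-base sequence standing in for a genome.
toy_genome = "ACGT" * 112 + "AC"
windows = split_into_windows(toy_genome)
print(len(windows), [len(seq) for _, seq in windows])  # 3 [200, 200, 50]

# Invented hit records: no window has a significant non-GLRaV-3 match.
toy_hits = [(0, "GLRaV-3", 310), (200, "GLRaV-3", 295), (200, "GLRaV-1", 40)]
print(foreign_windows(toy_hits))  # []
```

An empty result corresponds to the outcome reported above, where every window matched only GLRaV-3 at the significance threshold.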
The molecular weights of CA7246's predicted protein products were calculated with the Sequence Manipulation Suite (http://www.bioinformatics.org/sms2/) [16] and are given in Table 2. Several of the GLRaV-3 proteins are named for their inferred protein molecular weights, and two of CA7246's homologues differed in molecular weight: 19.4 kDa and 6.2 kDa for the "p19.6" and the "p7" proteins, respectively. The predicted ORFs and untranslated regions from CA7246 were also aligned and compared to three other GLRaV-3 complete sequences (Table 2): to GH11 [GenBank: JQ655295], WA-MR [GenBank: GU983863] and GP18 [GenBank: EU259806], and to the partial sequence of NZ-1 [GenBank: EF508151]. Nucleic and amino acid percent identities between CA7246 and the four GLRaV-3 sequences were calculated using the Percent Identity tool in UCSF Chimera's MultAlign Viewer [17]. These ORF-by-ORF comparisons show that CA7246 and GH11 are more closely related than they are to other GLRaV-3 variants across their genomes.
Table 2 Percent amino acid and nucleotide identities between the untranslated regions and protein-coding genes (non-gapped columns) of CA7246 and isolates GH11, GP18, WA-MR and the partially sequenced isolate NZ-1. The percent identities between sequences with gapped alignments were calculated using only the common non-gap columns. Protein names and ORF numbering are as in the type sequence of GLRaV-3, though ORF2 (and its product, p6) do not appear in CA7246 or GH11.
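The identity measure used for Table 2, restricted to columns where neither sequence has a gap, can be sketched in Python as below. This is our own minimal illustration of the calculation, not the implementation in UCSF Chimera, and the toy sequences are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences over common non-gap columns.

    Columns where either sequence has a gap are ignored, so the
    denominator is the number of columns where both have a residue.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: five shared non-gap columns, four of them identical.
print(percent_identity("ATG-CA", "ATGTCC"))  # 80.0
```

Because gapped columns are dropped from both the numerator and the denominator, insertions and deletions do not lower the reported identity; only substitutions do.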
However, the CA7246 genome is 9.6% divergent from GH11 by nucleotide sequence, indicating they did not recently diverge from one another. Their 3'UTRs were more similar than their 5'UTRs, which is consistent with the wider diversity of 5'UTR structures that are observed among GLRaV-3 isolates [10,18]. The amino acid identities of their predicted protein products were higher, with the notable exception of p4, which was only 77.8% identical (Table 2). p4 was also the site of the greatest difference between GH11/CA7246 and the other GLRaV-3 variants, with at most 30.6% amino acid identity (Table 2). This bolsters our previous observation of completely neutral evolution in this ORF [13], and further suggests that this annotated ORF may not be translated, or that it may have a non-essential function.
Isolates of a new phylogroup of GLRaV-3 are present on three continents, and their sequences have diverged sufficiently that it is clear that these isolates dispersed from one another some time ago. We suspect this divergent GLRaV-3 variant has a wide geographic range, and may already be present in other wine-growing regions.