Sixteen wild-type rubella virus isolates from the Centers for Disease Control and Prevention (CDC) were chosen for whole-genome sequencing at the J. Craig Venter Institute (JCVI) (Table 1). All viruses in this study were collected in the United States, most as a result of routine surveillance. Dates of collection range from 1961 to 2009, with the majority coming from the late 1990s and the 2000s. One of the virus isolates is a laboratory strain (NJ.USA/61/1a), commonly known as M33 (ATCC, Manassas, VA), which was the first rubella virus isolate. There are currently 4 entries in GenBank for the M33 strain: X05259 (3382 nts), X72393 (6600 nts), J02620 (1822 nts), and AJ438491 (948 nts). Although the combined X05259 and X72393 sequences cover most of the genome (9749 nts), they contain errors that make the sequences difficult to align and use. Resequencing of this historic virus was therefore deemed beneficial. Ten of the isolates were known to have been imported into the United States from other countries, either as acute cases acquired abroad or as CRS cases whose mothers spent time in other countries during the early stages of their pregnancies. Two of the isolates, LA.CA.USA/91/1C and Seattle.WA.USA/16.00/2B, serve as WHO reference viruses (Table 1).
Stocks of the 16 viruses were inoculated into T75 flasks of confluent Vero cells. Five to seven days post-infection, the culture medium was removed and total cellular RNA was extracted using Tri-Reagent (Molecular Research Center, Cincinnati, OH) according to the manufacturer’s protocol. The RNA was resuspended in nuclease-free water and stored at −80°C.
Oligonucleotide primers were designed using an automated primer design tool [24, 25]. Clade 1 primers were designed from an alignment of the following reference sequences: 1a_AF435865, 1a_AB222609.1, 1B_DQ085339.1, 1C_DQ085341.1, and 1D_DQ388281. Clade 2 primers were designed from 2A_AY258322.1, 2B_DQ085338.1, and 2C_DQ085340.1. Primers, with M13 tags added, were designed at intervals along both the sense and antisense strands and provided amplicon coverage of at least 4-fold (Additional file 1: Table S1). RT-PCRs were performed with 1 ng of RNA using OneStep RT-PCR kits (Qiagen, Valencia, CA) according to the manufacturer’s instructions, with minor modifications: reactions were scaled down to 1/5 the recommended volumes, the RNA templates were denatured at 95°C for 5 min, and 1.6 U RNase Out (Invitrogen, Carlsbad, CA) was used. The RT-PCR products were sequenced with an ABI Prism BigDye v3.1 terminator cycle sequencing kit (Applied Biosystems, Carlsbad, CA). Raw sequence traces were trimmed to remove any primer-derived sequence as well as low-quality sequence, and gene sequences were assembled using Minimus, part of the open-source AMOS project. The gene sequences were then manually edited using ClOE (Closure Editor; JCVI), and ambiguous regions were resolved by additional sequencing when possible. Finally, the Viral Genome ORF Reader was used to check segment lengths, perform alignments, ensure the fidelity of open reading frames, correlate nt polymorphisms with amino acid changes, and detect any potential sequence errors. The 5′ and 3′ termini were determined using the 5′/3′ RACE kit as directed by the manufacturer (Roche Diagnostics, Mannheim, Germany) and oligo dT priming, respectively. The termini sequences were incorporated into the whole genome sequences as described above.
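The 4-fold amplicon coverage criterion can be illustrated with a short calculation. The following is a minimal sketch only, not the primer design tool used in this study: the function names, the 1-based inclusive coordinate convention, and the amplicon coordinates in the usage note are all hypothetical.

```python
def coverage_depth(amplicons, genome_length):
    """Return per-position amplicon depth for positions 1..genome_length.

    amplicons: iterable of (start, end) tuples, 1-based and inclusive
    (an assumed convention, not taken from the study).
    """
    # Difference array: +1 where an amplicon starts, -1 just past its end.
    diff = [0] * (genome_length + 2)
    for start, end in amplicons:
        if not (1 <= start <= end <= genome_length):
            raise ValueError(f"amplicon ({start}, {end}) is out of range")
        diff[start] += 1
        diff[end + 1] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def undercovered_positions(depth, min_depth=4):
    """Return 1-based positions whose depth is below min_depth."""
    return [i + 1 for i, d in enumerate(depth) if d < min_depth]
```

With toy input such as `coverage_depth([(1, 10)] * 4 + [(5, 12)], 12)`, positions 11 and 12 are covered by a single amplicon and would be reported by `undercovered_positions` as falling short of 4-fold coverage.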
Sequence and phylogenetic analyses
The sequences were aligned by ClustalW in the MEGA 4.0 software. Phylogenetic analysis was performed using the MEGA 5.05 program. The first 21 nts of the 30 sequences were deleted from the sequence alignment due to a gap in one sequence (AF435865). Analysis of variable and conserved positions relative to a consensus sequence was done using Microsoft Excel by comparing alignments of the nt and aa sequences of the 30 viruses. This comparison does not necessarily reflect the evolutionary distance between the viruses. Recombination analysis was performed using the RDP3 program.
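The comparison against a consensus sequence was carried out in Microsoft Excel; the same logic could be expressed programmatically. The sketch below is an illustration under stated assumptions rather than the procedure used here: it assumes the alignment is available as equal-length strings, uses a simple majority-rule consensus with ties broken alphabetically, and all function names are hypothetical.

```python
from collections import Counter


def trim_leading_columns(aligned_seqs, n_columns):
    """Drop the first n_columns alignment columns from every sequence."""
    return [seq[n_columns:] for seq in aligned_seqs]


def majority_consensus(aligned_seqs):
    """Return the most common character in each alignment column.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # max() keeps the first maximal item, so sorting the candidates
        # first makes the alphabetically smallest character win a tie.
        consensus.append(max(sorted(counts), key=lambda ch: counts[ch]))
    return "".join(consensus)


def variable_positions(aligned_seqs, consensus):
    """Return 1-based columns where any sequence differs from the consensus."""
    return [
        i + 1
        for i, column in enumerate(zip(*aligned_seqs))
        if any(ch != consensus[i] for ch in column)
    ]
```

For the alignment described above, `trim_leading_columns(seqs, 21)` would remove the 21 leading columns before the consensus is computed. Positions not returned by `variable_positions` are conserved across all sequences. As with the spreadsheet comparison, the resulting counts describe sites that differ from the consensus and are not a measure of evolutionary distance.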