Sequencing and characterization of Varicella-Zoster virus vaccine strain SuduVax

Background Varicella-zoster virus (VZV) causes chickenpox in children and shingles in older people. Currently, live attenuated vaccines based on the Oka strain are available worldwide. In Korea, an attenuated VZV vaccine has been developed from a Korean isolate and has been commercially available since 1994. Despite this long history of use, the mechanism of attenuation of the vaccine strain remains elusive. We attempted to understand the molecular basis of attenuation by full genome sequencing and comparative genomic analyses of the Korean vaccine strain SuduVax. Results The SuduVax genome was found to be 124,759 bp long and to contain 74 open reading frames (ORFs). SuduVax was genetically closest to the Oka strains, and these Korean-Japanese strains formed a strong clade in phylogenetic trees. SuduVax, like the Oka vaccine strains, carried a T-to-C substitution at the stop codon of ORF0, resulting in a read-through mutation that codes for an extended form of the ORF0 protein. SuduVax also shared certain deletion and insertion mutations in ORFs 17, 29, 56 and 60 with Oka vaccine strains and some clinical strains. Conclusions The Korean VZV vaccine strain SuduVax is genetically similar to the Oka vaccine strains. Further comparative genomic and bioinformatics analyses will help to elucidate the molecular basis of the attenuation of the VZV vaccine strains.


Background
Varicella-zoster virus (VZV) is an alpha-herpesvirus and the cause of chickenpox (varicella) and shingles (zoster). Chickenpox is characterized by fever and generalized rash, and is most prevalent in children due to primary infection. VZV can establish a latent infection in nerve cells of dorsal root ganglia and its reactivation from latency causes shingles in older adults and in immunocompromised people.
Isolation and propagation of VZV in cell culture was first reported in 1953 [1], and the first determination of the complete nucleotide sequence was made from the Dumas strain [2]. As of August 2010, complete nucleotide sequences had been determined and were available from NCBI GenBank database from 23 VZV strains including three vaccine strains derived from the Oka strain. Comparison of the full nucleotide sequences of clinical with vaccine strains has enabled researchers to suggest putative regions that might be responsible for attenuation in vaccine strains [3][4][5][6].
In Korea, the pharmaceutical company GCC has been manufacturing an attenuated VZV vaccine for chickenpox since 1994. The live-attenuated vaccine strain, SuduVax®, was obtained through serial passage of wild-type virus in cell culture. The original wild-type virus was isolated in primary human embryonic lung (HEL) cell culture from a 33-month-old boy with chickenpox in 1989 in Seoul, Korea [7]. The virus was attenuated by 10 passages in HEL cells, 12 passages in guinea pig embryonic lung cells, and five further passages in HEL cells to prepare an attenuated strain, designated MAV06, for vaccine production [8]. The attenuated viruses were stored in liquid nitrogen (master virus banks). Working virus banks are routinely produced after five passages of master virus bank stocks in HEL cells. The final vaccine (SuduVax) is manufactured after passaging the working virus bank five times in HEL cells.
SuduVax has been marketed in Korea since 1994 and internationally since 1998. Although the efficacy and safety of SuduVax have been proved in the marketplace, molecular studies explaining the mechanism of attenuation or the efficacy of the vaccine have not been available. In this study, the complete nucleotide sequence of SuduVax was determined and compared with those of 23 VZV strains whose full genomic sequences are registered in the NCBI GenBank database.

Results
Overall genome structure of the Korean vaccine strain SuduVax
The genome of the VZV strain SuduVax was determined to be 124,759 bp. The architecture of the SuduVax genome is typical of VZV in that the genome could be divided into TRL, UL, IRL, IRS, US and TRS (88, 104,799, 88, 7,276, 5,232, and 7,276 bp, respectively). The G + C content of the SuduVax genome is approximately 46.1%. The lengths of the genome, lengths of each region and the G + C contents are very similar among the 24 VZV strains analyzed in this study (Table 1). The SuduVax genome contains 74 ORFs. Of these, 64 are UL genes and four are US genes. Three genes in IRS (ORFs 62-64) are inversely repeated in TRS (ORFs 69-71). Of the 74 ORFs, 39 are in the forward direction and 35 are in the reverse direction. The directions of ORFs are 100% conserved among the analyzed VZV strains. The ORF map of strain SuduVax is presented in Figure 1.
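The reported region lengths and ORF counts can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in the paragraph above; the variable names are illustrative and are not part of the original analysis.

```python
# Region lengths (bp) of the SuduVax genome as quoted in the text.
regions = {
    "TRL": 88,
    "UL": 104_799,
    "IRL": 88,
    "IRS": 7_276,
    "US": 5_232,
    "TRS": 7_276,
}
genome_length = sum(regions.values())

# ORF counts per region as quoted in the text: ORFs 62-64 lie in IRS
# and are repeated in inverse orientation as ORFs 69-71 in TRS.
orf_counts = {"UL": 64, "US": 4, "IRS": 3, "TRS": 3}
total_orfs = sum(orf_counts.values())

# ORF orientation counts as quoted in the text.
forward_orfs, reverse_orfs = 39, 35

print(genome_length)                 # 124759
print(total_orfs)                    # 74
print(forward_orfs + reverse_orfs)   # 74
```

The six regions sum to the stated genome length of 124,759 bp, and both the per-region and the per-orientation ORF counts add up to the 74 ORFs reported.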

Phylogenetic analysis
Phylogenetic trees were constructed using the full nucleotide sequences of SuduVax and 23 VZV strains whose full genomic DNA sequences are known. As shown in an unrooted tree generated by the maximum-likelihood method, SuduVax and four Oka strains (pOka, vOka, VarilRix, VariVax) formed a clade and strains M2DR and 8 formed an adjacent clade (Figure 2a). These two clades were joined with a clade whose only member was strain CA123. Strains 11, 22, 03-500 and HJ0 formed another clade and the rest of the clinical strains formed the last clade. An almost identical topology was observed in a tree generated by the neighbour-joining method (data not shown) and by the Bayesian method [9]. SuduVax together with the Oka strains formed a distinctive clade, corresponding to clade 2 proposed by the VZV Nomenclature Meeting 2008 [10]. When trees were constructed with concatenated coding nucleotide sequences (ORFs) or amino acid sequences, similar tree topologies were obtained (data not shown). Next, we built phylogenetic trees using non-coding sequences. Again, SuduVax grouped with the four Oka strains, forming clade 2 (Figure 2b). One notable difference between the trees built from full or coding sequences and the tree built from non-coding sequences was the location of pOka, the parental Oka strain from which the vaccine strain vOka was derived. While pOka was located between the four vaccine strains and the 19 clinical strains in the trees built from full or coding sequences, pOka was buried among the vaccine strains in the tree built from non-coding sequences (compare Figures 2a, b). In other words, the four vaccine strains (vOka, VarilRix, VariVax, and SuduVax) formed a subclade within clade 2 in the trees built from full or coding sequences (bootstrap value = 1,000 in neighbour-joining trees), but not in the tree built from non-coding sequences.
In order to find which ORFs are important in distinguishing vaccine strains from clinical strains, further phylogenetic analyses using individual ORFs were performed. Of the 74 phylogenetic trees, 12 ORF trees exhibited clear branches leading to the formation of clusters consisting of vaccine strains. These 12 ORFs were ORFs 0, 1, 6, 18, 31, 35, 39, 59, 62, 64, 69 and 71 (Figure 2c). The bootstrap values for the vaccine clusters were greater than 640. In the majority of ORF trees, vaccine clusters formed subclades within clade 2. However, in phylogenetic trees based on ORFs 1, 18, 39 and 59, branches leading to clade 2 were not present or were very short with low bootstrap values (Figure 2d). Thus, the vaccine strains did not always form a subclade within clade 2.
Evolutionary relationships between the Korean vaccine strain SuduVax and other VZV strains were investigated by calculating genetic distances among the 24 VZV strains. As a whole, VZV genome sequences were highly conserved among the strains. At the level of full nucleotide sequences, SuduVax was the most similar to VarilRix, followed by vOka, VariVax and pOka (Table 2). Similar results were obtained when the genetic distances were calculated using concatenated non-coding nucleotide sequences or amino acid sequences. The average distance between SuduVax and the three vaccine strains at the full nucleotide level was calculated to be 0.20 ± 0.05 × 10^-3, which was less than 10% of the average distance between SuduVax and the 20 clinical strains (2.08 ± 0.39 × 10^-3, Table 2). Among the clinical strains other than pOka, strain 8 was the most similar to SuduVax.

Mutations found in SuduVax ORFs
SuduVax ORF0 exists in a longer form due to a read-through mutation. The stop codon TGA (nucleotide positions 388-390) was mutated to CGA, coding for Arg. A putative stop codon TGA was found downstream, overlapping with ORF1 (Figure 3). This extended ORF0 encodes a new protein of 221 amino acid residues. The same read-through mutation was found in the other vaccine strains, vOka, VarilRix and VariVax. All clinical strains including pOka contained a 390 bp-long ORF0 coding for 129 amino acids.
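The effect of a stop-codon read-through can be illustrated with a minimal Python sketch. The 18-nt sequences below are invented toy examples, not VZV sequence, and the helper name `translate` is hypothetical; only the TGA-to-CGA (stop-to-Arg) change mirrors the mutation described above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first base and stop at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequences (hypothetical): the third codon is TGA in the wild-type-like
# sequence and CGA in the read-through mutant.
wild_type = "ATGGCTTGAGGTCCTTGA"
mutant = "ATGGCTCGAGGTCCTTGA"

print(translate(wild_type))  # MA
print(translate(mutant))     # MARGP

# Length check for the real ORF0 figures quoted in the text:
# 390 bp = 130 codons, the last of which is the stop codon.
print(390 // 3 - 1)          # 129
```

In the toy mutant, translation continues past the former stop codon until the next in-frame TGA, which is the same logic by which the 129-residue ORF0 product becomes a longer protein in the vaccine strains.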
Compared to the reference strain Dumas, ORF17 and ORF56 of the strain SuduVax were each 3 bp shorter due to deletion of TCA at positions 367 to 369 and TCT at positions 658 to 660, respectively. Both deletions resulted in the loss of a serine (S) residue. On the other hand, an insertion of three nucleotides, ATG, at position 27 was found in ORF60 of the strain SuduVax. Interestingly, these two deletions and one insertion were also found in all Oka strains including pOka. SuduVax and the Oka strains were found to have an ORF29 that is 15 bp (AACATTTCAGGGTCA) shorter than that of most clinical isolates, which contain two tandem reiterations of this 15 bp sequence. Among the clinical strains, M2DR, CA123 and 8 contained only one copy of the 15 bp element in ORF29. Strains M2DR and 8 shared the same ORF60 length with the Oka and SuduVax strains. Table 3 summarizes the insertion and deletion mutations found in SuduVax.
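Counting tandem copies of the 15 bp ORF29 element is straightforward to script. The sketch below is one possible approach, not the method used in this study; the flanking bases in the toy sequences are invented, and `max_tandem_copies` is a hypothetical helper name.

```python
import re

# The 15 bp element reported in ORF29.
ELEMENT = "AACATTTCAGGGTCA"


def max_tandem_copies(seq, unit=ELEMENT):
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{unit})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Toy fragments with invented flanking bases: two tandem copies (as in most
# clinical isolates) versus one copy (as in SuduVax and the Oka strains).
two_copy_fragment = "GGC" + ELEMENT * 2 + "TTA"
one_copy_fragment = "GGC" + ELEMENT + "TTA"

print(max_tandem_copies(two_copy_fragment))            # 2
print(max_tandem_copies(one_copy_fragment))            # 1
print(len(two_copy_fragment) - len(one_copy_fragment)) # 15
```

The 15 bp length difference between the two toy fragments corresponds to the ORF29 length difference described in the text.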

Discussion
VZV strain SuduVax has been used by a Korean pharmaceutical company to produce live attenuated vaccine for chickenpox since 1994. Although its efficacy and safety have been proven in the marketplace, molecular characteristics of the vaccine strain have not been available. In this study sequencing and analyses of the nucleotide sequence of the Korean varicella vaccine strain SuduVax were undertaken.
In the original paper on the first complete sequencing of VZV strain Dumas [2], 71 ORFs were proposed. However, the information obtained from the NCBI GenBank database for Dumas (NC_001348) identifies 73 ORFs if the three ORFs located in TRS are counted as separate ORFs. Sequencing of two Oka-derived vaccine strains, VarilRix (DQ008354) and VariVax (DQ008355), identified 72 ORFs [5]. A Blast search using these three strains as queries produced 74 possible ORFs for VZV. We were able to locate ORF45 (position 81,523-82,593) in Dumas and ORF33.5 in VarilRix (position 60,257-61,165) and VariVax (position 60,254-61,162). An extended form of ORF0 due to the read-through mutation was identified in SuduVax as well as in the Oka vaccine strains (see below). Using the reference strains Dumas and VarilRix as queries, we were able to identify and locate 74 ORFs in the genome of the strain SuduVax as well as in the other 23 VZV strains analyzed in this study.
Phylogenetic analysis using the full nucleotide sequences of 24 VZV strains identified five distinct clades, consistent with previous findings [9,10]. Phylogenetic trees constructed with concatenated amino acid sequences and coding nucleotide sequences also revealed five clades with the same members. The tree built using non-coding nucleotide sequences appeared  [10] with other members of the clade 2. Various genotyping methods using limited genetic information of VZV strains have been shown to be representative of genotyping using full genome information [11][12][13][14][15]. Every genotyping method unequivocally placed SuduVax in the same genogroup as the Oka strains, as in the phylogenetic trees based on full or near-full genetic information (data not shown).
It is not presently certain, because of the lack of full genome sequences from other Asian isolates, whether this clade 2 could be extended to include isolates from other Asian countries or whether it is confined to isolates from Japan and Korea only. However, available data based on partial nucleotide sequences or restriction fragment length polymorphism suggest that all Korean isolates and Chinese isolates form a clade with Japanese isolates [16,17]. Thus, it is possible that the clade 2 could be extended to include China, which is geographically close to Japan and Korea.
Coding sequences occupy approximately 91% of the VZV genome and reflect most of the sequence information of the whole genome. Thus, the phylogenetic trees based on the coding sequences were expected to be very similar to the trees based on the full nucleotide sequences. Indeed, we found that the coding sequence trees and amino acid trees were similar to the full nucleotide trees. Noncoding sequences are interspersed between coding sequences or ORFs and account for approximately 9% of the VZV genome. The phylogenetic trees based on VZV noncoding sequences were not markedly different from those based on full or coding nucleotide sequences or amino acid sequences. One notable difference is the location of pOka within clade 2. In full or coding sequence trees, pOka was separated from the four vaccine strains, forming two independent subclades within clade 2. In contrast, pOka did not form a subclade separate from the vaccine strains in noncoding sequence trees. pOka is a clinical strain. Thus, coding sequences or amino acid sequences of the VZV genome may provide information distinguishing vaccine strains from clinical strains, while noncoding sequences do not.
Phylogenetic analyses using the nucleotide sequences of individual ORFs suggested that 12 ORFs may be important in distinguishing vaccine strains from clinical strains. Yamanishi identified 23 ORFs that differ between pOka and the Oka vaccine [6], including the 12 ORFs identified in this study. Moreover, our preliminary studies of single nucleotide polymorphisms among the full genomic DNA sequences of the 24 VZV strains revealed 12 ORFs that may be characteristic of vaccine strains, and these 12 ORFs coincide with the above-mentioned 12 ORFs [manuscript in preparation].
ORF0, also known as ORFS/L, is thought to be essential for VZV growth and encodes a membrane protein with 129 amino acid residues, which is possibly involved in vesicular trafficking and altering cell adhesion molecules in infected cells [18,19]. ORF0 in SuduVax was determined to possess an extended C-terminal sequence due to a read-through mutation of its original stop codon TGA to CGA, coding for Arg. The nearest downstream stop codon TGA was found to overlap with ORF1, and the extended ORF0 is expected to code for a new protein with 221 amino acid residues. Interestingly, this read-through mutation was also found in the three Oka-derived vaccine strains, while the stop codons were found to be unaltered in all of the clinical strains including the parent Oka strain. In cells infected with vOka, the extended form of the ORF0 protein with 221 amino acid residues and its spliced form with 155 amino acid residues are expressed [20]. Since the other vaccine strains, including SuduVax, share 100% identical nucleotide sequences within and downstream of ORF0 up to the new stop codon, both forms of the extended ORF0 protein are expected to be expressed in permissive cells infected with SuduVax. Thus, the read-through mutation in ORF0 might be an important feature distinguishing vaccine strains from clinical strains.
Besides the read-through mutation in ORF0, SuduVax shares the same mutational events in ORFs 17, 29, 56 and 60 with the Oka strains. ORF17 codes for an mRNA-specific RNase [21] and ORF29 encodes a single-strand DNA binding protein that binds via its zinc-finger domain [22]. The function of ORF56 has not been well characterized, but its gene product is reported to co-localize with regulatory protein ICP22 and nuclear protein UL3 in small, dense nuclear bodies (NCBI, http://www.ncbi.nlm.nih.gov/pubmed?Db=gene&Cmd=retrieve&dopt=full_report&list_uids=1487683). The gene product of ORF60 is glycoprotein L, which acts as a chaperone for glycoprotein H [23]. Three-bp deletions were found in ORFs 17 and 56, and a 3-bp insertion was found in ORF60. While most of the clinical strains contain two tandem copies of the 15 bp (AACATTTCAGGGTCA) element in ORF29, the SuduVax and Oka strains contain only one copy of this element. Of these four deletion and insertion events, two (ORFs 29, 60) are shared with the clinical strains 8 and M2DR, and one (ORF29) is also found in the strain CA123. Since these deletion and insertion events are also found in some of the clinical strains including pOka, they may not by themselves be important in attenuation, although it is still possible that they, in combination with other events such as the read-through mutation in ORF0, play some role in the attenuation of vaccine strains.

Conclusion
We obtained and analyzed full nucleotide sequence of the Korean vaccine strain SuduVax. SuduVax was shown to be genetically most similar to Oka-derived and vaccine strain VarilRix (DQ008354). The gaps between the contigs were filled by polymerase chain reaction sequencing using primers whose sequences were obtained from the adjacent contigs. The completed sequence was deposited into NCBI GenBank (accession number JF306641).

Allocation of ORFs
ORFs of the strain SuduVax in the full genome sequence were located by Blast search against two reference strains, Dumas (NC_001348) and VarilRix (DQ008354). Complementary determining sequences

Phylogenetic analysis
Nucleotide sequences of the full VZV genomes other than SuduVax were obtained directly from the GenBank database (Table 1). For each VZV strain, all ORF sequences were cut and pasted to generate a concatenated coding sequence. Similarly, all inter-ORF sequences were cut and pasted to build a concatenated noncoding sequence. Amino acid sequences were obtained by translation of the corresponding ORFs and pasted to generate a concatenated sequence harbouring all 74 ORFs. The full nucleotide, concatenated nucleotide, or concatenated amino acid sequences of the 24 VZV strains were multiple-aligned using the ClustalW program (ver 2.0.1) followed by manual editing. The resulting out-files were used to calculate genetic distances using the Dnadist (for nucleotide) or Protdist (for amino acid) program included in the Phylip package (version 3.69, http://evolution.genetics.washington.edu/phylip.html). The distance matrix was obtained by the Kimura 2-parameter method for nucleotide sequences or the Jones-Taylor-Thornton method for amino acid sequences. Cluster analysis was performed by the neighbour-joining (NJ) and maximum-likelihood (ML) methods, and the resulting tree files were viewed with the Treeview program (version 1.6.6). The significance of the phylogenetic trees was verified by bootstrap analysis. Phylogenetic trees were constructed from 1,000 replicates generated by the Seqboot program and the consensus tree was identified by the Consense program.
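The concatenation and distance steps described above can be sketched in Python as follows. This is an illustrative sketch only and is not the Phylip implementation used in the study: the distance function uses the textbook closed-form Kimura 2-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and Dnadist may parameterize the model differently. The function names, the toy sequences, and the choice to reverse-complement reverse-strand ORFs are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def concatenate_orfs(genome, orfs):
    """Join ORF segments given as (start, end, strand) with 1-based inclusive
    coordinates. Reverse-strand ORFs are reverse-complemented (an assumption)."""
    parts = []
    for start, end, strand in orfs:
        segment = genome[start - 1:end]
        if strand == "-":
            segment = segment.translate(COMPLEMENT)[::-1]
        parts.append(segment)
    return "".join(parts)


def k2p_distance(seq_a, seq_b):
    """Textbook Kimura 2-parameter distance between two aligned sequences.
    Columns containing gaps or ambiguous bases are skipped."""
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    n = len(pairs)
    transitions = sum(
        1 for x, y in pairs if x != y and (x in PURINES) == (y in PURINES)
    )
    transversions = sum(
        1 for x, y in pairs if x != y and (x in PURINES) != (y in PURINES)
    )
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (invented sequences): one forward and one reverse ORF.
toy_genome = "ATGCCCGGGTTT"
toy_orfs = [(1, 3, "+"), (10, 12, "-")]
print(concatenate_orfs(toy_genome, toy_orfs))  # ATGAAA

# Toy aligned pair differing by a single A<->G transition in 10 sites.
print(round(k2p_distance("AAAAAAAAAA", "AAAAGAAAAA"), 4))  # 0.1116
```

In practice, the concatenated sequences of all 24 strains would first be aligned, and a pairwise distance would then be computed for every pair of strains to build the distance matrix.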