Molecular characterization of China rabies virus vaccine strain

Background Rabies virus (RV), the agent of rabies, can cause severe encephalomyelitis in several species of mammals, including humans. The aG strain is a human rabies vaccine strain employed in China, but its genetics have not been fully studied. The main goals of the present study were to amplify the whole genome of the aG strain and to analyze its genetic relationships with other vaccine strains and wild strains. Results The entire genome of the human rabies virus vaccine strain aG employed in China was sequenced; this is the second rabies virus vaccine strain from China to be fully characterized. The overall organization and length of the genome were similar to those of other lyssaviruses. The genome of the aG strain was 11925 nt long, comprising a leader sequence of 58 nt, a nucleoprotein (N) gene of 1353 nt, a phosphoprotein (P) gene of 894 nt, a matrix protein (M) gene of 609 nt, a glycoprotein (G) gene of 1575 nt, an RNA-dependent RNA polymerase (RdRp, L) gene of 6384 nt, and a trailer region of 70 nt. There was a TGAAAAAAA (TGA7) consensus sequence at the end of each gene, except for AGA7 at the end of the G gene. There was an AACAYYYCT consensus start signal at the beginning of each gene. Conclusions In this report, we analyzed the full genome of the Chinese human rabies vaccine strain aG. Our study indicated that the genome of aG retained the basic characteristics of RV. At the gene level, N was the most conserved of the five coding genes, indicating that this gene is the most appropriate for quantitative genotype definition. Phylogenetic analysis of the N gene indicated that the aG strain clustered most closely with Japanese and Russian rabies vaccine strains, suggesting that they may share a common ancestor. The aG strain did not share high homology with wild strains isolated from China, so it may not be the best vaccine strain; more research is needed to elucidate the genetic relationships among the RV strains circulating in China.


Background
Rabies is a widespread neurological zoonotic disease that affects almost all kinds of mammals, including humans. The mortality is almost 100%. Nearly 55,000 humans are reported to die of rabies worldwide each year, mostly in Asia and Africa [1]. After India, China has the second highest number of human cases in the world [2]. The causal agent of the disease is rabies virus (RV), a member of the genus Lyssavirus, family Rhabdoviridae. The RV genome is almost 12 kb in size and gives rise to five monocistronic RNAs, which encode the nucleoprotein (N), phosphoprotein (P), matrix protein (M), glycoprotein (G) and RNA-dependent RNA polymerase (large protein, L). Between the monocistrons there are four intergenic regions (IGRs) of different lengths [3]. In addition, at the ends of the genome there are two untranslated regions (UTRs), which play an important role in virus replication.
So far, vaccination is the most effective method to prevent rabies. The vaccine strains employed in China are CTN, aG, PM and PV. The complete genome of the CTN strain has already been sequenced [4]. Although novel vaccines, such as oral vaccines and live attenuated vaccines, have been developed in recent years, the virus strains used for vaccine production are still pathogenic for laboratory and wildlife species. More importantly, these rabies vaccines have the potential to cause vaccine-induced rabies and appear to have limited efficacy in a broad spectrum of animal species [5]. Therefore, genetic knowledge of vaccine strains is essential. In addition, comparing the sequences of the vaccine strain and the field strains circulating in the country would indicate how effective the vaccine is against the field virus [6].
The aG strain, also known as the pG strain, was obtained from a rabid dog in Beijing in 1931. After 50 passages in rabbit brain, the strain was named aG. The aG strain was then adapted to Vero cell culture after 68 passages. Much higher antibody levels were obtained in immunized dogs, and the mortality was lower. Since then, the aG strain has been chosen for the production of human rabies vaccine. Before this study, the P, M, G and L genes of the aG strain had been sequenced; however, this is the first time that the complete genome has been sequenced. The sequence obtained indicates that the aG genome is similar to the other complete lyssavirus genomes available in GenBank. Sequence comparison with other rabies virus strains and phylogenetic analysis based on the N gene also fall within the scope of this study.

Virus strain
The aG strain was prepared and stored by the State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences (LVRI, CAAS). The parental virus was isolated from the brain of a rabid dog in Beijing, China, in 1931. After 31 passages in primary hamster kidney cells (PHKC), it was chosen for vaccine production in China [4].

Primer design
Based on the conserved regions of rabies virus genomes published in GenBank, in particular RC-HL (GenBank number: AB009663), RV-97 (GenBank number: EF542830), Nishigahara (GenBank number: AB044824) and NI-CE (GenBank number: AB128149), 12 pairs of primers were designed with Primer Premier 5.0 software to amplify the unknown regions of the aG strain (Table 1). All primers were synthesized by Shanghai Sangon Biological Engineering Technology & Service Co., Ltd.

Reverse transcription-PCR and direct sequencing
Total RNA was extracted from lyophilized brains of mice infected with the aG strain using an RNeasy Mini Plus Total RNA extraction kit (Qiagen) according to the manufacturer's instructions. First-strand cDNA was synthesized using the First Strand cDNA Synthesis Kit (TOYOBO), and the cDNA was amplified using KOD Plus Neo (TOYOBO). Amplified DNA products were examined by electrophoresis on 0.8% UltraPure Agarose (Invitrogen) gels. DNA products of the expected size were purified using the Agarose Gel DNA Purification Kit (TAKARA). The purified products were A-tailed at the 3' end using the DNA A-tailing Kit (TAKARA) and then ligated into pMD18-T (TAKARA). The ligated products were transformed into competent JM109 cells (TianGen). At least 3 clones carrying PCR products of the expected size were identified by PCR, and the positive clones were sequenced commercially (Shanghai Sangon Biological Engineering Technology & Services Co., Ltd., China).

Amplification and sequencing of the terminal ends
The 5' and 3' termini of the genome were confirmed using the approach described by Kuzmin et al. [7]. Briefly, total RNA was subjected to ligation by T4 RNA ligase (Invitrogen) to circularize the genome. Thereafter, the ligated RNA was subjected to nested PCR with the primers marked 11F, 11R, 12F and 12R, located within the 5' and 3' ends of the genome. Products of the expected size were processed as described above.
Table 1 notes: a, the 12 fragments are labeled from "F1" to "F12"; b, "F" and "R" stand for the forward and reverse direction of primer extension.

Sequence alignment and analysis
The nucleotide sequences obtained were edited and assembled manually using EditSeq in the DNAStar program 7.0 (DNAStar). Phylogenetic analysis of the N gene of lyssaviruses, including construction of the phylogenetic tree, was conducted using Molecular Evolutionary Genetics Analysis (MEGA) version 5.0 with the neighbor-joining (NJ) algorithm and the Kimura two-parameter model. The reference sequences are listed in Table 2. The reliability of the phylogenetic groupings was evaluated by bootstrapping with 1000 replicates. Bootstrap values of 70% or greater were considered significant.
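The Kimura two-parameter model used for the tree corrects observed differences by treating transitions and transversions separately. The following is a minimal sketch of the standard distance formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA implementation; the function name `k2p_distance` and the pairwise-deletion handling of gaps are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases in either sequence are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

A matrix of such pairwise distances is what a neighbor-joining algorithm takes as input; bootstrapping would repeat the calculation on alignment columns resampled with replacement.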

Genome organization of the aG strain
Using a total of 24 primers (as shown in Table 1), the genome of strain aG was obtained as 12 separate overlapping PCR products. The 5' and 3' termini were obtained as described above. The length of the genome is 11925 nucleotides (nt), and the genome organization of the aG strain, which follows the typical rabies virus organization, is summarized as follows:
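As a quick consistency check on the reported organization, the region lengths given in the abstract can be added up and compared with the genome length. The sketch below assumes that the stated gene lengths are open reading frame lengths including the stop codon; under that assumption the N and L values reproduce the 450 and 2,127 residues reported in the Results. The variable and function names are illustrative only.

```python
# Region lengths (nt) as stated in the abstract.
REGIONS = {
    "leader": 58,
    "N": 1353,
    "P": 894,
    "M": 609,
    "G": 1575,
    "L": 6384,
    "trailer": 70,
}
GENOME_LENGTH = 11925  # nt, as reported for strain aG


def orf_residues(orf_length_nt: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon
    (assumption made for this sketch)."""
    return orf_length_nt // 3 - 1


listed_total = sum(REGIONS.values())
unlisted = GENOME_LENGTH - listed_total

print(f"listed regions: {listed_total} nt")
print(f"not covered by the listed regions: {unlisted} nt")
print(f"N protein: {orf_residues(REGIONS['N'])} residues")
print(f"L protein: {orf_residues(REGIONS['L'])} residues")
```

The nucleotides not covered by the listed items would correspond to sequence such as the untranslated parts of the mRNAs and the intergenic regions.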

Nucleotide features of strain aG
The length and sequence of the leader region are highly conserved in all lyssaviruses. The leader region of aG is 58 nt long. The first 12 nucleotides, especially the first 9, are the same in all lyssaviruses. However, conservation is less stringent after residue 25. As in most lyssaviruses, the trailer region of the aG strain is 70 nt long. Complementarity of the 3' and 5' ends of the genome is another typical feature of the Mononegavirales, with up to 16 nt being complementary. The first 9 nt of the leader region are perfectly complementary to the corresponding nucleotides of the trailer region [7] (Figure 1).
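Terminal complementarity of this kind can be measured by walking inward from both ends of the genome and counting positions that can base-pair. The following is a minimal sketch; the function name `terminal_complementarity` is hypothetical, only Watson-Crick pairs are counted, and the example sequence is an artificial toy string, not the aG sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def terminal_complementarity(rna: str, max_len: int = 20) -> int:
    """Count consecutive positions, from the termini inward, at which the
    5' end of a single-stranded RNA can pair with its own 3' end."""
    rna = rna.upper().replace("T", "U")
    run = 0
    for i in range(min(max_len, len(rna) // 2)):
        # Position i from the 5' end pairs with position i from the 3' end.
        if COMPLEMENT.get(rna[i]) != rna[-1 - i]:
            break
        run += 1
    return run


# Artificial example: 5 complementary terminal nucleotides around a spacer.
toy = "AUGGC" + "AAAA" + "GCCAU"
print(terminal_complementarity(toy))
```

Applied to a full-length genome, the same count would give the length of the uninterrupted complementary stretch between leader and trailer.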
The transcription initiation signals (TISs) and transcription termination signals (TTSs) are conserved as AACAYYHCT and G(A)7, respectively, but the TTS between G-L was AGAAAAAAA at the N-P junction. The intergenic regions between the cistrons were CT, CCGAA, ACTATT and CTGTAGATCGAAAGAGCAACTGGC (Table 3). Pairwise comparisons of nucleotide and predicted amino acid sequence identities of aG with the vaccine strains of lyssaviruses are listed in Table 4.
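The signal motifs are written with IUPAC ambiguity codes (Y = C or T, H = A, C or T, W = A or T), so locating them in a sequence amounts to expanding each code into a character class. The sketch below shows one way to do this with regular expressions; the helper names `iupac_to_regex` and `find_motif` and the test sequences are made up for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as AACAYYHCT into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif: str, sequence: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) hits."""
    # A lookahead reports hits that overlap each other.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(find_motif("AACAYYHCT", "GGAACACCTCTGG"))   # start signal present
print(find_motif("AACAYYHCT", "GGAACACCGCTGG"))   # G is not allowed by H
print(find_motif("WGAAAAAAA", "CCAGAAAAAAACC"))   # W accepts A or T
```

Scanning the assembled genome with both motifs would list every candidate start and stop signal, which can then be compared with the gene boundaries in Table 3.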

Structural features of the proteins of aG strain
Consistent with previous studies, N was the most conserved gene at the nucleotide level [8]. The N protein of the aG strain has 450 amino acid residues. Only one amino acid substitution, Ala10, is unique to strain aG. Antigenic site I (residues 358-367), antigenic site IV (residues 375-383 and 359-366) and the RNA-binding domain (residues 298-352) were conserved in all the isolates analyzed. Ser389, which is considered to be a casein-type phosphorylation site involved in the regulation of viral RNA transcription and replication, was highly conserved [9] (Figure 2).
The P protein is considered to be the most divergent of the five structural proteins. Six amino acid substitutions unique to aG were observed: Lys68, Tyr95, Thr174, Thr184, Glu254 and Glu295. The motif [(K/R)XTQT] between residues 145-149, which interacts with the cytoplasmic dynein light chain (LC8) and plays an important role in axoplasmic transport of the viral nucleocapsid, was encoded as KSTQT [10]. Compared with the other vaccine strains, the N protein binding site (residues 69-177 and 268-297) of the aG strain had one unique amino acid substitution, Tyr95. The L binding site in P, the first 19 amino acid residues, was absolutely conserved in aG. Ser162, Ser210 and Ser271, which have been shown to be involved in the phosphorylation of P, were well conserved in the aG strain [11] (Figure 3).
The M protein connects the plasma membrane, RNPs and G protein, and is deeply involved in the budding of rabies virus. Thr100, Ser111 and Ser174 were the three amino acid substitutions unique to strain aG. The proline-rich motif (PPxY) between residues 35-38, which is thought to mediate interaction with WW domains of cellular components, was encoded as PPEY and is highly conserved [12]. Residue 58, which is crucial for regulation of RV RNA synthesis [13] (Figure 4), was Glu in the aG strain and Gly in the PV strain.
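The two short protein motifs mentioned above, (K/R)XTQT in P and PPxY in M, can be expressed as simple patterns in which X (or x) stands for any residue. The following sketch shows how such motifs could be located in a protein sequence; the pattern names, the helper `motif_hits` and the toy peptides are hypothetical, and only the motif definitions and the aG variants KSTQT and PPEY come from the text.

```python
import re

# "." stands for the X/x wildcard position of each motif.
LC8_MOTIF = re.compile(r"[KR].TQT")   # (K/R)XTQT, LC8-interacting motif in P
PPXY_MOTIF = re.compile(r"PP.Y")      # PPxY, proline-rich motif in M


def motif_hits(pattern, protein: str) -> list:
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# The variants reported for aG satisfy the motif definitions.
print(bool(LC8_MOTIF.fullmatch("KSTQT")))
print(bool(PPXY_MOTIF.fullmatch("PPEY")))

# Toy peptides (not real P or M sequences) to show position reporting.
print(motif_hits(LC8_MOTIF, "AAKSTQTAA"))
print(motif_hits(PPXY_MOTIF, "GGPPEYGG"))
```

Run on the full P and M sequences, the reported start positions would be expected to fall at residues 145 and 35, respectively, if the numbering in the text is used.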
The G protein of 514 amino acid residues is the only viral protein exposed on the surface of the virus and is responsible for the immune responses of the host. Compared with the vaccine strains analysed in this study, there [...] (Figure 5). Three potential N-glycosylation sites, located at residues 37-39, 158-160 and 319-321, were conserved in aG.
The L protein is an important component of the RNP, although the N and P proteins are also needed to form the RNP. The L gene of aG encoded a protein of 2,127 amino acid residues, which are distributed into six conserved motifs. Thirteen amino acid substitutions unique to strain aG were observed: Gly48, Gly58, Gln189, Tyr353, Gly489, Gly496, Ser606, Lys995, Val1043, Ser1133, Tyr1588, Leu1658 and His1801. The regions 544-563 (A), 728-732 (B) and 1705-1710 (C) have been regarded as functional motifs. These motifs are involved in RNA binding, the active site of the polymerase, and polyadenylation or the core of an ATP-binding site. All these amino acid residues were conserved in strain aG (Figure 6). In the absence of clear functions for these residues, the importance of any of these changes is presently unknown.
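Substitutions "unique to strain aG" can be read as alignment positions where aG carries a residue that none of the compared strains has. The sketch below implements that reading; it is an assumption about how the substitutions were identified, and the function name `unique_substitutions` and the toy alignment are hypothetical.

```python
def unique_substitutions(target: str, references: list) -> list:
    """Return (1-based position, residue) pairs where the aligned target
    sequence carries a residue found in none of the aligned references."""
    if any(len(ref) != len(target) for ref in references):
        raise ValueError("all sequences must come from the same alignment")
    hits = []
    for i, residue in enumerate(target):
        if residue == "-":
            continue  # a gap in the target is not a substitution
        if all(ref[i] != residue for ref in references):
            hits.append((i + 1, residue))
    return hits


# Toy alignment (not real rabies virus sequences).
target = "MKTAY"
references = ["MKSAF", "MKSAF", "MRSAF"]
print(unique_substitutions(target, references))
```

Note that the reported positions refer to alignment columns; they equal residue numbers only if the target sequence has no gaps upstream of the position.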

Phylogenetic analysis of N
As shown by the phylogenetic analysis of the nucleoprotein gene (Figure 7), the three vaccine strains Nishigahara, NI-CE and RC-HL grouped most closely with aG.

Discussion
In the present study, the entire aG genomic sequence, which is composed of 11925 nt, was determined from 12 overlapping fragments covering the whole genome, to improve our knowledge of the genetic relationship between the aG strain and the other rabies virus strains available in GenBank. In contrast to a previous report [14], the length of whole genomes in GT 1 lyssaviruses varies from 11711 nt to 12278 nt, with both even and odd numbers of nucleotides represented [15,8]. In contrast to previous studies, the laboratory-adapted vaccine strain aG has an odd number of nucleotides, while other laboratory strains, such as Nishigahara, PV and SAD-B19, have genomes with even numbers of nucleotides [16]. Genomes with odd numbers of nucleotides seem to be more common in wild-type strains than in fixed strains, but both odd and even genome lengths occur in wild-type and fixed strains. It remains to be determined whether an odd or even genome length has any biological significance. However, the genome length does not follow the 'rule of six' [17].
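The parity and 'rule of six' statements reduce to simple modular arithmetic on the genome lengths. In the sketch below, the aG length is taken from this study and the lengths of Nishigahara, PV and SAD-B19 are derived from the differences stated in the Discussion (one, seven and three nucleotides longer than aG); the function name `describe_length` is illustrative.

```python
AG_LENGTH = 11925  # nt, strain aG

# Lengths derived from the differences stated in the Discussion.
GENOME_LENGTHS = {
    "aG": AG_LENGTH,
    "Nishigahara": AG_LENGTH + 1,
    "PV": AG_LENGTH + 7,
    "SAD-B19": AG_LENGTH + 3,
}


def describe_length(length_nt: int) -> tuple:
    """Return (parity, remainder modulo six) for a genome length."""
    parity = "odd" if length_nt % 2 else "even"
    return parity, length_nt % 6


for strain, length_nt in GENOME_LENGTHS.items():
    parity, remainder = describe_length(length_nt)
    print(f"{strain}: {length_nt} nt, {parity}, length mod 6 = {remainder}")
```

A genome obeys the rule of six only when the remainder is zero; for aG the remainder is 3.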
On average, the G+C content of lyssavirus genomes is 44.57%; the G+C content of the aG strain was 45.18 mol%. This is in agreement with the idea that the G+C content of negative-stranded RNA viruses is lower than that of positive-stranded RNA viruses, which has been considered to be associated with host cell RNA editing [18]. In agreement with previous studies, the genome of the aG strain is similar to that of other lyssaviruses, with few variations. The complete genome of the aG strain is 11,925 nt long, which is one, seven and three nucleotides shorter than those of the Nishigahara, PV and SAD-B19 strains, respectively. The transcription initiation and termination signals at the beginning and end of each monocistron of aG, namely the AACAYYHCT motif that initiates transcription and the nine-nucleotide motif WG(A)7 involved in transcription termination and polyadenylation (TTP), with the exception of the AGAAAAAAA TTP at the N-P junction, are precisely the same as described by Marston. All the intergenic sequences (IGS) in aG agree precisely with what was described in that report. Unlike the PV and SAD strains, the G-L intergenic region of the aG strain has only one polyadenylation site. The 3' and 5' ends of the genome play an important role not only in transcription, replication and the switching between these two functions, but also in the initiation of encapsidation. The 11 nt at the ends of the leader and trailer regions of the genome were complementary and conserved. These regions showed that the basic structure of rabies virus has been retained in strain aG [19]. The 3' NCR of the G mRNA has been reported to be a pseudogene because there are two TTP motifs in PV. However, in the aG strain there is only one TTP motif, which holds true for many other isolates, including laboratory fixed strains and street rabies virus strains. Thus, it can be concluded that the pseudogene is only part of the non-translated region of the G mRNA.
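The G+C content quoted above is the percentage of G and C among all unambiguous nucleotides of the genome. A minimal sketch of the calculation follows; the function name `gc_content` is illustrative, ambiguous bases are simply ignored (an assumption), and the input shown is a toy string rather than the aG genome.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy input; the full genome sequence would be used in practice.
print(f"{gc_content('GGCCAATT'):.2f}%")
```

Applied to the complete 11,925 nt sequence, this calculation should reproduce the reported value if the same counting convention was used.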
It might be assumed that the aG strain from China is most closely related to rabies viruses from northern and northeastern China, since these strains are closely related geographically and ethnically. The phylogenetic tree based on the N gene showed that the aG strain clustered with the Japanese vaccine strains, supporting the idea that rabies viruses from the same region tend to cluster together.
Consistent with previous observations that viruses from the same geographical area tend to group together, the aG strain clusters most closely with DRV, which was isolated from deer in Jilin province, China. The aG strain also grouped closely with other vaccine strains, especially the vaccine strains from Japan [20]. However, the aG strain did not share high nucleotide homology with wild strains such as FJ008, FJ009 and HN10, suggesting that aG may not be the best vaccine strain for China [21], although it has been employed as a human vaccine strain in China for a long time. More research is needed to confirm this hypothesis.
To date, several methods, such as rapid amplification of cDNA ends (RACE) [22][23][24] and the gene-walking approach, have been employed to obtain whole genome sequences, including the 5' and 3' untranslated regions (UTRs). In the present study, a simple method for amplifying and sequencing the full-length rabies virus genome was used, following reference [7]. A similar method has previously been used to sequence the genomes of different viruses [25]. Compared with the previous methods, the method described here is easy to perform, is simple in principle and does not need expensive equipment; thus it is recommended for other viruses.
Comparison of the nucleotide sequences of rabies viruses available in GenBank with that of the aG strain showed extensive divergence, except in the transcription regulatory signals and limited stretches of nucleotides. However, according to the phylogenetic analysis of the N gene, the aG strain probably emerged from DRV, which was isolated from deer in Jilin province, China, or the two may share a common ancestor. More research is needed to determine whether the aG strain originally arose from DRV adapting to the dog host. Ser389, the putative casein-type phosphorylation site involved in the regulation of viral RNA transcription and replication, was conserved in strain aG. The antigenic sites in strain aG were conserved, indicating that the antigenicity of aG was well retained, which makes it suitable for vaccine production.

[Figure panel A: 3'-terminal genomic sequence, 3'TGCGAATTGTTT A TT TGTTGT T T CTGTTTTCTTTGTT AGTTTGTCGGTC5']
P has been considered to be the most divergent of the five coding proteins. The LC8-interacting motif was conserved in the aG strain, indicating that the motif may serve to transport the virus through neurons. The L-binding region of P and the N-binding site of P were also conserved, except for Thr174 and Glu295. These results, together with the five Ser and four Met residues, indicate that the functions reported in fixed strains are also retained in the aG strain. Mebatsion and Rasalingam questioned the function of the interaction between P and LC8 by constructing a recombinant virus lacking the LC8 binding site. The results indicated that the virus lacking the binding site was as pathogenic as wild virus, suggesting that the interaction is not indispensable for the spread of RV. The M protein has been considered to be important in membrane budding and in interacting with cellular components. Finke found that substitution of Arg58 by Gly58 reduced the expression of full-length RNA. The aG strain retains Glu58, suggesting that this residue may serve in RNA regulation.
The G protein is involved in cell attachment and is responsible for the induction of neutralizing antibodies and the cell-mediated immune response [26]. Ala242, Asp255, Ile268 and Arg333, which are correlated with the pathogenicity of rabies virus, were encoded as Cys, Met, Thr and Ala, respectively. The antigenic sites were all retained in aG. These results suggest that the antigenicity of the G protein was retained. The strain aG was, however, strongly attenuated, although other factors may also be responsible for the pathogenicity of rabies virus.
The L protein, together with P, composes the RNA polymerase complex, which is responsible for the enzymatic activities of the polymerase. The RNA-binding region, the active site of the polymerase and the Gly-rich motif GXGXG, which is involved in polyadenylation or protein kinase activities, were conserved; these motifs, together with GHP, were conserved in strain aG. All these results suggest that the L protein possesses the same function as in other strains. However, little is known about the mechanism by which the L protein functions; therefore, more research is needed.
Figure 6 Alignment of motifs A, B and C. Shaded (solid black) residues differ from the consensus. A. Alignment of motif A. B. Alignment of motif B. C. Alignment of motif C.

Conclusion
In this report, we analyzed the full genome of the Chinese human rabies vaccine strain aG. Our study indicated that the genome of aG retained the basic characteristics of RV, including the organization of the genome and the functional domains of each ORF. At the gene level, N was the most conserved of the five coding genes, indicating that this gene is the most appropriate for quantitative genotype definition. Phylogenetic analysis of the N gene indicated that the aG strain clustered most closely with Japanese and Russian rabies vaccine strains, suggesting that they may share a common ancestor. The aG strain did not share high homology with wild strains isolated from China, so it may not be the best vaccine strain; more research is needed to elucidate the genetic relationships among the RV strains circulating in China.

List of abbreviations
RACE: rapid amplification of cDNA ends; RV: rabies virus; RT-PCR: reverse transcription polymerase chain reaction; RNP: ribonucleoprotein; PHKC: primary hamster kidney cell.