Complete genome sequence of human astrovirus genotype 6

Background Human astroviruses (HAstVs) are an important cause of acute gastroenteritis in children. Currently, eight HAstV genotypes have been identified and all but two (HAstV-6 and HAstV-7) have been fully sequenced. Here we sequenced and analyzed the complete genome of a HAstV-6 strain (192-BJ07), which was identified in Beijing, China. Results The genome of 192-BJ07 consists of 6745 nucleotides. The 192-BJ07 strain displays 77.2-78.0% nucleotide sequence identity with other HAstV genotypes and exhibits amino acid sequence identities of 86.5-87.4%, 94.2-95.1%, and 65.5-74.8% in the ORF1a, ORF1b, and ORF2 regions, respectively. Homology analysis of ORF2 shows that 192-BJ07 is 96.3% identical to the documented HAstV-6 strain. Further, phylogenetic analysis indicates that different genomic regions are likely undergoing different evolutionary and selective pressures. No recombination event was observed in HAstV-6 in this study. Conclusion The completely sequenced and characterized genome of HAstV-6 (192-BJ07) provides further insight into the genetics of astroviruses and aids in the surveillance and control of HAstV gastroenteritis.


Background
Human astroviruses (HAstVs) are one of the most common causes of acute gastroenteritis in children worldwide [1][2][3]. HAstV was first identified during an outbreak of gastroenteritis among hospitalized infants in 1975 [4]. Its name is derived from its distinctive star-shaped appearance under electron microscopy (EM). Molecular analyses indicate that HAstVs are non-enveloped viruses with a 6-8 kb single-stranded, positive-sense RNA genome consisting of three overlapping open reading frames (ORFs), namely ORF1a, ORF1b, and ORF2, as well as 5'- and 3'-nontranslated regions (NTRs) [5]. ORF1a encodes a serine protease; ORF1b encodes an RNA-dependent RNA polymerase; and ORF2 encodes a capsid precursor protein [5].
HAstVs have been grouped into eight known serotypes (HAstV-1 through HAstV-8) based on their reactivity to polyclonal antibodies and on analysis by immunofluorescence assays, neutralization assays, and immunoelectron microscopy (IEM) [5][6][7]. Phylogenetic analyses of the HAstV nucleotide sequence have defined eight genotypes, and further studies have indicated a strong correlation between the genotypes and serotypes [8]. As such, genotypes are frequently applied to type HAstVs.
Genomic characterization studies are important for understanding the origin, molecular evolution, and phylogenetic relationships of HAstV genotypes. The first full-length HAstV genome sequence (HAstV-2) was determined in 1993 [9]. Subsequently, the complete genomic sequences of five more genotypes (HAstV-1, HAstV-3, HAstV-4, HAstV-5, and HAstV-8) were reported [9][10][11][12]. Because the dominant, disease-causing HAstV type and strain often fluctuate with time and geographic location, it is critical that we characterize the complete genomic sequences of all known genotypes in order to better control and prevent future epidemics [13]. Only limited sequence information is available for HAstV genotype 6: only a partial genome sequence has been reported [14,15], even though this genotype has been identified as a cause of sporadic cases and large-scale outbreaks of acute gastroenteritis worldwide [16,17].
In 2007, we identified a case of HAstV-6 infection in Beijing, China, suggesting that this strain might be more epidemiologically relevant than previously recognized [18]. Here we sequenced and analyzed the complete genomic sequence of this HAstV-6 192-BJ07 strain, and describe its genetic characteristics by comparing its sequence with other known HAstV genotypes. The characterization of HAstV-6 by whole genome sequencing provides critical insight into the genetics of this virus as well as valuable information for the control and prevention of HAstV-induced gastroenteritis.

Genome organization
Complete genome sequencing of HAstV-6 (192-BJ07) was performed from a stool sample that tested positive for HAstV-6 RNA fragments by RT-PCR. Starting with the cloning of ORF2, the entire sequence of the viral genome was obtained by a step-wise amplification strategy using 5'- and 3'-RACE. The full-length genomic RNA of 192-BJ07 is 6745 nucleotides (nt) in length, excluding the poly-A tail at the 3' end. HAstV-6 (192-BJ07) has the same genome organization as other known HAstV genotypes. It has a 5'-NTR of 82 nt, a 3'-NTR of 81 nt, and three overlapping major ORFs: ORF1a (2766 nt), ORF1b (1548 nt), and ORF2 (2337 nt). Details of the predicted genome organization of 192-BJ07 are shown in Fig. 1.
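Because the ORFs overlap, the reported region lengths sum to more than the genome length. The short Python sketch below makes that arithmetic explicit. It assumes that the five regions tile the genome end to end with no intergenic gaps and that the lengths are exactly those reported above; the function name total_overlap is illustrative only, and the result is the net overlap summed over both ORF junctions, not the overlap at either junction individually.

```python
# Region lengths (nt) as reported for 192-BJ07 in the text.
GENOME_LENGTH = 6745
REGION_LENGTHS = {
    "5'-NTR": 82,
    "ORF1a": 2766,
    "ORF1b": 1548,
    "ORF2": 2337,
    "3'-NTR": 81,
}


def total_overlap(genome_length, region_lengths):
    """Net number of nucleotides counted more than once.

    Assumes the regions cover the genome contiguously, so any excess of
    the summed region lengths over the genome length is due to overlap.
    """
    return sum(region_lengths.values()) - genome_length


if __name__ == "__main__":
    print("Summed region lengths:", sum(REGION_LENGTHS.values()))
    print("Net overlap (nt):", total_overlap(GENOME_LENGTH, REGION_LENGTHS))
```

Under these assumptions the summed lengths are 6814 nt, which implies a net overlap of 69 nt shared between the ORF1a/ORF1b and ORF1b/ORF2 junctions.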

ORF analysis
The sequence of the HAstV-6 192-BJ07 strain displayed similarity to those of other known HAstV genotypes. ORF1a of the HAstV-6 192-BJ07 strain shared 79.0-79.9% nucleotide identity and 86.5-87.4% amino acid identity with those of genotypes 1 through 5 and genotype 8. Two amino acid insertions, Arg and Lys, were found at positions 757 and 758 of 192-BJ07 ORF1a.
Pairwise comparisons of the nucleotide and amino acid sequences of the ORF2 region showed that 192-BJ07 shares relatively low identity with other known HAstV genotypes in this region (62.4-72.6% nucleotide identity and 65.5-74.8% amino acid identity; Table 1). However, 192-BJ07 exhibited high identity (96.3% nucleotide identity and 95.9% amino acid identity) with the documented sequence of the HAstV-6 strain (GenBank accession number Z46658; Table 1). Structural predictions of ORF2 indicated that there are three highly conserved amino acid residues at which the protein can be cleaved to yield products of different sizes: Lys 71 for a 79-kDa protein, Arg 361 for VP29, and Arg 395 for VP26 [19].
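As an illustration of how pairwise identities such as those above are derived from an alignment, the following Python sketch computes percent identity between two pre-aligned sequences. It is not the MegAlign implementation: the function name percent_identity and the convention of skipping any column that contains a gap are assumptions made for this example, and other tools may count gapped columns differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumed
    convention for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration only, not HAstV data.
    print(percent_identity("ATGGCTAGCA", "ATGGCAAGCA"))
```

The same function works for aligned amino acid sequences, since it only compares characters column by column.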

Non-coding region analysis
HAstV genotypes typically contain 80 to 85 nt in the 5'-NTR. This region shows the highest nucleotide variation (23.2-57.3%) among the five regions of the viral genome (5'-NTR, ORF1a, ORF1b, ORF2, and 3'-NTR) (Table 1). However, the 37 nt at the very 5' end of the viral genome are highly conserved (Fig. 3). Secondary structure predictions indicate that the HAstV-6 192-BJ07 strain has three stem-loop structures in the 5'-NTR. However, because sequence variability is high within nt 38-85 (Fig. 3), the stem-loop structures show very little consensus among HAstV genotypes (data not shown).
In contrast, the nucleotide identities between the 3'-NTR of 192-BJ07 and those of other known HAstV genotypes are as high as 92.6-98.8%. Nevertheless, the remaining sequence variability within this region also results in secondary structure disparities of the 3'-NTR between HAstV genotypes (data not shown).
It has been recognized that HAstV RNA has a cis-acting element [ribosomal frameshifting heptamer sequence (AAAAAAC)] followed by a stem-loop structure in the ORF1a/1b junction region [22]. The 192-BJ07 strain also has such a shifty heptamer sequence and a similar stem-loop structure based on analysis with RNAstructure 4.5 software. This conservation may reflect the importance of such structures for translational regulation [22].
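Locating the shifty heptamer in a genome sequence is a simple motif search. The Python sketch below scans a nucleotide sequence for AAAAAAC and reports 1-based start positions. The function name find_shifty_heptamers is hypothetical, and the sketch only finds the heptamer; it does not predict the downstream stem-loop, which in this study was assessed with RNAstructure.

```python
SHIFTY_HEPTAMER = "AAAAAAC"


def find_shifty_heptamers(sequence, motif=SHIFTY_HEPTAMER):
    """Return 1-based start positions of the motif in the sequence.

    The input may be RNA or DNA, upper or lower case; U is treated as T.
    """
    seq = sequence.upper().replace("U", "T")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert to 1-based coordinates
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequence for illustration only, not the 192-BJ07 genome.
    print(find_shifty_heptamers("gggaaaaaacuuu"))
```

Applied to a full genome, hits would still need to be filtered by location, since only a heptamer in the ORF1a/1b overlap followed by a stem-loop is expected to act as a frameshift signal.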
The ORF1b/ORF2 junction has been regarded as a regulatory element of the sub-genomic RNA (sgRNA) [23]. The alignment analysis of 52 nt at the ORF1b/ORF2 junction revealed that 192-BJ07 has a very high identity (98.4-100%) with other HAstV genotypes, consistent with a previous report [24].

Phylogenetic analysis
Nucleotide alignment analysis of whole genome sequences showed that the identities between the HAstV-6 192-BJ07 strain and the corresponding sequences of HAstV-1, -2, -3, -4, -5, and -8 were 77.2-78.0%. The inferred evolutionary relationships among HAstV genotypes differed depending on whether phylogenetic analyses were performed on the complete genome sequence or on individual proteins. The phylogenetic analysis of the whole genome sequence and the ORF1a region indicates that the HAstV-6 192-BJ07 strain is an outlier on the phylogenetic tree compared with the other HAstV genotypes (Fig. 4A and 4B). In the ORF1b region, the HAstV-6 192-BJ07 strain and HAstV-3 branch out earlier (Fig. 4C), while in the ORF2 region, HAstV-8 and HAstV-4 occupy the position of common ancestor of the other HAstV genotypes (Fig. 4D).

Recombination analysis
To determine whether there were recombination events between the 192-BJ07 strain and other known HAstV genotypes, we analyzed the similarities at the genome level. A similarity plot comparing the nucleotide sequences of HAstV-6 192-BJ07 and HAstV-1, -2, -3, -4, -5, and -8 is shown in Fig. 5. The sequence identities of the 192-BJ07 ORF1b region with the other genotypes

Discussion
In this study, we report the whole genome sequence of HAstV-6 based on a strain (192-BJ07) identified in an etiological investigation of viral gastroenteritis in Beijing [18]. The sequence analysis shows that the 192-BJ07 strain has a typical astrovirus genome organization with three ORFs (ORF1a, ORF1b, and ORF2), an 82-nt 5'-NTR, and an 81-nt 3'-NTR. Phylogenetic and homology analyses of the ORF2 region indicate that the 192-BJ07 strain shares 95.9% amino acid identity with the documented HAstV-6 strain (GenBank accession number Z46658), but less than 75% amino acid identity with other HAstV genotypes. Consistent with previous reports on other HAstV genotypes, our results also show the existence of three potential cleavage sites at Lys 71, Arg 361, and Arg 395 in HAstV-6 ORF2 [3,19,20]. It is thought that cleavage at Lys 71 generates the 79-kDa capsid protein [19]. The 79-kDa capsid protein can be converted into three smaller peptides (VP34, VP29, and VP26), a conversion that enhances HAstV infectivity [19]. Our observations support the critical role of these three amino acid residues in HAstV replication and pathogenesis.
In our study, we found two insertional mutations, Arg 757 and Lys 758, in ORF1a. How these hydrophilic amino acids contribute to the characteristics or function of the virus is unknown at present and needs to be addressed in further functional studies.
Our phylogenetic analysis of the whole genome sequence suggests that HAstV-6 may be an ancestor of other HAstV genotypes (Fig. 4A). This observation was further supported by the phylogenetic analysis of the ORF1a protein region (Fig. 4B). Moreover, detailed analysis of the ORF1b amino acid sequences of all genotypes indicates that HAstV-6 and HAstV-3 may have functioned as the common ancestor of other HAstV genotypes (Fig. 4C). However, the analysis of HAstV ORF2 suggests that HAstV-8 and HAstV-4 may have been the common ancestor of other HAstV genotypes (Fig. 4D). Different evolutionary and selective pressures in different HAstV genomic regions may be responsible for this discrepancy in the evolutionary relationships [25].
The secondary structure predictions indicate that stem-loop structures are not conserved in the 5'- and 3'-NTRs of known HAstV genotype genomes. This difference may underlie possible differences in replication and/or transcription among HAstV genotypes. The fact that the 5' end of the 5'-NTR, the 3'-NTR, and the 52-nt region at the ORF1b/ORF2 junction are highly conserved points to their critical role in the interaction with the viral replication or transcription machinery. The variation in the 3' end of the 5'-NTR may influence the efficiency of viral genome replication or transcription, resulting in a difference in replication ability or virulence among different genotypes or strains [5].
The -1 ribosomal frameshifting is critical for the translation of the astrovirus genome [22]. It requires two cis-acting signals: a shifty heptamer sequence (AAAAAAC) and a potential stem-loop structure [10,26]. This study showed that the HAstV-6 192-BJ07 strain also has such cis-acting elements, which further demonstrates the conservation of these elements among HAstV genotypes [5].
At present, the mechanism underlying HAstV variation is unclear. One study has indicated that recombination may be responsible for HAstV variation [24]. However, current studies have not broadly established the role of recombination in HAstV variation [25,27]. In agreement with most reports, we found no clear evidence of recombination between the 192-BJ07 strain and other HAstV genotypes based on similarity plot analysis. Diversification of the HAstV amino acid sequences may be attributed to accumulated single nucleotide mutations. This mechanism is similar to antigenic drift in other viruses, such as influenza viruses [28,29], and could lead to HAstVs escaping from existing host immunity and to the emergence of a new epidemic HAstV strain [30]. Additional studies, such as large-scale whole genome sequencing, are needed to address the evolutionary patterns of HAstVs.

Conclusion
We have sequenced and characterized the complete genome of HAstV-6 (192-BJ07). This sequence will provide insight into the genetics of astroviruses, broaden our understanding of their properties, and inform surveillance and control of HAstV gastroenteritis around the world.

RNA extraction
A stool sample (termed 192-BJ07) that tested positive for HAstV-6 by RT-PCR was collected from a 2-year-old boy who visited the Beijing Children's Hospital with acute diarrhea in 2007 [18]. Viral RNA was extracted

ORF2 amplification
The primers ORF2-F (5'-atggctagcaagtctgacaagcagg-3') and ORF2-R (5'-gaagctgtaccctcgatcctactc-3') targeting ORF2 of 192-BJ07 were designed based on the only available HAstV-6 sequence in GenBank (GenBank accession number Z46658). For reverse transcription (RT) reactions, cDNA was generated with the SuperScript™ III RT kit (Invitrogen, Carlsbad, CA) using a random primer (Takara, Dalian, China) as described in the manufacturer's protocol. The PCR was performed as follows: 94°C for 3 minutes, 35 cycles of amplification (94°C for 30 seconds, 50°C for 30 seconds, and 72°C for 3 minutes), and a final 10-minute extension at 72°C. The PCR products were analyzed by 1.0% agarose gel electrophoresis and stained with ethidium bromide.
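For readers who wish to check the primer and cycling parameters, the following Python sketch computes the length and GC content of the two primers and the nominal duration of the PCR program described above. The helper names gc_count, gc_percent, and program_minutes are illustrative, and the duration ignores ramp times and any hold steps, which depend on the thermal cycler.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "ORF2-F": "atggctagcaagtctgacaagcagg",
    "ORF2-R": "gaagctgtaccctcgatcctactc",
}


def gc_count(seq):
    """Number of G and C bases in a sequence (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def gc_percent(seq):
    """GC content as a percentage of sequence length."""
    return 100.0 * gc_count(seq) / len(seq)


def program_minutes(initial=3.0, cycles=35, steps=(0.5, 0.5, 3.0), final=10.0):
    """Nominal run time in minutes, excluding temperature ramping."""
    return initial + cycles * sum(steps) + final


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%")
    print(f"Nominal program time: {program_minutes():.0f} min")
```

The long 3-minute extension step per cycle dominates the run time, which is consistent with amplifying a product of roughly ORF2 length (2337 nt).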

Genome amplification and sequencing
Rapid amplification of cDNA ends (RACE) reactions were performed to obtain the entire sequence of the viral genome by using the 5'- and 3'-RACE System for Rapid Amplification of cDNA Ends kit (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. The ORF2 sequence obtained above was used as the starting point for the amplification. PCR-amplified products were cloned into the pMD18-T vector (TaKaRa, Dalian, China) and introduced into chemically competent E. coli DH5α cells. The plasmid DNA was sequenced using an ABI3730 DNA Analyzer (Applied Biosystems). The complete genome sequence of HAstV-6 has been deposited in GenBank (GenBank accession number GQ495608).

ORF prediction and RNA structure analysis
ORF1a and ORF2 were predicted for HAstV-6 192-BJ07 using the DNAStar ORF search program. ORF1b was predicted based on the "shifty" heptanucleotide (AAAAAAC) that occurs in other HAstVs [9]. RNA secondary structures were evaluated using RNAstructure 4.5 software.

Phylogenetic analysis
The MegAlign programs in the DNAStar software package were used to perform multiple sequence alignments. HAstV phylogenies with 1000 bootstrap replicates were created using the neighbor-joining method and the Kimura two-parameter model with the MEGA software version 4.0 [31].
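To make the distance model concrete, the Python sketch below computes the Kimura two-parameter distance between two aligned nucleotide sequences using the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. This is not the MEGA implementation: the function name k2p_distance is hypothetical, and skipping columns that contain gaps or ambiguous bases is an assumption made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(aligned_a, aligned_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns with gaps or non-ACGT characters are skipped (assumed
    pairwise-deletion convention for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x not in BASES or y not in BASES:
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # Toy sequences for illustration only: one transition in ten sites.
    print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCC"), 4))
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances, with bootstrap support obtained by resampling alignment columns.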