Molecular characterization and phylogenetic analysis of the complete genome of a porcine sapovirus from Chinese swine

Background: Porcine sapovirus was first identified in the United States in 1980 and has since been detected in several Asian countries. In 2008, the first outbreak of gastroenteritis in piglets caused by porcine sapovirus in China was reported. The complete genome of the identified SaV strain, Ch-sw-sav1, was sequenced and analyzed to provide a genetic profile for this outbreak.
Methods: The whole genome of Ch-sw-sav1 was amplified by RT-PCR and sequenced. Sequence alignments of the complete genome and of the RNA-dependent RNA polymerase (RdRp) gene were performed. The 3' end of ORF2, which contains a 21-nt insertion, was further analyzed using sequence analysis software.
Results: Sequence analysis indicated that the genome of Ch-sw-sav1 was 7541 nucleotides long with two ORFs, excluding the 17-nucleotide poly(A) tail at the 3' end. Phylogenetic analysis based on part of the RdRp gene placed this strain in genogroup III (GIII). Sequence alignment indicated that there was a 21-nt insertion at the 3' end of ORF2. The insertion showed a high antigenicity index compared with other regions of ORF2.
Conclusion: Ch-sw-sav1 shared a similar genetic profile with an American PEC strain, except for the 21-nt insertion at the 3' end of ORF2. The inserted sequence shared high identity with part of Sus scrofa clone RP44-484M10.


Background
Caliciviridae is a family of positive-sense, single-stranded RNA viruses comprising both human and animal pathogens [1]. The family contains four genera: Lagovirus, Vesivirus, Norovirus and Sapovirus [2]. Caliciviruses share several features. They are small, non-enveloped viruses, 27-38 nm in diameter, with a single-stranded, plus-sense RNA genome of 7.3-8.3 kb, a single capsid protein of 56-71 kD [3], and a polyprotein containing conserved motifs of a putative 2C helicase, 3C-like protease and 3D RdRp. Sapoviruses (SaVs) are recognized as emerging enteric pathogens in humans, swine and mink [4]. SaV infection may cause diarrhea, especially in the young [5]. SaVs are currently divided into eight distinct genetic groups (GI-GVIII) based on the RdRp gene. Among these, GIII strains do not infect humans but can be cultured in vitro in the presence of bile acid [6]. The SaV genome is 7.1-7.5 kb long and contains two or three open reading frames (ORFs). ORF1 encodes a polyprotein that contains the coding sequences for the nonstructural proteins and the major capsid protein (VP1), and ORF2 encodes the minor structural protein (VP2). ORF3 is present only in strains from genotypes GI, GIV and GV and encodes a small basic protein [7]. SaV is considered a significant global enteropathogen causing acute gastroenteritis [8].
Recently, it was shown that the host tropism of some caliciviruses is less specific. Some caliciviruses may have zoonotic potential, and animals such as domestic pigs may be a reservoir for caliciviruses [9][10][11]. Porcine sapovirus was first identified in the United States by electron microscopy in 1980 [12] and genetically characterized as a sapovirus in 1999 [13]. Recently, SaV infections have been identified in Japan, South Korea, Venezuela, Hungary and Belgium [14][15][16][17][18]. In the United States, porcine sapovirus has also been detected in oysters [19].
Although porcine SaV has mainly been detected in pigs, some studies indicate that certain porcine SaVs may be potentially pathogenic and transmissible to humans. For example, alignment of RdRp sequences showed that the porcine SaV strain Sapovirus pig/43/06-18p3/06/ITA, isolated in Italy, was most closely related to human SaVs, suggesting the possibility of a pig reservoir for human strains or vice versa [20]. We previously reported an outbreak of gastroenteritis in piglets in China caused by the first Chinese porcine SaV strain [21]. In this study, the genetic profile of this strain was investigated: the entire viral genome and the 3' end of Ch-sw-sav1 were cloned and sequenced.

Samples
Porcine SaV-positive fecal samples were collected from commercial pig farms in Shanghai as described in our previous study. Samples were prepared as 20% (wt/vol) suspensions in phosphate-buffered saline (PBS) (0.01 M, pH 7.2 to 7.4) and clarified by centrifugation at 10,000 × g for 10 min.

Primer design
To amplify the full-length sequence, 15 sets of primers were designed based on the sequences of AF182760 and DQ056363, which were previously submitted to GenBank. The nucleotide sequences and positions of the primers are listed in Table 1. Purified PCR products were ligated into the pMD18-T vector (TaKaRa, Japan), and 3 to 5 positive colonies were sequenced.

3' RACE
The 3' RACE was carried out with the TaKaRa RNA PCR Kit (TaKaRa, Japan) following the manufacturer's instructions.
Briefly, ten microliters of RNA were used as template to synthesize cDNA with AMV reverse transcriptase for 1 h at 42°C. The external reverse primer, which has a poly(T) tract, was used to prime the cDNA synthesis. The cDNA was then amplified with the external forward primer (5'-TCAATTGGCTGGGTCACGTGAAG-3', nucleotide positions 7027-7049) and the internal forward primer (5'-CAAACACCTTTGGTCCACCAAGG-3', nucleotide positions 7070-7092) using Ex Taq DNA polymerase (TaKaRa, Japan). The PCR reaction mixture was incubated for 2 min at 94°C, followed by 35 amplification cycles of denaturation at 94°C for 30 s, annealing at 65°C for 30 s, and extension at 72°C for 30 s. A final extension of 7 min at 72°C was added to ensure full extension of the products.
The PCR products were purified from a 1% agarose gel using the QIAquick Gel Extraction kit (Qiagen, Germany). Purified PCR products were ligated into the pMD18-T vector. For each product, three to five positive colonies were selected and sequenced.
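As a quick consistency check on the two forward primers quoted above, the following minimal sketch compares each primer's length with the length implied by its reported genome coordinates and computes its GC content. The function names and the dictionary layout are illustrative assumptions, not part of the original protocol; only the sequences and coordinates are taken from the text.

```python
# Primer sequences and genome coordinates as quoted in the 3' RACE description.
# The helper names below are hypothetical and exist only for illustration.
PRIMERS = {
    "external_forward": ("TCAATTGGCTGGGTCACGTGAAG", 7027, 7049),
    "internal_forward": ("CAAACACCTTTGGTCCACCAAGG", 7070, 7092),
}


def span_length(start, end):
    """Length in nt of an inclusive, 1-based coordinate range."""
    return end - start + 1


def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def is_nested(outer_start, inner_start):
    """True if the inner forward primer starts downstream of the outer one."""
    return inner_start > outer_start


for name, (seq, start, end) in PRIMERS.items():
    consistent = len(seq) == span_length(start, end)
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
          f"coordinates consistent: {consistent}")

print("nested layout:", is_nested(PRIMERS["external_forward"][1],
                                  PRIMERS["internal_forward"][1]))
```

A check of this kind is useful mainly for catching transcription errors, such as a stray space inside a primer sequence, before primers are ordered.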

Phylogenetic analysis
Nucleotide sequences of caliciviruses available in GenBank were used in the phylogenetic analysis. The sequence determined in the current study was deposited in GenBank under the name Ch-sw-sav1 and the accession number FJ387164.

Genomic organization of Ch-sw-sav1 virus
The complete RNA genome of Ch-sw-sav1 consisted of 7541 nt, excluding the 3' poly(A) tail, and was longer than that of the USA strain (GenBank no. AF182760). Its A, C, G and U ribonucleotide composition was 19%, 14.3%, 33.3% and 33.3%, respectively. The 5' terminus of the genomic RNA started with the characteristic trinucleotide GTG. Similar to the genomes of SVs and LVs, the Ch-sw-sav1 genome contained two predicted ORFs. ORF1 was 6765 bases (2255 aa) in length and encoded the non-structural proteins and VP1 (544 aa). ORF2, consisting of 516 bases (nt 6771-7286), was predicted to encode a VP2 protein of 172 aa (Fig. 1A). The predicted polyprotein encoded by ORF1 contained the common 2C helicase (GPPGIGKT), 3C protease (GDCG) and RdRp (GLPSG and YGDD) motifs, which are highly conserved in all caliciviruses. The PPG motif was also present in the predicted VP1 (data not shown).
Figure 2. Phylogenetic tree generated for the sequences in the complete genome. The tree was constructed on the basis of the complete genome sequence. All sequences were collected from GenBank. The virus detected in this study is marked with a black triangle. Trees were prepared using the Treeview program, and all branches are supported based on 100 bootstrapped data sets.
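The genome description above rests on a few simple computations: base composition, ORF length from coordinates, and a search for conserved amino acid motifs. The sketch below illustrates them under stated assumptions. The function names are hypothetical, the toy protein string is invented purely to exercise the motif search, and only the coordinates and motif strings come from the text.

```python
from collections import Counter

# Conserved calicivirus motifs named in the text.
MOTIFS = ["GPPGIGKT", "GDCG", "GLPSG", "YGDD"]


def base_composition(seq):
    """Percent A, C, G and U in a sequence (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    return {base: 100.0 * counts.get(base, 0) / len(seq) for base in "ACGU"}


def orf_length(start, end):
    """Length in nt of an ORF given inclusive, 1-based coordinates."""
    return end - start + 1


def find_motifs(protein, motifs):
    """Map each motif to its 1-based position in the protein, or None."""
    positions = {}
    for motif in motifs:
        index = protein.find(motif)
        positions[motif] = index + 1 if index >= 0 else None
    return positions


# ORF2 coordinates as reported in the text (nt 6771-7286).
orf2_nt = orf_length(6771, 7286)
print("ORF2 length (nt):", orf2_nt, "codons:", orf2_nt // 3)

# Invented toy sequences, NOT the Ch-sw-sav1 polyprotein or genome.
toy_protein = "MAGPPGIGKTAAGDCGAAGLPSGAAYGDDAA"
print(find_motifs(toy_protein, MOTIFS))
print(base_composition("GUGAUGGCGAAUUGC"))
```

With the real sequence deposited under accession FJ387164, the same functions could be applied to the full genome and to the translated ORF1 product.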

Sequence comparison
We compared the entire genome sequence of Ch-sw-sav1 with those of other caliciviruses. A phylogenetic tree based on the entire genome sequence showed that Ch-sw-sav1 was more closely related to the SLVs than to the other caliciviruses (Fig. 2). A phylogenetic tree was then constructed by the neighbour-joining method on the basis of concentrated alignments of the RNA-dependent RNA polymerase gene sequences of 31 SaV strains (Fig. 3). All eight genotypes separated into corresponding lineages. Within the GIII lineage, there were four distinct subgroups. The analysis indicated that Ch-sw-sav1 formed a subgroup together with two USA strains, one Japanese strain and one Hungarian strain. Further analysis indicated that Ch-sw-sav1 shared 82.2%-91.2% identity with the other GIII SaV strains and was most closely related to the Hungarian variant DQ383274 (Table 3). In contrast, it was less similar (< 57.1%) to strains of GI, GII, GIV, GV, GVI, GVII and GVIII.
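The identity values reported above are pairwise percentages computed over aligned sequences. The following minimal sketch shows one common way to compute such values. It assumes that alignment columns containing a gap in either sequence are excluded from the comparison; the convention used by the software in this study is not stated, so this is an assumption. The function names and the toy alignment are invented for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns with a gap in either sequence are skipped (an assumption;
    other tools may count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Pairwise percent identity for every pair of named aligned sequences."""
    names = list(alignment)
    return {
        (x, y): percent_identity(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Invented toy alignment, NOT real RdRp sequences.
toy_alignment = {
    "strain_1": "ATGGCTTACGGT",
    "strain_2": "ATGGCATACGGT",
    "strain_3": "ATGACA---GGA",
}

for pair, value in identity_matrix(toy_alignment).items():
    print(pair, f"{value:.1f}%")
```

A matrix of this kind can be converted to distances and passed to a neighbour-joining implementation; tree construction itself is not sketched here.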
The 5' termini of the genomic and predicted subgenomic RNAs of Ch-sw-sav1 possessed leader sequences with a Kozak structure (G/ANNATGG), which is favourable for translation initiation of eukaryotic mRNA [22] (Fig. 1B), similar to that of PEC (GenBank No. AF182760) [13]. The VP1 region (544 aa) of Ch-sw-sav1 was the same length as in PEC and slightly shorter than those of SaVs of human origin. ORF2 overlapped the VP1 gene by 4 nucleotides, as in other PEC strains (Fig. 1C), but the length of ORF2 was distinct. Sequence alignment of the 3' end of ORF2 of six available sequences in GenBank indicated a 21-nt insertion, similar to the arrangement in the OH-JJ-259-00-US strain (GenBank No. AY826423), which has a 27-nt insertion (Fig. 4). Antigenic index analysis showed that the inserted sequence lay within an antigen-rich site, in addition to another such site at the 3' end of ORF2 (Fig. 5).
Figure 3. Unrooted phylogenetic tree of calicivirus RdRp gene sequences constructed by the neighbor-joining method. The tree was constructed on the basis of concentrated RdRp gene sequences. Trees were prepared using the Treeview program and are based on 100 bootstrapped data sets. All sequences used in this analysis were collected from GenBank. The virus detected in this study is marked with a black triangle; it clustered with PEC/swine-Id3/2005/HUN and Sapovirus swine/NC-QW270/03/US, which also belong to porcine SaV genotype GIII.
Figure 4. Nucleotide alignment of the 3' end sequences of VP2 among six porcine SaV strains. The numbers above the alignment show the nucleotide position in ORF2. Nucleotides with a white background differ among strains. The inserted sequence of Ch-sw-sav1 spans nt 27 to nt 46.
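Two of the observations in this section lend themselves to a short illustration: locating the Kozak-like pattern G/ANNATGG in a sequence, and locating an insertion from a gapped pairwise alignment. The sketch below is a minimal example under stated assumptions. The function names are hypothetical, the sequences are invented toy data rather than Ch-sw-sav1 sequence, and an insertion is defined here simply as a run of columns where the reference row has gaps and the query row has bases.

```python
import re

# G/ANNATGG written as a regular expression; the lookahead allows
# overlapping matches to be reported.
KOZAK = re.compile(r"(?=([GA][ACGT]{2}ATGG))")


def find_kozak(seq):
    """Return (1-based position, matched text) for each Kozak-like site."""
    dna = seq.upper().replace("U", "T")
    return [(m.start() + 1, m.group(1)) for m in KOZAK.finditer(dna)]


def insertion_runs(reference_row, query_row, gap="-"):
    """Return 1-based (start, end) alignment columns of query insertions."""
    if len(reference_row) != len(query_row):
        raise ValueError("alignment rows must have the same length")
    runs = []
    start = None
    for col, (ref, qry) in enumerate(zip(reference_row, query_row), start=1):
        inserted = ref == gap and qry != gap
        if inserted and start is None:
            start = col
        elif not inserted and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(reference_row)))
    return runs


# Invented toy data for illustration only.
print(find_kozak("GTGATGGCG"))
for first, last in insertion_runs("ACG---TT", "ACGGGATT"):
    print(f"insertion at columns {first}-{last}, length {last - first + 1}")
```

Applied to a real alignment of ORF2 3' ends, the second function would report the column range and length of an insertion relative to whichever strain is chosen as the reference row.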

Discussion
Sapporo virus was identified in 1982 from an outbreak of diarrhea in an orphanage in Sapporo, Japan [23]. Schuffenecker [24] classified SaVs into three major genetic groups. Subsequently, they were divided into eight genogroups based on the genetic diversity of the viral polymerase [25]. PEC, the first SaV of pig origin, was discovered in the 1980s in the United States and belongs to SaV GIII [12].
To date, SaV has been identified in many countries [14][15][16][17][18]. Traditionally, only SaV GIII was thought to infect pigs. However, strains detected in the USA and Italy that belonged to new genotypes showed high homology with human SaVs, indicating that animals might act as reservoirs for human caliciviruses. It is therefore necessary to analyze the genetic profile of porcine SaV as a first step towards controlling the pathogen. In February 2008, we reported the first outbreak of gastroenteritis caused by porcine SaV in piglets in mainland China. The outbreak may have been caused by simultaneous contact with virus-contaminated water or food, and the genetic profile of the virus was investigated further. Ch-sw-sav1 was chosen for sequencing and compared with other published SaVs. The results showed that it shared high homology with PEC, with a similar gene structure and a similar sequence motif at the 5' terminus that is favorable for translation initiation of eukaryotic mRNA [22]. However, there was a 21-nt insertion at the 3' end of ORF2 of Ch-sw-sav1. The inserted sequence had a high antigenicity index when analyzed with DNAstar software. ORF2 is predicted to encode a capsid protein that is associated with the assembly, antigenicity and receptor interactions of SaV, so the inserted sequence may affect the antigenicity or other properties of the capsid protein, which remains to be determined [1]. In the phylogenetic analysis, we classified Ch-sw-sav1 into genogroup III of SaV based on the partial RdRp gene sequence; it shared the highest nucleotide identity (91.2%) with the Hungarian SaV, which was isolated from a pig with diarrhea [17].
Figure 5. Antigen index analysis of 3' end sequences of VP2 among six porcine SaV strains.
The porcine SaV strain in the present study came from an outbreak of gastroenteritis in a group of piglets and had an inserted sequence at the 3' end of ORF2. The role of the inserted sequence is unknown, but this region is highly divergent in sequence and differs in size among caliciviruses. Since the ORF2 protein is functionally conserved and, based on its strong positive charge, may be involved in protein-protein or protein-nucleic acid interactions during replication, the inserted sequence likely has a special biological function. Establishing full-length infectious clones with and without this inserted fragment would therefore be the next step towards identifying whether the fragment is involved in symptomatology and pathogenicity.

Conclusion
The complete sequence of the first Chinese porcine SaV was determined and analyzed, providing a genetic profile of the porcine SaV currently present in the swine population in China. Sequence analysis showed that the strain has two ORFs and belongs to genogroup III. A 21-nt insertion in ORF2 changed the antigenicity index of the capsid protein.