Polymorphic genetic characterization of the ORF7 gene of porcine reproductive and respiratory syndrome virus (PRRSV) in China

Background Porcine reproductive and respiratory syndrome virus (PRRSV) exhibits extensive genetic variation. The outbreak of highly pathogenic PRRS in 2006 led us to investigate the extent of PRRSV genetic diversity in China. To this end, we analyzed the Nsp2 and ORF7 gene sequences of 98 Chinese PRRSV isolates. Results Analysis of Nsp2 indicated that highly pathogenic PRRSV strains with a 30-amino-acid deletion in the Nsp2 protein are the dominant viruses circulating in China. Phylogenetic analysis based on ORF7 sequences divided the Chinese isolates into 5 subgroups and showed that the highly pathogenic PRRSVs were distantly related to the RespPRRS MLV and CH-1R vaccine strains, raising doubts about the efficacy of these vaccines. The ORF7 sequence data also showed no apparent association between the geographic or temporal origin of Chinese PRRSV isolates and their genetic heterogeneity. Conclusion These findings enhance our knowledge of the genetic characteristics of Chinese PRRSV isolates and may facilitate the development of effective strategies for monitoring and controlling PRRSV in China.


Background
Porcine reproductive and respiratory syndrome (PRRS) is a severe disease characterized by reproductive disorders in gilts and sows, especially during late gestation, and by respiratory distress in pigs. The disease first emerged in late 1987 in the United States [1] and three years later in Europe [2]. Porcine reproductive and respiratory syndrome virus (PRRSV), the causative agent of PRRS, belongs to the family Arteriviridae in the order Nidovirales [3]. It is an enveloped virus with a single-stranded, positive-sense RNA genome containing nine open reading frames (ORFs) [4]. ORFs 1a and 1b encode the non-structural proteins Nsp1α, Nsp1β, and Nsp2-12, while ORF2a, ORF2b, and ORFs 3-7 encode the structural proteins GP2a, GP2b, GP3, GP4, GP5, M, and N, respectively [5].
Two genotypes of PRRSV are recognized, the North American type and the European type, represented by the prototype strains VR-2332 and Lelystad virus (LV), respectively [6]. Significant genetic differences have been described both between these two genotypes and within each genotype [7-9].
Since PRRSV was first reported in China in 1996 [10], North American-type strains showing considerable genetic variation have spread throughout the country [11-13]. Since 2006, several outbreaks of highly pathogenic PRRS have been reported in China, causing severe economic losses to the swine industry [14-16]. The molecular characterization of these PRRSVs is thus a major focus of Chinese virology research [17-19]. However, previous studies have focused mainly on the ORF5 or Nsp2 genes, and the genetics of Chinese PRRSVs based on ORF7 remain poorly characterized. ORF7 encodes the nucleocapsid (N) protein, the most abundant viral protein in virus-infected cells [5] and the most immunodominant antigen in the pig immune response to PRRSV [20]. The N protein is therefore a promising candidate antigen for serological detection and diagnosis [21,22]. ORF7/N sequences have also been used extensively to determine the genetic variation and phylogenetic relationships among PRRSVs [23-25], underscoring the value of ORF7 for surveillance of PRRSV evolution.
In this paper, we sequenced the partial Nsp2 and complete ORF7 genes of seven PRRSV strains isolated in China during the highly pathogenic PRRS epidemic, and analyzed the ORF7 coding region to assess the genetic variation of PRRSV in China and to better understand the molecular epidemiology of PRRS.

Results
The genetic diversity of the Nsp2 gene among Chinese PRRSV isolates
To determine whether the isolates examined in this study carried the characteristic Nsp2 deletions previously described in highly pathogenic PRRSVs [14,15], partial Nsp2 sequences of all 7 isolates were determined and aligned with 93 sequences from GenBank: 91 Chinese reference isolates, the North American prototype VR-2332, and its attenuated vaccine virus RespPRRS MLV (Table 1). Alignment of the deduced amino acid sequences showed that 23 Chinese isolates, including our XJ isolate, contained no deletions or insertions relative to VR-2332 and RespPRRS MLV. In contrast, 75 Chinese strains carried deletions within Nsp2; all but one of them (HB-2sh, collected in 2002) were isolated during the highly pathogenic PRRS outbreak. HB-2sh contains a continuous 12-amino-acid deletion at positions 470-481, whereas the other 74 isolates carry four different types of deletion. Sixty-four Chinese reference isolates and 6 of our isolates (07N, 128, PC, TS, XIN, and XB) contain a discontinuous deletion of 1 and 29 amino acids at positions 482 and 533-561. CG and GDQY2 contain a discontinuous deletion of 36 and 29 amino acids at positions 471-506 and 533-561, and YN9 contains a discontinuous deletion of 25 and 29 amino acids at positions 478-502 and 533-561. Em2007 has a continuous deletion of 68 amino acids at positions 499-566. Because 70 of the 98 Chinese isolates carry the discontinuous 30-amino-acid (1 + 29) deletion, highly pathogenic PRRSVs with this deletion appear to be the dominant strains circulating in China.
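The deletion bookkeeping above can be checked arithmetically by encoding each pattern as a set of deleted residue intervals (numbered relative to VR-2332, as in the text) and summing their lengths. The sketch below is a minimal illustration that uses only the positions and isolate counts reported in this section; the pattern labels are hypothetical names chosen for the example, not the authors' nomenclature.

```python
# Nsp2 deletion patterns relative to VR-2332, as reported in the text.
# Each pattern is a tuple of inclusive (start, end) residue intervals.
# NOTE: the dictionary keys are hypothetical labels for illustration only.
PATTERNS = {
    "none": (),
    "HB-2sh type": ((470, 481),),
    "HP 1+29": ((482, 482), (533, 561)),
    "CG/GDQY2 type": ((471, 506), (533, 561)),
    "YN9 type": ((478, 502), (533, 561)),
    "Em2007 type": ((499, 566),),
}

# Number of Chinese isolates carrying each pattern, taken from the text.
COUNTS = {
    "none": 23,
    "HB-2sh type": 1,
    "HP 1+29": 64 + 6,  # 64 reference isolates + 6 isolates from this study
    "CG/GDQY2 type": 2,
    "YN9 type": 1,
    "Em2007 type": 1,
}


def deletion_length(intervals):
    """Total number of deleted residues across inclusive intervals."""
    return sum(end - start + 1 for start, end in intervals)


def is_discontinuous(intervals):
    """A deletion is discontinuous if it is split over more than one interval."""
    return len(intervals) > 1


if __name__ == "__main__":
    for name, intervals in PATTERNS.items():
        print(f"{name:15s} n={COUNTS[name]:3d} "
              f"deleted={deletion_length(intervals):3d} "
              f"discontinuous={is_discontinuous(intervals)}")
    print("total isolates:", sum(COUNTS.values()))
```

Running it reports 98 isolates in total and a 30-residue discontinuous deletion for the dominant pattern, alongside 12-, 65-, 54-, and 68-residue totals for the other patterns.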

Phylogenetic analysis based on the ORF7 sequence
To better understand the genetic relationships and evolution of PRRSVs in China from 1996 to 2010, a phylogenetic analysis was carried out on the ORF7 gene sequences of our 7 isolates and the 93 GenBank reference viruses (VR-2332, RespPRRS MLV, and 91 Chinese reference isolates).
Subgroup IV contained 2 of our isolates (128 and XIN) and 6 highly pathogenic reference viruses (09HUB7, BJSY07, BJSY-1, CG, HLJHL, and KP), which shared 98.4-100% nucleotide and 99.2-100% amino acid sequence identity. Subgroup V consisted of 65 isolates, including 4 of our isolates (07N, PC, TS, and XB) and 61 highly pathogenic reference isolates, with 95.7-100% nucleotide and 94.3-100% amino acid sequence identity. Highly pathogenic PRRSVs were concentrated in subgroups IV and V, except for Em2007, which clustered in subgroup II and is considered a recombinant between the vaccine strain CH-1R and a highly pathogenic virus [26].
Across all five subgroups, the 98 Chinese isolates (our 7 isolates and 91 Chinese reference isolates) shared 90.9-100% nucleotide and 88.6-100% amino acid sequence identity. All 98 Chinese isolates belonged to the North American genotype: they shared 91.4-100% nucleotide and 91.1-100% amino acid sequence identity with the North American prototype VR-2332 and its derived vaccine virus RespPRRS MLV (which have identical ORF7 sequences), but only 67.8-71.1% nucleotide and 62.5-65.8% amino acid sequence identity with the European prototype Lelystad virus.
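Pairwise identities such as these are the fraction of aligned positions at which two sequences agree. The authors computed them with Lasergene; the sketch below is only a minimal, hypothetical illustration of the calculation, assuming already-aligned, gap-free sequences of equal length (as for ORF7, which is 372 nt in every isolate) and translation with the standard genetic code. The toy sequences are invented and are not PRRSV data.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(nt) - len(nt) % 3, 3):
        aa = CODON_TABLE[nt[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a, b):
    """Percentage of identical positions in two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Invented toy coding sequences: one synonymous (AAA->AAG) and one
    # nonsynonymous (GAT->AAT, Asp->Asn) difference.
    s1 = "ATGAAACGTGATTAA"
    s2 = "ATGAAGCGTAATTAA"
    print(f"nt identity: {percent_identity(s1, s2):.1f}%")
    print(f"aa identity: {percent_identity(translate(s1), translate(s2)):.1f}%")
```

In this toy pair 13 of 15 nucleotides match, but only 3 of 4 amino acids, because one of the two nucleotide changes alters the encoded residue; this is why nucleotide and amino acid identity ranges can differ, as they do above.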

Sequence comparison of ORF7 gene among Chinese isolates
The ORF7 sequences of all 100 isolates in the five subgroups (the 98 Chinese isolates plus VR-2332 and RespPRRS MLV) were further compared. As shown in Figure 2, all ORF7 sequences were the same length (372 nt) and encoded a 123-amino-acid protein. Compared with VR-2332, many amino acid substitutions were observed in subgroups II, III, IV, and V. Some substitutions, such as R11K and D15N, occurred in all four of these subgroups. The V117A substitution was unique to subgroups III, IV, and V, while T91A was unique to subgroups III and V. Notably, two substitutions, K46R and H109Q, were present in all isolates of subgroups IV and V, which contain the highly pathogenic PRRSVs, and in no other subgroup. In addition, subgroup II strains shared two substitutions not found in the other subgroups, Q9R and N49S, with two exceptions: CH-1a (no substitution at residue 9) and Em2007 (N49H).
The distribution of sequence diversity across ORF7 was examined for all 100 sequences. These sequences contained 108/372 (29.0%) polymorphic nucleotide positions and 36/123 (29.3%) polymorphic amino acid positions. However, frequent amino acid substitutions were confined to residues 11, 15, 46, 91, 109, and 117, while conserved regions lay mainly at positions 16-45, 55-90, and 92-108 (Figure 3a).
Figure 3b plots the difference between the nonsynonymous and synonymous substitution rates (dN-dS) against amino acid position. The overall dN-dS for ORF7 was negative (-0.03258 ± 0.01219), indicating that ORF7 is under purifying selection. Most negative values were located in the middle domain of the protein (residues 55-90). However, some residues, including the six frequently substituted sites 11, 15, 46, 91, 109, and 117, showed positive values, suggesting that they are under positive selection.
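The sign of dN-dS is what separates purifying selection (dN < dS) from positive selection (dN > dS). SNAP, the tool named in the Data analysis section, is based on the Nei-Gojobori counting method; the sketch below is a simplified, hypothetical re-implementation for a single pair of codon-aligned, gap-free sequences with a Jukes-Cantor correction. It is meant only to show the logic, not to reproduce SNAP's output or variance estimates; its stop-codon handling is a stated simplification, and the toy sequences are invented.

```python
import math
from itertools import permutations, product

# Standard genetic code (TCAG ordering).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: at each position, the fraction of the three
    possible base changes that keep the amino acid unchanged. Simplification:
    changes to stop codons are simply counted as nonsynonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1.0 / 3.0
    return sites


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two codons, averaged over
    every order in which the differing positions could mutate; pathways passing
    through an intermediate stop codon are skipped."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if nxt != c2 and CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at the same site."""
    if p >= 0.75:
        raise ValueError("proportion of differences too large for Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two codon-aligned, gap-free coding sequences,
    skipping codons that are stop codons in either sequence."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # e.g. the terminal stop codon of ORF7
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        S += s
        N += 3.0 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


if __name__ == "__main__":
    # Invented toy pair: AAA->AAG is synonymous, GAT->AAT (Asp->Asn) is not.
    dn, ds = dn_ds("ATGAAACGTGATTAA", "ATGAAGCGTAATTAA")
    print(f"dN = {dn:.4f}, dS = {ds:.4f}, dN - dS = {dn - ds:.4f}")
```

For this toy pair, the single synonymous difference is spread over far fewer synonymous sites (5/3) than the single nonsynonymous difference is over nonsynonymous sites (31/3), so dN - dS comes out negative, the same sign as the overall value reported for ORF7.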
Hydrophilicity analysis was performed on at least three isolates from each subgroup. Figure 3c shows the ORF7 hydrophilicity plots of VR-2332, BJ-4, and HB-2sh (subgroup I); CH-1a, CH2003, and Em2007 (subgroup II); BJ0706, HB-1sh, and XJ (subgroup III); CG, 128, and XIN (subgroup IV); and HUN4, JXA1, and XB (subgroup V). The hydropathy profiles indicated that the N proteins are highly hydrophilic, especially in the N-terminal domain, which contains two large hydrophilic regions at positions 5-21 and 30-74. The C-terminal portion also contains several small hydrophilic regions, at positions 81-85, 95-99, and 106-112. Differences in the hydrophilicity plots among these isolates were concentrated in segments 7-15, 41-55, 62-73, 85-105, and 113-120, and all resulted from single amino acid substitutions within these regions.
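Profiles of this kind are obtained by averaging a per-residue scale over a sliding window. The sketch below assumes the Kyte-Doolittle hydropathy values, uniform weighting across the window, and a 9-residue default window; the window size and edge weighting the authors used with ProtScale are not stated, so these are assumptions. On this scale positive values are hydrophobic, so hydrophilic segments are windows with a negative mean. The demo peptide is arbitrary, not a PRRSV N sequence.

```python
# Kyte & Doolittle hydropathy values; positive = hydrophobic, negative = hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score of each full window (uniform weights), keyed by
    the 1-based position of the window's central residue. Window must be odd."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    profile = {}
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        profile[start + half + 1] = sum(KD[aa] for aa in segment) / window
    return profile


def hydrophilic_runs(profile):
    """Merge consecutive window centres with negative mean hydropathy into (start, end) runs."""
    runs = []
    for pos in sorted(profile):
        if profile[pos] < 0:
            if runs and runs[-1][1] == pos - 1:
                runs[-1] = (runs[-1][0], pos)
            else:
                runs.append((pos, pos))
    return runs


if __name__ == "__main__":
    peptide = "MKKDEQNGKRSLLIVAFLLGKKNEDQRK"  # arbitrary demo sequence
    print(hydrophilic_runs(hydropathy_profile(peptide)))
```

Comparing the runs returned for two aligned variants would localize differences to the windows containing their substitutions, which is the kind of comparison summarized for Figure 3c.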

Discussion
Porcine reproductive and respiratory syndrome virus (PRRSV) is one of the most economically damaging pathogens of the swine industry worldwide. Since it was first reported in China in 1996, the virus has spread widely throughout the country's pig-producing provinces, imposing a considerable economic burden on the swine industry, especially after the outbreak of highly pathogenic PRRS in 2006 [14,15]. Studies of the genetic variability of these isolates have revealed extensive sequence variation. Pairwise comparisons showed that 6 isolates characterized in this study (07N, 128, PC, TS, XIN, and XB) and 64 Chinese reference isolates contained a discontinuous 30-amino-acid deletion in Nsp2. This deletion is regarded as a genetic marker of highly pathogenic PRRSV, although it is not related to virulence [27]. Further analysis of the complete ORF7 sequences revealed a high degree of genetic diversity among the 98 Chinese PRRSV isolates, which clustered into five subgroups, suggesting that related but non-identical PRRSV variants coexist and evolve independently. Subgroup I isolates shared high identity with the MLV vaccine and its parent virus VR-2332, and subgroup II isolates were highly homologous to the CH-1R vaccine and its parent virus CH-1a. Isolates in subgroups IV and V were all highly pathogenic PRRSVs and were distinct from both the MLV and CH-1R vaccines. Because these highly pathogenic subgroup IV and V viruses are the dominant strains circulating in China, they should be the focus of preventive and control measures against PRRSV.
Based on ORF7, no apparent relationship between geographic origin and genetic distance was found among the isolates, especially the highly pathogenic strains, which occurred throughout mainland China. Nor was a correlation between temporal origin and genetic distance found, consistent with the highly pathogenic PRRS outbreak having begun on several pig farms in the summer of 2006 and then spread rapidly to almost all pig-producing provinces of China. Our data indicate that these viruses are still circulating in China.
The nucleocapsid (N) protein encoded by ORF7 is highly immunogenic, and several antigenic domains have been mapped on N in both European and North American PRRSV. A common linear epitope conserved among European and North American isolates is located at amino acids 50-66 [28], and another linear epitope conserved in both genotypes was identified at amino acids 25-30 [29]. Wootton et al. found three additional linear epitopes (residues 30-52, 37-52, and 69-123) and one discontinuous epitope involving residues 52-69 and 112-123 [30]. In addition, four other linear epitopes, at residues 23-33, 30-48, 30-50, and 43-56, were identified in VR-2332 [31]. For CH-1a, the first Chinese isolate (North American type), epitopes were mapped to amino acid segments 51-58 and 79-87 [32,33].
The alignment revealed extensive substitutions in the 123-residue nucleocapsid protein. Substitutions K46R and V117A occurred in all highly pathogenic PRRSVs and might impair the recognition, by anti-N monoclonal antibodies (mAbs), of epitopes that encompass or flank these two sites. A previous study showed that the C-terminal residues 112-123 are essential for the generation of discontinuous epitopes [30]. Single amino acid substitutions introduced into the C-terminal domain showed that the requirement of the C terminus for conformation-dependent mAb binding correlates with proper formation of a predicted beta-strand spanning amino acids 111-117 [34]. Because V117A lies within this predicted beta-strand, it is likely to have a substantial influence on the structure and antigenicity of the N protein in highly pathogenic PRRSVs.
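The argument above reduces to asking which variable sites fall inside, or close to, the mapped N-protein epitopes. The sketch below encodes the epitope intervals and the six frequently substituted residues exactly as listed in this paper; the `flank` parameter that defines "flanking" is a hypothetical choice for illustration, not a criterion used by the authors.

```python
# Epitope intervals (inclusive residue ranges) reported for PRRSV N in the text,
# with the reference numbers used there.
EPITOPES = [
    ((50, 66), "[28]"), ((25, 30), "[29]"),
    ((30, 52), "[30]"), ((37, 52), "[30]"), ((69, 123), "[30]"),
    ((52, 69), "[30] discontinuous, part 1"), ((112, 123), "[30] discontinuous, part 2"),
    ((23, 33), "[31]"), ((30, 48), "[31]"), ((30, 50), "[31]"), ((43, 56), "[31]"),
    ((51, 58), "[32,33]"), ((79, 87), "[32,33]"),
]

# Frequently substituted residues identified in the Results.
VARIABLE_SITES = [11, 15, 46, 91, 109, 117]


def epitopes_near(site, flank=0):
    """Epitope intervals that contain `site` or lie within `flank` residues of it.
    The flank size is an illustrative assumption."""
    return [(lo, hi) for (lo, hi), _ in EPITOPES if lo - flank <= site <= hi + flank]


if __name__ == "__main__":
    for site in VARIABLE_SITES:
        print(site, epitopes_near(site), epitopes_near(site, flank=3))
```

With no flank, residue 46 falls inside five of the listed linear epitopes and residue 117 inside both the 69-123 epitope and the C-terminal part of the discontinuous epitope, whereas residues 11 and 15 lie outside all of them.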
Two other substitutions, R11K and D15N, occurred in subgroups II, III, IV, and V. Both introduce residues, Lys (K) and Asn (N), that are already abundant in the N-terminal half of the 123-residue nucleocapsid protein, where the accumulation of such residues might mediate interaction with the viral genomic RNA [37].
Some conserved sites were also identified in our alignment. For example, the three cysteine residues at positions 23, 75, and 90 were conserved in all isolates. In North American strains, covalent interactions between N proteins form through disulfide linkages involving the conserved cysteine at position 23, while domain 30-37, which was also conserved in all isolates in this study, has been shown to be essential for non-covalent interactions [38]. A C75S mutant induced cytopathic effects and produced infectious virus with plaque morphology indistinguishable from that of the wild-type clone, whereas the C23S and C90S mutations completely abolished viral infectivity, indicating that C23 and C90 play critical roles in PRRSV infection [39].
Variability analysis also identified conserved regions at positions 16-45, 55-90, and 92-108. These highly conserved segments are probably associated with nucleocapsid structure and/or function. In addition, nonsynonymous substitutions did not occur more frequently than synonymous substitutions among the Chinese isolates, and the main variable sites with nonsynonymous changes (residues 11, 15, 46, 91, 109, and 117) were distributed in hydrophilic regions that are likely to be exposed to immune pressure.
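A variability scan of this kind can be sketched column by column: a column is polymorphic if more than one residue occurs in it, and runs of invariant columns mark conserved regions. The measure behind Figure 3a is not specified, so the sketch below is a simplified, hypothetical illustration; the `min_length` cut-off for calling a region conserved is an assumption, and the toy alignment is invented.

```python
def polymorphic_columns(alignment):
    """1-based indices of alignment columns that contain more than one residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({seq[i] for seq in alignment}) > 1]


def conserved_regions(alignment, min_length=5):
    """Maximal runs of invariant columns at least `min_length` long, as (start, end).
    `min_length` is an illustrative threshold, not taken from the paper."""
    length = len(alignment[0])
    variable = set(polymorphic_columns(alignment))
    regions, start = [], None
    for pos in range(1, length + 2):  # one extra step closes a trailing run
        invariant = pos <= length and pos not in variable
        if invariant and start is None:
            start = pos
        elif not invariant and start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    return regions


if __name__ == "__main__":
    toy = ["MKRDAAAAAQ", "MRRDAAAAAQ", "MKRNAAAAAL"]  # invented toy alignment
    poly = polymorphic_columns(toy)
    print(f"{len(poly)}/{len(toy[0])} polymorphic "
          f"({100 * len(poly) / len(toy[0]):.1f}%): {poly}")
    print("conserved:", conserved_regions(toy))
```

The same two counts, polymorphic columns over alignment length, give percentages like those reported for ORF7 at the nucleotide and amino acid levels.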
The N protein contains important immunogenic epitopes, and the majority of antibodies produced during PRRSV infection are specific for it [20,40]. Thus, the N protein has been targeted as a suitable candidate for the detection and diagnosis of PRRS. Numerous serological diagnostic tests have been developed based on the N protein [21,22,41-43]. Additionally, PCR is another widely used method for detecting the virus, and ORF7 has been regarded as a promising target gene [25,44,45] because of its sequence stability relative to other structural genes [46]. However, the high genetic variability among the ORF7 sequences of the Chinese PRRSV isolates observed in this study should be taken into consideration when designing serological or molecular detection methods for PRRSV diagnosis and epidemiological surveillance.
In conclusion, on the basis of their ORF7 sequences, all Chinese PRRSV isolates from 1996 to 2010 analyzed here belonged to the North American genotype and fell into five subgroups. Highly pathogenic PRRSVs have become the dominant strains in China. Our study provides the first genetic analysis of the N protein of Chinese PRRSVs. These results could improve understanding of the molecular variation of PRRSV in China and aid the development of more effective vaccines and reliable diagnostic methods.

Methods
Sample origin
Fresh tissues were sampled during 2007-2010 from different swine herds in Gansu province, northwestern China, a region that experienced outbreaks of severe reproductive failure in pregnant sows and respiratory disease in suckling and post-weaning piglets. All samples were kept in ice boxes, transported to the laboratory immediately, and stored at -80°C until analysis.

RNA extraction and RT-PCR
Total RNA was extracted from tissue homogenates using the RNeasy Mini Kit (Qiagen, Hilden, Germany). Viral cDNA was synthesized with an oligo(dT) primer according to the manufacturer's instructions (TaKaRa, Dalian, China). For Nsp2 amplification, the sense primer was 5'-AGGAAGGTCAGATCCGATTG-3' and the antisense primer was 5'-CGTCTGTGAGGACGCAGACA-3'. The cycling conditions were 95°C for 4 min; 30 cycles of 94°C for 1 min, 58°C for 30 sec, and 72°C for 30 sec; and a final extension at 72°C for 10 min. This yielded a 370-bp fragment of the Nsp2 gene. For ORF7 amplification, the sense primer was 5'-AAGCCTCGTGTTGGGTGGCAG-3' and the antisense primer was 5'-TCTCCCAATTCTAACACTGAG-3'. The PCR protocol was 95°C for 4 min; 30 cycles of 94°C for 1 min, 56°C for 30 sec, and 72°C for 30 sec; and a final extension at 72°C for 10 min. The resulting amplicon spanned the complete ORF7 gene.
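As a quick sanity check on primers like these, length, GC content, and a rough melting temperature can be computed directly from the sequences. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a crude approximation intended for short oligonucleotides; the text does not state how the 58°C and 56°C annealing temperatures were chosen, and the primer names in the dictionary are hypothetical labels.

```python
# Primer sequences from the Methods (5'->3'); the dictionary keys are
# hypothetical labels, not names used by the authors.
PRIMERS = {
    "Nsp2_F": "AGGAAGGTCAGATCCGATTG",
    "Nsp2_R": "CGTCTGTGAGGACGCAGACA",
    "ORF7_F": "AAGCCTCGTGTTGGGTGGCAG",
    "ORF7_R": "TCTCCCAATTCTAACACTGAG",
}


def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting temperature in degrees C: 2*(A+T) + 4*(G+C).
    Only a rough estimate, especially for primers longer than ~14 nt."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.1f}%, "
              f"Tm ~{wallace_tm(seq)} C")
```

This gives Wallace estimates of 60°C and 64°C for the Nsp2 pair and 68°C and 60°C for the ORF7 pair; nearest-neighbor thermodynamic models give more reliable values for primers of this length.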

Nucleotide sequencing
The PCR products were purified using a PCR purification kit (Axygen, USA) and cloned into the pMD18-T vector (TaKaRa, China). At least three clones of each amplicon were sequenced.

Data analysis
Sequence analysis of the partial Nsp2 gene and the complete ORF7 gene of the 7 isolates obtained in this study and the 93 reference isolates from GenBank (Table 1) was conducted with the Lasergene sequence analysis software package (DNASTAR Inc., Madison, WI). Multiple sequence alignments were performed with CLUSTAL W. An unrooted phylogenetic tree was constructed by the distance-based neighbor-joining method in MEGA 4.0, with bootstrap values calculated from 1000 replicates of the alignment. Hydrophilicity profiles were generated by the Kyte and Doolittle method with the ProtScale web utility (http://expasy.org/tools/protscale.html). The difference between the nonsynonymous and synonymous substitution rates (dN-dS) was calculated with the SNAP web utility (http://hiv-web.lanl.gov/content/hiv-db/SNAP/WEBSNAP/SNAP.html) and plotted against amino acid position.
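For readers unfamiliar with neighbor joining, the sketch below is a minimal, hypothetical implementation of the neighbor-joining algorithm (Saitou and Nei) on a precomputed distance matrix. It is not MEGA's implementation: it omits distance correction, bootstrap resampling, and tree drawing, and breaks ties by taking the first minimal pair. The five-taxon matrix is a small additive example commonly used to illustrate the method, not PRRSV data.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix (list of lists).

    Returns (newick, branch_lengths). Ties in Q are broken by taking the
    first minimal pair in list order."""
    label = {k: name for k, name in enumerate(names)}  # node id -> Newick text
    d = {(a, b): float(dist[a][b]) for a in label for b in label}
    nodes = list(label)
    lengths = []
    next_id = len(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # row sums
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(pairs, key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[(i, j)] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        lengths += [li, lj]
        u, next_id = next_id, next_id + 1
        label[u] = f"({label[i]}:{li:g},{label[j]}:{lj:g})"
        d[(u, u)] = 0.0
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[(u, k)] = d[(k, u)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    lengths.append(d[(a, b)])
    # An unrooted tree has no true root; the last edge is written on one side.
    return f"({label[a]}:{d[(a, b)]:g},{label[b]}:0);", lengths


# Small additive toy matrix (illustrative only).
TOY_NAMES = ["a", "b", "c", "d", "e"]
TOY_DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    tree, branch_lengths = neighbor_joining(TOY_NAMES, TOY_DIST)
    print(tree, "total length:", sum(branch_lengths))
```

Because the toy matrix is additive, neighbor joining recovers a tree whose path lengths reproduce it exactly; bootstrap support, as reported with 1000 replicates, would come from rebuilding the tree on resampled alignment columns and counting how often each grouping recurs.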