Mapping codon usage of the translation initiation region in porcine reproductive and respiratory syndrome virus genome

Background Porcine reproductive and respiratory syndrome virus (PRRSV) is a recently emerged pathogen that severely affects swine populations worldwide. The replication of PRRSV is tightly controlled by viral gene expression, and the codon usage of the translation initiation region within each gene could potentially regulate the translation rate. Therefore, a better understanding of the codon usage pattern of the translation initiation region would shed light on the regulation of PRRSV gene expression. Results In this study, the codon usage in the translation initiation region and in the whole coding sequence was compared in PRRSV ORF1a and ORFs2-7. To investigate the potential role of codon usage in affecting the translation initiation rate, we established a codon usage model for the PRRSV translation initiation region. We observed that some non-preferential codons are preferentially used in the translation initiation region in particular ORFs. Although the codons at some positions vary, these positions tend to use codons with negative codon usage bias (CUB). Furthermore, our model of codon usage showed that the conserved pattern of CUB does not directly coincide with the conserved sequence, but is shaped by translation selection. Conclusions The invariant pattern of negative CUB in the PRRSV translation initiation region scanned by ribosomes is considered the rate-limiting step in the translation process.


Introduction
Porcine reproductive and respiratory syndrome virus (PRRSV) infection causes serious disease in swine populations with a series of clinical consequences, such as high mortality, reproductive failure, post-weaning pneumonia and growth reduction [1,2]. Based on its serological characteristics, PRRSV has two main serotypes, which are named the North American isolate (US) and the European isolate (EU), respectively [3][4][5][6][7]. PRRSV is an enveloped, single-stranded positive-sense RNA virus with a genome size of about 15.4 kb and is classified into the family Arteriviridae of the order Nidovirales [8,9]. The PRRSV genome contains ORF1a, encoding papain-like cysteine protease, ORF1b, encoding RNA-dependent RNA polymerase, ORF2-6, encoding envelope proteins, and ORF7, encoding the nucleocapsid protein [10][11][12][13]. Although the ORFs are well organized within the single RNA genome, viral proteins are in fact translated from subgenomic RNAs that are likely generated through a discontinuous transcription mechanism [12,14]. Therefore, each subgenomic RNA could be translated at a different rate, regulated by codon usage bias (CUB), because the faster a polypeptide chain is completed, the sooner the ribosomes return to initiate and complete another polypeptide chain. The relationship between the efficiency of translation initiation and the level of gene expression has been well established in many species [15][16][17][18][19]. Moreover, when the distance between the initiation codon and a non-preferential site is less than 50-60 positions (codons), the ribosomes can be blocked at the non-preferential position and form a queue of ribosomes [20].
It is generally considered that alternative synonymous codons are not used with equal frequencies among organisms, and that the codon usage pattern plays a role in genes expressed at higher levels [21][22][23][24][25][26][27][28][29][30]. Jacques and Dreyfus proposed that the translation initiation site is a rate-limiting factor for gene expression [31]. Nevertheless, a regulatory relationship between CUB and translation efficiency for individual genes, which is thought to be mediated by preferential codons, is open to challenge [32,33]. This suggests that a heterologous gene is not necessarily expressed at a low level simply because its codons are infrequently translated by the host cell. Codon bias also varies within genes: in genes encoding major proteins, the codon bias of the initial sequence is strikingly different from that of the downstream sequence. The translation initiation region has been found to play an important role in regulating translational efficiency, and the pattern of synonymous codon usage varies in different regions along a coding sequence [34,35]. This indicates that alternative synonymous codon usage might be related to gene function, protein structure and translation efficiency. In this study, we focus on the pattern of CUB in the translation initiation region of PRRSV and on the characteristics of synonymous codon usage at each position in the target region, because of the potential relevance of this pattern to the translational efficiency of PRRSV subgenomic RNAs. The frequency of non-preferential codon usage in the target region is also investigated in order to evaluate the role of translation selection in the formation of the negative CUB pattern.

Sequence data and the synonymous codon usage value
The 13 complete RNA sequences of PRRSV were downloaded from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/Genbank/), and the synonymous codon usage values (SCUV) for this virus were reported previously [30]. Multiple alignment analyses were performed with the Clustal W (1.7) method of DNAStar software (7.0) for Windows. The translation initiation regions (the 1st to the 50th residue) of ORF1a, ORF2, ORF3, ORF4, ORF5, ORF6 and ORF7 were each used as targets for alignment analysis.

The calculation of codon usage bias
To calculate CUB, statistically equal and random usage of all available synonymous codons was taken as the "neutral point" (RSCU_0 = 1.00) for the development of serotype-specific codon usage [19]. CUB was calculated as

$$\mathrm{CUB} = \frac{1}{n}\sum_{1}^{n}\left(\mathrm{RSCU}_{ij} - \mathrm{RSCU}_{0}\right)$$

More simply, CUB is the average difference between RSCU_ij and RSCU_0 at each position of the target region, where the sum runs over the codons appearing at this position and n is the number of those codons. When all RSCU values at a particular position in the target region are equal to RSCU_0, CUB is equal to zero, which means that there are few preferential or non-preferential codons at this position. In contrast, when the CUB value deviates strongly from zero, codons with a marked usage bias are preferentially chosen at that position.
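The averaging described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `position_cub`, the toy RSCU table and the example alignment column are hypothetical, and the sketch assumes that the codons at a position are the codons observed across the aligned strains, each counted once per strain.

```python
from statistics import mean

RSCU_NEUTRAL = 1.00  # the "neutral point" of equal synonymous codon usage


def position_cub(codons, rscu, neutral=RSCU_NEUTRAL):
    """Average of (RSCU - neutral) over the codons observed at one position.

    codons: codons seen at the position, one per aligned sequence (assumption).
    rscu:   mapping from codon to its RSCU value.
    """
    if not codons:
        raise ValueError("no codons observed at this position")
    return mean(rscu[codon] - neutral for codon in codons)


# Hypothetical RSCU values for the four alanine codons (not taken from the paper).
example_rscu = {"GCA": 0.6, "GCC": 1.8, "GCG": 0.4, "GCU": 1.2}

# Hypothetical alignment column: three strains, all using rarely used Ala codons.
example_column = ["GCA", "GCA", "GCG"]

example_cub = position_cub(example_column, example_rscu)
# A negative value marks a position dominated by non-preferential codons.
```

A position where every observed codon has an RSCU of exactly 1.00 gives a CUB of zero, matching the neutral case described in the text.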

Analysis of codon usage characteristic of the translation initiation region
We analyzed the codon usage characteristics of the translation initiation region using R values, where the R value, computed as the ratio R = (n_i/N_i)/(n/N), represents the relative abundance of a particular codon in the translation initiation region. Here n_i is the number of occurrences of a particular codon within the 1st to i-th amino acids, N_i is the number of occurrences of the corresponding amino acid within the 1st to i-th amino acids, n is the total number of occurrences of that codon within the whole coding sequence, and N is the total number of occurrences of the corresponding amino acid within the whole coding sequence. When the R value is equal to 1.00, the frequency of this codon in the target region is equal to its frequency in the whole coding sequence; when the R value is lower than 1.00, the frequency of this codon in the target region is lower than that in the whole coding sequence; when the R value is higher than 1.00, the frequency of this codon in the target region is higher than that in the whole coding sequence.
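As a rough illustration of the R ratio, the following Python sketch counts a codon and its synonymous family in the first codons of a coding sequence and in the whole sequence. The function names, the toy sequence and the choice to return `None` when a ratio is undefined are assumptions made for the example, not part of the original analysis; the sequence is assumed to be in frame and written in the RNA alphabet.

```python
def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[k:k + 3] for k in range(0, usable, 3)]


def r_value(codon, synonymous, cds, region_len=50):
    """R = (n_i / N_i) / (n / N) for one codon.

    synonymous: all codons encoding the same amino acid, including `codon`.
    region_len: number of leading codons treated as the initiation region.
    Returns None when the amino acid is absent from the region or the codon
    is absent from the whole sequence, because R is then undefined.
    """
    if codon not in synonymous:
        raise ValueError("codon must belong to its own synonymous family")
    codons = split_codons(cds.upper())
    region = codons[:region_len]
    n_i = region.count(codon)
    N_i = sum(region.count(c) for c in synonymous)
    n = codons.count(codon)
    N = sum(codons.count(c) for c in synonymous)
    if N_i == 0 or n == 0:
        return None
    return (n_i / N_i) / (n / N)


ALA = ("GCA", "GCC", "GCG", "GCU")
# Toy sequence of 8 codons: AUG GCA GCA GCC GCC GCC GCC GCA (hypothetical).
toy_cds = "AUGGCAGCAGCCGCCGCCGCCGCA"
# With a 4-codon region: n_i = 2, N_i = 3, n = 3, N = 7, so R = (2/3)/(3/7) = 14/9.
toy_r = r_value("GCA", ALA, toy_cds, region_len=4)
```

Here R is above 1.00, so in this toy case GCA is used relatively more often near the start of the sequence than across the sequence as a whole.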

Multiple alignment analysis
The consensus amino acid sequence is based on the comparison of the strains in a previous study [30]. The positions of amino acid conservation are listed in Table 1. The conservation of amino acid usage in the translation initiation region was analyzed. For ORF1a, 94% of amino acids in the target region of the US serotype were invariant, and 70% in the target region of the EU serotype were conserved. For ORF2, 78% of amino acids were invariant in the US serotype and 60% in the EU serotype; non-conserved amino acids were scattered throughout the target regions of both serotypes. For ORF3, 74% of amino acids were invariant in the US serotype and 60% in the EU serotype; the most conserved amino acids tended to lie at the C-terminal end of the target regions of both serotypes. For ORF4, 76% of amino acids were invariant in the US serotype and 72% in the EU serotype; non-conserved amino acids were scattered in the flanks of the target regions of both serotypes. For ORF5, 72% of amino acids were invariant in the US serotype and 66% in the EU serotype; non-conserved amino acids were scattered throughout the target regions of both serotypes. For ORF6, 96% of amino acids were invariant in the US serotype and 82% in the EU serotype, and non-conserved amino acids tended to lie at the N-terminal end. For ORF7, 90% of amino acids were invariant in the US serotype and 76% in the EU serotype, and in the EU serotype the conserved amino acids were scattered across the target region compared with those of the US serotype. The varying extent of amino acid conservation among the PRRSV ORFs suggests that these residues play an important role in virus biology.
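The percentage of invariant residues reported for each ORF can be reproduced from an alignment with a short calculation. The sketch below is illustrative only: the function name and the toy alignment are hypothetical, the aligned sequences are assumed to have equal length, and a gap character is simply treated as one more residue type.

```python
def invariant_fraction(aligned):
    """Fraction of alignment columns in which every sequence has the same residue."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # zip(*aligned) walks the alignment column by column.
    invariant = sum(1 for column in zip(*aligned) if len(set(column)) == 1)
    return invariant / length


# Toy alignment of three hypothetical strains over four residues.
toy_alignment = ["MSGT", "MSGA", "MSGT"]
toy_fraction = invariant_fraction(toy_alignment)  # 3 of 4 columns are invariant
```

Applied to the first 50 aligned residues of one ORF within one serotype, the same calculation yields a percentage of the kind listed above.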

Characteristics of codon usage bias in the target regions
The bars at each position in the translation initiation region represent the degree of CUB (Figure 1). Although the degree of amino acid invariance in the target regions differs between the US and EU serotypes, similar patterns of codon usage are present in the target regions of both serotypes (Table 2). For ORF1a, 58% of positions possess a similar pattern of codon usage in the target regions of both serotypes. Although the target regions of the US and EU serotypes differ significantly in amino acid conservation, a large proportion of positions share a similar pattern of codon usage, and most positions possess positive codon usage bias (Figure 1A). For ORF2, 34% of positions have a similar pattern of codon usage, and the positions in the N-terminal fragment tend to have low codon bias. It was also observed that the number of positions with negative codon usage bias was greater for the US serotype than for the EU serotype (Figure 1B). For ORF3, 62% of positions have a similar pattern of codon usage (Figure 1C). For ORF4, 72% of positions have a similar pattern of codon usage (Figure 1D). For ORF5, 40% of positions have a similar pattern of codon usage, and these positions do not appear to lie near the N-terminus (Figure 1E). For ORF6, 26% of positions have a similar pattern of codon usage, and these positions do not lie near the N-terminus (Figure 1F). For ORF7, 44% of positions have a similar pattern of codon usage, and most positions with low codon usage bias tend to lie near the N-terminal fragment (Figure 1G).
The various extents of the conserved pattern of codon usage for their positions in PRRSV ORFs suggest that CUB associated with these positions might modulate the corresponding gene expression.

The ratio of codon usage frequency in the translation initiation region to that of the whole coding sequence
The R value for each codon was calculated and is listed in Table 3. A higher R value indicates more preferential usage in the translation initiation region than in the whole coding sequence. The CUB_ij value for each codon is listed in Table 4. Based on the data in Tables 3 and 4 and on comparison with the whole coding sequence of PRRSV, for ORF1a, the codons with negative CUB, namely GCA (Ala), GCG (Ala), CAA (Gln), AGU (Ser), ACA (Thr) and ACG (Thr), were more preferentially chosen in the target region for both serotypes; for ORF2, the codons UGU (Cys), AUA (Ile), AAA (Lys), CCG (Pro), AGU (Ser) and UCG (Ser) were more preferentially used; for ORF3, the codons UGU (Cys), AGC (Ser) and ACG (Thr) were more preferentially chosen; for ORF4, the codons GAC (Asp), UUC (Phe), AGU (Ser) and UCG (Ser) were more preferentially chosen; for ORF5, the codons UGU (Cys), CCG (Pro), UCG (Ser) and ACG (Thr) were more preferentially chosen; for ORF6, the codons CAA (Gln), AUA (Ile) and CUA (Leu) were more preferentially used; for ORF7, the codons GGA (Gly) and AAA (Lys) were more preferentially chosen. Ribosomes might be stalled at these non-preferential codons, thereby regulating the efficiency of gene translation.

RNA viruses possess high mutation rates, and therefore virus populations exist as dynamic and complex mutant distributions [36][37][38][39][40][41]. However, an excessive intensity of mutation has deleterious effects on viral fitness. Thus, the robustness of viral sequences provides a reduced sensitivity to perturbations affecting phenotypic expression. The balance between high mutation rates and robustness produces a dynamic population pool, termed a 'quasispecies' [36,42]. In comparative genomics, it is generally accepted that sequences with a crucial function are conserved among different but related organisms [43][44][45].
In addition, Akashi found that the frequency of preferential codons is significantly higher at conserved amino acid positions than at non-conserved amino acid positions among different Drosophila species, suggesting that translation selection favors a conserved pattern of synonymous codon usage to enhance the accuracy of gene expression [46]. Many experimental data have shown that rates of chain elongation during translation of proteins are not uniform [47]. The non-uniform distribution along the mRNA of codons with different usage frequencies is assumed to be a main factor modulating the translation rate. Extensive studies have been carried out previously to determine the effect of certain individual codons on translation rates and on the overall level of gene expression [48][49][50][51][52]. In this study, we observed that the conserved pattern of codon usage did not simply follow the corresponding positions in the conserved sequence fragment, suggesting that the conservation of codon usage within a gene sequence has an important function in modulating its translational rate. The positions with conserved positive CUB enhance the accuracy and efficiency of gene translation. It has been observed that preferential codons can reduce the frequency of amino acid misincorporations, resulting in an approximately 10-fold increase of protein products over non-preferential codons for the same amino acid [53]. However, the positions with negative CUB in the translation initiation region of each PRRSV subgenomic RNA should not be ignored, because these positions are likely to regulate the translation initiation rate so as to generate a target product with high activity. Lithwick and Margalit reported that CUB is most highly associated with protein expression and is most conserved [26].
Once a significant number of gene sequences have been obtained, it can be considered that biased codon usage regulates the expression levels of individual genes by modulating the rates of polypeptide elongation [21][54][55][56][57][58]. Komar pointed out that although preferential codons enable the corresponding gene to be translated efficiently, the non-preferential codons that the corresponding preferential codons would replace can regulate gene expression to allow precise protein folding [59]. Lavner and Kotlar indicated that translation selection may shape the codon bias pattern, not only to increase translation efficiency by favoring preferential codons in highly expressed genes, but also to decrease the translation rate by favoring non-optimal codons in lowly expressed ones [60]. A relationship between translation efficiency and CUB has been reported, which links synonymous codon usage bias to protein folding through modulation of the translational rate [47][61][62][63][64][65]. The nucleotide sequences around the N-terminal region of the protein appear to be particularly sensitive to the presence of rare codons [66,67]. Our data showed that some positions in the translation initiation regions of the ORFs tended to choose non-preferential codons, which were used more frequently in these regions than in the whole coding sequences. This phenomenon suggests that the invariant pattern of codon usage is not only correlated with the conserved sequence, but also dependent on translation selection. As a codon usage pattern composed of preferential and non-preferential codons contributes to different translation rates, it is possible to change the local translation rates of a gene by suitable selection of its synonymous codons. Gene segments containing non-preferential codons tend to encode turns, loops and domain linkers within the protein structure, where they act as rate-limiting steps of translation [47,63,64,68].
Taken together, under translation selection, the conserved non-preferential codons in the translation initiation regions of PRRSV may affect the translation efficiency so as to maintain the normal biological functions of their target products. Komar and Jaenicke indicated that non-preferential codons play an important role in maintaining the normal function or activity of the CAT product [68]. This shows the importance of non-preferential codons to the formation of the target products. Because non-preferential codons, or even a single one, located near the translation initiation codon can decrease the translation rate owing to the limited availability of the corresponding tRNAs in the host cell [69], the view that non-preferential codons probably have a negative effect on gene expression can be explained by the 'minor codon modulator hypothesis' [70]. When the tRNA concentration for minor codons becomes extremely limited, ribosomes of the host cell stall at the minor codons and prevent further ribosomes from entering the initiation site effectively, thereby decreasing the translation rate. Moreover, non-preferential codons located in the translation initiation region modulate the number of ribosomes that are sequestered by an mRNA if the rates of elongation at these codons are so slow that stalled ribosomes block access to the initiation signals [19,71]. In summary, the conserved non-preferential codons in the translation initiation region are closely related to the regulation of gene expression. The conserved codons with negative CUB are preferentially used in the initiation region, which may be explained by the minor codon modulator hypothesis and by translation selection. These codons within this critical region might play a negative role in the regulation of gene expression.