The characteristics of the synonymous codon usage in hepatitis B virus and the effects of host on the virus in codon usage pattern

Background Hepatitis B virus (HBV) infection is one of the main human health problems and causes chronic infection in a large number of patients worldwide. Because HBV replication depends on the host cell system, the codon usage pattern of the viral genes might be subject to two main selective forces, namely mutation pressure and translation selection. A deeper investigation of the relationship between HBV evolution and the adaptive response of the host might therefore help control this disease. Results Relative synonymous codon usage (RSCU) values for the whole HBV coding sequence were studied by principal component analysis (PCA). The characteristics of the synonymous codon usage patterns, the nucleotide contents and a comparison of the ENC values of the whole HBV coding sequence indicated that an interaction between viral mutation pressure and host translation selection exists in the process of HBV evolution. The synonymous codon usage pattern of HBV is a mixture of coincidence and antagonism with that of the host cell. However, no difference in the genetic characteristics of HBV was observed between its different epidemic areas or subtypes, suggesting that geographic factors have only a limited influence on the evolution of this virus. In contrast, based on genetic characteristics, the HBV genotypes could be divided into three groups, namely (i) genotypes A and E, (ii) genotype B, and (iii) genotypes C, D and G. Conclusion Using codon usage patterns from PCA to identify evolutionary trends in HBV provides an alternative approach to understanding the evolution of HBV. Furthermore, the combined action of mutation pressure and translation selection on codon usage might shed light on the evolutionary trends of HBV genotypes.


Introduction
Hepatitis B virus (HBV) infection is one of the main global health problems: two billion people have been infected, and 350 million people are chronically infected [1]. HBV is the prototype member of the family Hepadnaviridae and has a compact, circular DNA genome about 3.2 kb in length, with four overlapping open reading frames: the large S region (PreS/S), PreC/C, X and P [2,3]. The overlapping regions of the genome are helpful for studying the evolution of the virus through its point mutations, because recombination is rare and a single point mutation can affect the genetic characteristics of two overlapping genes [3]. The evolution of HBV should therefore be interactional and constrained by the overlap of genes [4]. In some cases, the protein encoded by one overlapping gene may evolve more rapidly as a consequence of negative selection acting on the other [5], and overlapping genes might be subject to different selections [6]. Furthermore, independent adaptive selection on both overlapping genes has been reported [7]. One of the main features of HBV is its genetic heterogeneity [8].
There are four main subtypes, namely ayw, adw, adr and ayr [9]. According to phylogenetic analysis of complete HBV genomic sequences, nine genotypes of HBV (A to I) have been determined and divided into approximately twenty-five subgenotypes [10][11][12][13][14]. HBV genotypes, which differ from one another by more than 8% at the nucleotide level, show distinct geographical distributions [11,15,16]. Notably, the nucleotide composition of HBV coding sequences with various genetic diversities appears to be selective rather than random, because natural selection by the host acts on the various strains generated by mutation. In previous reports, translation selection and compositional constraints under mutation pressure were thought to be the major factors accounting for codon usage variation among microbial genomes [17][18][19][20][21][22][23][24]. In some RNA viruses, mutation pressure plays a more important role than natural selection in shaping the synonymous codon usage pattern [25,26]. Although compositional constraints and translation selection are the most generally accepted mechanisms accounting for codon usage bias [27][28][29][30], other selective forces have also been proposed, such as selection for fine-tuning of translation kinetics and escape from cellular antiviral responses [23][31][32][33][34]. Thus, the codon usage pattern may be important in disclosing the molecular mechanisms and evolutionary processes by which HBV avoids host cell responses. To our knowledge, this is the first systematic study to analyze the synonymous codon usage pattern and evolutionary dynamics of HBV, as well as the relationship between the codon usage pattern of HBV and that of its host.

Synonymous codon usage in HBV
The C% and U% values were higher than A% and G%, and C3% and U3% were higher than A3% and G3% in HBV (Table 1).
The overall nucleotide composition never affects the nucleotide contents at the third codon position in the HBV coding sequence, suggesting that compositional constraints may be one of the factors affecting the codon usage pattern of HBV. In the synonymous codon usage pattern of HBV, over-represented codons are rare, with UCU for Ser the only one, whereas the under-represented codons include AUA for Ile, CCC for Pro, ACC for Thr, GCC for Ala, and CGU and CGG for Arg (Table 2).
This codon usage bias indicates that synonymous codons are not used equally or randomly in HBV.

Genetic relationship based on synonymous codon usage in HBV
The PCA showed that the first principal component (f1') accounted for 23.65% of the total synonymous codon usage variation, and the second principal .47% of the total variation. When geography is considered as a potential factor in HBV evolution, the strains show an apparent geographical distribution. For example, the overall codon usage pattern of HBV isolated from the Philippines and South Korea is far from those of strains from China and Indonesia, and HBV isolated from Germany and Iran has genetic diversity similar to that of HBV isolated from South Africa (Figure 1). By subtype, the plots for subtype adw were generally divided into two groups, while the other three subtypes seem to have similar genetic characteristics (Figure 2).
Notably, the plots for the different HBV genotypes were generally separated from each other. Moreover, genotypes A and B differ clearly in genetic characteristics from the rest, while genotypes C, D and G appear to be evolutionarily related (Figure 3).
These results indicate that geographic distribution might be only a limited factor affecting the codon usage of the whole HBV coding sequence, and that the subtypes do not, to some degree, reflect the characteristics of HBV evolution. In this case, codon usage variation might be one of the factors driving HBV evolution.

The effect of mutation pressure on codon usage of HBV
To determine whether the evolution of HBV is shaped by mutation pressure from the virus itself or by translation selection from the host, the G+C content at the first and second codon positions (GC12%) was compared with that at synonymous third codon positions (GC3%) (Figure 4).
A highly significant correlation was observed (r = 0.432, P < 0.01), implying that mutation pressure arising from the base composition of HBV is a main factor shaping the genetic diversity of this virus, since its effects are present at all codon positions. In addition, the ENC value was calculated for each strain and plotted against GC3% (Figure 5).
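The GC12-versus-GC3 comparison can be sketched in Python. This is a minimal illustration, not the authors' pipeline (the text does not say how the values behind Figure 4 were computed). It assumes in-frame coding sequences, excludes stop codons from both measures, and follows the common convention of excluding AUG and UGG from GC3 because they have no synonymous alternatives. The helper names and toy sequences are hypothetical, and no P value is computed.

```python
import numpy as np

STOP_CODONS = {"UAA", "UAG", "UGA"}
SINGLE_CODON = {"AUG", "UGG"}  # Met and Trp: no synonymous third position


def gc_percent(bases):
    """Percentage of G or C in a string of bases."""
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc12_gc3s(seq):
    """Return (GC12%, GC3s%) for an in-frame coding sequence (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    first_second = "".join(c[:2] for c in sense)
    synonymous_third = "".join(c[2] for c in sense if c not in SINGLE_CODON)
    return gc_percent(first_second), gc_percent(synonymous_third)


# Toy sequences standing in for the 58 HBV coding sequences
strains = ["AUGGCCGCGUAA", "AUGUUAUCUUAA", "AUGCCGGGCUAA"]
gc12, gc3s = zip(*(gc12_gc3s(s) for s in strains))
r = np.corrcoef(gc12, gc3s)[0, 1]  # Pearson correlation across strains
print(gc12, gc3s, r)
```

A significance test (the P < 0.01 reported above) would need an additional step, such as a t-test on r, which this sketch omits.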
Figure 5 shows that the HBV strains cluster below the expected curve, which gives the ENC predicted if codon usage were determined solely by GC3 composition. This suggests that other selective forces take part in the process of HBV evolution.
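In ENC-GC3 plots, the expected curve is usually Wright's expectation for codon usage shaped only by third-position composition: ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The text does not state the formula, so the sketch below assumes this standard form. The strain names and values are hypothetical.

```python
def expected_enc(gc3s):
    """Wright's expected ENC when codon usage reflects only GC3s (a fraction, 0 to 1)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def strains_below_curve(strains):
    """Names of strains whose observed ENC falls below the expected curve.

    strains: iterable of (name, observed_enc, gc3s) tuples.
    """
    return [name for name, observed, gc3s in strains if observed < expected_enc(gc3s)]


# Hypothetical strains: (name, observed ENC, GC3s as a fraction)
toy = [("strain_1", 52.0, 0.55), ("strain_2", 61.0, 0.50)]
print(strains_below_curve(toy))
```

The curve peaks at 60.5 when s = 0.5. Points lying well below it are the usual signal that something other than compositional mutation pressure is constraining codon choice.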

Discussion
The ENC values calculated for HBV indicated that, although HBV shows a significantly low codon usage bias, its codon usage is not mainly affected by mutation pressure. For some viruses, previous studies reported that the major factor shaping codon usage patterns appears to be mutation pressure rather than natural selection [19,21,24,35]. However, the comparison of synonymous codon usage between HBV and human cells suggested that an interaction between mutation pressure and translation selection exists in the process of HBV evolution, although the ENC values for the whole HBV coding sequence indicate that mutation pressure is one of the factors influencing the codon usage pattern. This characteristic of HBV confers adaptive advantages that result in highly efficient dissemination of the virus through different routes of transmission. Previous studies have shown that the codon usage pattern is a genetic characteristic of various organisms [19,20,27,31,32,35,36]. Because C%, U%, C3% and U3% play roles in the formation of the different optimal codons ending in any nucleotide, the codon usage pattern of HBV is likely influenced by compositional constraints. The codon usage pattern of poliovirus (PV) is mostly coincident with that of its host, while the codon usage pattern of hepatitis A virus (HAV) is antagonistic to that of its host [37,38]. The codon usage pattern of HBV is a mixture of these two types. The coincident portion of the HBV codon usage pattern enables the corresponding amino acids to be translated rapidly, whereas the antagonistic portion likely enables viral proteins to fold properly, even though the translation efficiency of the corresponding amino acids is decreased. Latent genes in Epstein-Barr virus deoptimize codon usage in order to evade competition for host protein translation [28], and PV has been attenuated by recoding it with rare codon pairs that cause poor translation of viral proteins [27].
These results suggest that disfavored codons may not be deleterious for viruses adapting to their host cells. According to the codon usage data for HBV isolated from different countries, geographic factors fail to influence the formation of the HBV codon usage pattern. After all, with the development of international communication and the highly efficient dissemination of HBV through various routes of transmission, geography seems to place only a weak limit on the distribution of HBV among countries. Interestingly, the four main subtypes of HBV show no significant difference in genetic characteristics attributable to different human races. This result might suggest that translation selection from the human host is not the only factor shaping the overall codon usage pattern of this virus, and that mutation pressure from HBV itself is a main force driving HBV evolution. Genotyping of HBV is of great interest because there is increasing evidence that HBV genotypes may be associated with HBeAg seroconversion rates, mutations in the precore and core promoter regions, severity of liver disease and treatment response [15,16,39,40]. There is a significant difference in the overall codon usage pattern of HBV between genotypes A, B, E and genotypes C, D, G. HBV genotypes and subgenotypes have been associated with differences in clinical and virological characteristics, showing that they may play a role in the virus-host relationship [41]. Genotypes C and D have been shown to be associated with more serious liver injury and a higher incidence of hepatocellular carcinoma (HCC) than genotypes A and B [42][43][44]. In addition, patients infected with genotype C or D respond to interferon therapy at a much lower rate than those infected with genotype A or B [40,45]. Moreover, subtle differences in the frequency and type of lamivudine-resistant variants occur between genotype A and D infections [15].
An evolutionary approach to HBV infection, based on the principles of natural selection, may offer an explanation for how modes of transmission may favor some genotypes and subgenotypes over others and influence HBV virulence.
The genetic diversity and codon usage patterns described here help in understanding the processes of HBV evolution, especially the roles played by translation selection from the host and mutation pressure from the virus. Additionally, this information might help clarify the roles of geographic and subtype factors in the process of HBV evolution.

Sequence data
The 58 complete genome sequences of HBV were downloaded from the National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/Genbank/, and detailed information about the viruses is listed in Table 3. The overall nucleotide composition (U%, A%, C% and G%) and the nucleotide composition at the third codon position (U3%, A3%, C3% and G3%) of each HBV coding sequence were calculated with DNAStar 7.0 for Windows.
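The composition measures named above are simple base counts. The sketch below is an illustrative Python equivalent, not a reproduction of DNAStar's procedure. It assumes an in-frame coding sequence, rewrites T as U to match the notation used here, and drops a trailing incomplete codon. The function name `composition` is hypothetical.

```python
def composition(seq):
    """Return (overall %, third-position %) base compositions of a coding sequence."""
    seq = seq.upper().replace("T", "U")
    seq = seq[:len(seq) - len(seq) % 3]  # drop an incomplete final codon
    third = seq[2::3]  # third base of every codon
    overall = {b: 100.0 * seq.count(b) / len(seq) for b in "UACG"}
    at_third = {b + "3": 100.0 * third.count(b) / len(third) for b in "UACG"}
    return overall, at_third


overall, at_third = composition("ATGCCTTAA")
print(overall)
print(at_third)
```

Running it on each of the sequences gives the per-strain values that the correlation analyses compare.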

The calculation of the relative synonymous codon usage (RSCU)
The relative synonymous codon usage (RSCU) values for the 58 complete HBV coding sequences were calculated as previously described [46]. RSCU values are independent of amino acid composition and coding sequence length, because both factors are eliminated during the calculation. An RSCU value of 1.0 means that the codon is used exactly as often as expected if all synonymous codons were used equally. An RSCU value above or below 1.0 indicates that the codon is used more or less frequently than expected, respectively. Synonymous codons with RSCU values greater than 1.6 were regarded as over-represented, and those with RSCU values less than 0.6 as under-represented [47].
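Under the standard definition, the RSCU of codon j for amino acid i is its observed count divided by the count expected under uniform synonymous usage: RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij), where n_i is the number of synonymous codons for amino acid i. The Python sketch below assumes that reference [46] uses this definition together with the standard genetic code. It computes RSCU for one sequence and applies the 1.6/0.6 thresholds. Pooling all 58 sequences, presumably by summing codon counts before dividing, is an assumption left out here.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Group synonymous codons, dropping stops and single-codon amino acids (Met, Trp)
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)
FAMILIES = {aa: cs for aa, cs in FAMILIES.items() if aa != "*" and len(cs) > 1}


def rscu(seq):
    """RSCU of each synonymous codon in one in-frame coding sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so its RSCU values are undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def classify(values, over=1.6, under=0.6):
    """Split codons into over- and under-represented lists using the thresholds above."""
    return (sorted(c for c, v in values.items() if v > over),
            sorted(c for c, v in values.items() if v < under))


v = rscu("UCUUCUUCUUCA")  # toy: three UCU and one UCA for Ser (6 codons)
print(v["UCU"], classify(v))
```

In the toy input, UCU scores 3 * 6 / 4 = 4.5 and the unused Ser codons score 0. The 59 codons kept in `FAMILIES` are the same 59 used as PCA dimensions.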

Analysis of codon usage bias
The effective number of codons (ENC), a widely used estimator of absolute codon usage bias, was used to quantify the degree of codon usage bias in the whole coding sequence of HBV. ENC values range from 20 (when only one synonymous codon is used for each amino acid) to 61 (when all synonymous codons are used equally) [48].
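In Wright's formulation, which this sketch assumes is the one cited as [48], each amino acid's homozygosity F = (n * sum(p_i^2) - 1) / (n - 1) is averaged within its degeneracy class. The class averages are then combined as Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 for the standard genetic code. Two conventions here are assumptions: replacing a missing Ile (3-fold) class with the mean of F2 and F4, and capping values at 61. The Python code is illustrative, not the software used in the study.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

FAMILIES = {}  # amino acid -> its codons (stop codons excluded)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def enc(seq):
    """Effective number of codons (Wright's Nc) for one in-frame coding sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_by_class = {}  # degeneracy class (2, 3, 4, 6) -> list of F values
    for codons in FAMILIES.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_by_class.setdefault(k, []).append((n * p2 - 1) / (n - 1))
    f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in f and 2 in f and 4 in f:
        f[3] = (f[2] + f[4]) / 2  # Ile absent: interpolate from 2- and 4-fold classes
    if any(k not in f or f[k] <= 0 for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short to estimate every degeneracy class")
    nc = 2 + 9 / f[2] + 1 / f[3] + 5 / f[4] + 3 / f[6]
    return min(nc, 61.0)  # estimates above 61 are capped at the theoretical maximum


# Extreme bias: one codon per amino acid, each used twice
biased = "".join(cs[0] * 2 for cs in FAMILIES.values())
print(enc(biased))
```

The two bounds quoted above fall out directly. Using a single codon per amino acid gives F = 1 in every class and Nc = 20. Using all synonymous codons evenly pushes the estimate to the 61 cap.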

Principal component analysis
Principal component analysis (PCA), a commonly used multivariate statistical method [24], was carried out to analyze the major trends in codon usage patterns among different HBV strains. PCA is a mathematical procedure that transforms a number of correlated variables (here, RSCU values) into a smaller number of uncorrelated variables called principal components. Each strain was represented as a 59-dimensional vector in which each dimension corresponds to the RSCU value of one sense codon; AUG, UGG and the three stop codons were excluded because they have no synonymous alternatives.
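The sketch below shows how per-strain scores and the percentage of variance explained by each component (such as the 23.65% reported for f1') can be obtained from an RSCU matrix. The source does not say whether PCA was run on the covariance or the correlation matrix, or with which software. This sketch assumes covariance-based PCA (columns mean-centred, not scaled), computed with NumPy's SVD. The toy matrix is hypothetical.

```python
import numpy as np


def pca_rscu(rscu_matrix, n_components=2):
    """PCA of a strains x codons RSCU matrix (59 codon columns in the study).

    Returns per-strain scores on the first components and the percentage
    of total variance that each of those components explains.
    """
    x = np.asarray(rscu_matrix, dtype=float)
    centred = x - x.mean(axis=0)  # centre each codon column; no scaling
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    variance = s ** 2
    explained = 100.0 * variance / variance.sum()
    scores = u[:, :n_components] * s[:n_components]  # coordinates for scatter plots
    return scores, explained[:n_components]


# Hypothetical toy matrix: 4 strains x 3 codons (the real data would be 58 x 59)
toy = [[1.2, 0.8, 1.0],
       [1.4, 0.6, 1.0],
       [0.9, 1.1, 1.0],
       [0.5, 1.5, 1.0]]
scores, explained = pca_rscu(toy)
print(explained)
```

In the toy data the second column mirrors the first and the third is constant, so one component carries essentially all the variance. Standardizing each column before the SVD would switch this to correlation-matrix PCA.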

Correlation analysis
The relationships between the overall nucleotide composition (U%, A%, C% and G%) and the composition at the third codon position (U3%, A3%, C3% and G3%) of the HBV coding sequence, and between U3%, A3%, C3%, G3% and the codon usage pattern of HBV, were evaluated by Pearson's correlation analysis. All statistical analyses were carried out with SPSS 11.5 for Windows.
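Pearson's coefficient measures linear association, whereas Spearman's rank coefficient measures monotonic association, and the two can differ for composition data. The plain-Python sketch below computes both, with tied values given averaged ranks. It is an illustration, not SPSS's implementation, and it does not compute P values. The per-strain values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-strain values, e.g. overall C% against C3%
c_overall = [21.0, 22.5, 23.1, 24.8, 25.0]
c_third = [24.0, 26.1, 25.9, 30.2, 31.5]
print(pearson(c_overall, c_third), spearman(c_overall, c_third))
```

For a relationship that is monotonic but curved, such as y = x^2 on positive x, Spearman's coefficient is exactly 1 while Pearson's is slightly lower.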