Research | Open | Published:

Analysis of synonymous codon usage in Hepatitis A virus



Hepatitis A virus (HAV) is the causative agent of type A viral hepatitis, which occasionally presents as acute hepatitis. Nevertheless, little is known about the synonymous codon usage pattern of the HAV genome over the course of its evolution. In this study, the key genetic determinants of codon usage in HAV were examined.


The overall extent of codon usage bias in HAV is high relative to other members of Picornaviridae. The patterns of synonymous codon usage differ markedly among HAV genomes from different locations. Base composition is closely correlated with codon usage bias. Furthermore, the most important determinant of this high codon bias in HAV is mutation pressure rather than natural selection.


HAV presents a higher codon usage bias than other members of Picornaviridae. Compositional constraint is a significant factor influencing the variation of synonymous codon usage in the HAV genome. In addition, mutation pressure appears to be the major factor shaping the hyperendemic codon usage pattern of HAV.


Hepatitis A virus (HAV), the causative agent of type A viral hepatitis, is an ancient human virus that was first identified in the stools of infected people in 1973 [1]. HAV is a non-enveloped, single-stranded, positive-sense RNA virus that belongs to the order Picornavirales, family Picornaviridae, genus Hepatovirus [2-4]. The HAV genome is approximately 7500 nucleotides in length and contains a large open reading frame (ORF) encoding a polyprotein in which the major capsid proteins represent the amino-terminal third, with the remainder of the polyprotein comprising a series of nonstructural proteins required for HAV RNA replication: 2B, 2C, 3A, 3B, 3Cpro and 3Dpol. On the basis of genetic studies, HAV has been divided into six genotypes [5]. However, there is only one known serological group of human HAV [6, 7]. Although HAV causes occasional, dramatic outbreaks of acute hepatitis with fatal outcomes in otherwise healthy adults, as well as isolated severe cases of hepatitis, it has never been associated with chronic liver disease [8].

The genetic code uses 64 codons to represent the 20 standard amino acids and the stop signals, so most amino acids are encoded by more than one codon. Alternative codons for the same amino acid are termed synonymous codons. Synonymous mutations tend to occur at the third codon position, where bases can often be interchanged without altering the primary sequence of the polypeptide product. Several reports indicate that synonymous codons are not used equally either within or between genomes [9-13]. In general, codon usage variation may be the product of natural selection for accurate and efficient translation and/or mutation pressure in various organisms [14-21]. Codon usage variation is therefore regarded as an indicator of the forces shaping genome evolution. In addition, in some RNA viruses mutation pressure plays a more important role than natural selection in shaping the synonymous codon usage pattern [18, 22, 23].

Nevertheless, little information is available about the codon usage pattern of the HAV genome, including relative synonymous codon usage (RSCU) and codon usage bias (CUB), over the course of its evolution. In this study, the key genetic determinants of codon usage in HAV were examined.


Synonymous codon usage in HAV

The nucleotide contents of the complete coding regions of all 21 HAV genomes were analyzed (Table 1). The (C+G)% content ranged from 36.9 to 37.9, with a mean of 37.15 and an S.D. of 0.28, indicating that A and U are the predominant nucleotides in the HAV genome. Among A3%, U3%, C3% and G3%, U3% was distinctly the highest and C3% the lowest. The (C3+G3)% in the complete coding region of each HAV genome ranged from 28.8 to 31.5, with a mean of 29.92 and an S.D. of 0.62. The effective number of codons (ENC) values of these HAV genomes ranged from 38.8 to 40.7, with a mean of 39.34 and an S.D. of 0.58. These ENC values are fairly low, indicating a certain degree of codon preference in the HAV genome. The overall relative synonymous codon usage (RSCU) values of the 59 codons in the 21 HAV genomes were also analyzed (Table 2). Most preferentially used codons in HAV are A-ended or U-ended, except for Gln and Leu, whose preferred codons, CAG and UUG, end in G. Interestingly, among the preferred codons HAV favors U-ended over A-ended codons.
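The composition statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `base_composition` and the toy sequence are hypothetical, and the function counts every third codon position in the reading frame, whereas the values in Table 1 may have been restricted to synonymous positions.

```python
from collections import Counter


def base_composition(cds):
    """Overall and third-codon-position base percentages for one coding sequence.

    Assumes `cds` is a non-empty, in-frame coding sequence (DNA or RNA letters).
    Characters other than A, U, C, G still count towards the denominators.
    """
    seq = cds.upper().replace("T", "U")
    seq = seq[: len(seq) - len(seq) % 3]      # keep complete codons only
    overall = Counter(seq)
    third = Counter(seq[2::3])                # every third base, starting at codon position 3
    n_all = len(seq)
    n_third = len(seq) // 3
    result = {}
    for base in "AUCG":
        result[base + "%"] = 100.0 * overall[base] / n_all
        result[base + "3%"] = 100.0 * third[base] / n_third
    result["(C+G)%"] = result["C%"] + result["G%"]
    result["(C3+G3)%"] = result["C3%"] + result["G3%"]
    return result


# Toy example (hypothetical sequence): codons AUG, GCU, UUA.
comp = base_composition("ATGGCTTTA")
# Third positions are G, U and A, so comp["C3%"] is 0.0.
```

Applying such a function to each of the 21 coding regions and then taking the mean and standard deviation per column would give summary rows of the kind reported in the text.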

Table 1 Identified nucleotide contents in complete coding region (length >250 bps) in hepatitis A virus (21 isolates) genome
Table 2 Synonymous codon usage of the whole coding sequence in hepatitis A virus

Correspondence analysis (COA)

To investigate the major trends in codon usage variation among HAV genomes, COA was applied to all 21 HAV complete coding regions selected for this study. COA detected one major trend on the first axis (ƒ'1), which accounted for 26.98% of the total variation, and another on the second axis (ƒ'2), which accounted for 19.50% of the total variation. A plot of the first and second principal axes for the complete coding region of each genome is shown in Figure 1. The coordinates of most genomes are relatively isolated from one another, except for the Australian isolates, the Brazilian isolate and one Russian isolate. Nevertheless, these relatively isolated points tend to cluster into several groups according to genotype. MBB, which was isolated in North Africa, had a distinctive codon usage pattern compared with the other IB strains. Together, these observations imply that HAV strains isolated from different places, even those of the same genotype, show different trends in codon usage variation. Interestingly, the codon usage pattern of the vaccine strain H2 changed to an MBB-like pattern after continuous culture in a human diploid cell line (KMB17), i.e. in H2K5 and H2K20, suggesting that the host is a factor that can dramatically influence the codon usage pattern.

Figure 1

A plot of value of the first and second axis of each complete coding region in COA. The first axis (ƒ'1) accounts for 26.98% of the total variation, and the second axis (ƒ'2) accounts for 19.50% of the total variation. A): Each HAV complete coding region was divided by geographical area. B): Each HAV complete coding region was divided by genotype.

Compositional properties of HAV genomes

To analyze whether the codon usage variation of the HAV genome is governed by natural selection or mutation pressure, A%, U%, C%, G% and (C+G)% were compared with A3%, U3%, C3%, G3% and (C3+G3)%, respectively (Table 3). The correlations among nucleotide compositions were complex. In detail, A3%, C3% and G3% have significant negative correlations with C%, U% and A%, respectively. These data suggest that nucleotide constraint may influence synonymous codon usage. However, A3% is not correlated with U%, and U3% is not correlated with A%, C%, G% or (C+G)%, so these comparisons reveal nothing specific about synonymous codon usage. Furthermore, C3% and G3% are not correlated with G% and C%, respectively, indicating that these data alone probably do not capture the true features of synonymous codon usage either. Therefore, linear regression analysis was used to analyze the correlation between synonymous codon usage bias and nucleotide composition. The correlations between the first two principal axes (ƒ'1 and ƒ'2) of each HAV genome in COA and the nucleotide contents are given in Table 4. Surprisingly, only A3% is significantly correlated with both principal axes, which represent the major trends in codon usage variation, suggesting that nucleotide A is the major factor influencing the synonymous codon usage pattern of the HAV genome. Interestingly, although the ƒ'2 value is not correlated with either the C or the G content at the third codon position individually, it is clearly related to (C3+G3)%, suggesting that codon usage patterns in HAV are probably correlated with (C3+G3)% to some extent. Overall, compositional constraint is a factor shaping the pattern of synonymous codon usage in the HAV genome.

Table 3 Summary of correlation analysis between the A, U, C, G contents and A3, U3, C3, G3 contents in all selected samples.
Table 4 Analysis of correlation between the first two principal axes and nucleotide contents in samples.

Mutational bias is another main factor leading to codon usage variation

The ENC-plot is part of the general strategy for investigating patterns of synonymous codon usage. For genes whose codon choice is constrained only by C3+G3 composition, the points will lie on or just below the curve of predicted values (Wright, 1990). The ENC value of each HAV genome was plotted against its corresponding (C3+G3)%. All of the points lie below the curve of predicted values, as shown in Figure 2, suggesting that the codon usage bias in all 21 HAV genomes is principally influenced by mutational bias.
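For readers who want to reproduce the reference curve of an ENC-plot, the sketch below evaluates the expectation given by Wright (1990) for codon usage constrained only by third-position G+C content, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the G+C fraction at synonymous third positions. The function name `expected_enc` is hypothetical, and using the mean (C3+G3)% reported in this study as s is an assumption made only for illustration.

```python
def expected_enc(gc3s):
    """Expected ENC under Wright's (1990) null curve.

    `gc3s` is the G+C fraction (0..1) at synonymous third codon positions.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Illustration with the mean values reported in the text:
# mean (C3+G3)% = 29.92 and mean observed ENC = 39.34.
curve_value = expected_enc(0.2992)      # roughly 52.2
observed = 39.34
below_curve = observed < curve_value    # True: the mean point lies below the curve
```

Evaluating `expected_enc` over a grid of values between 0 and 1 gives the continuous curve against which each genome's observed ENC can be compared.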

Figure 2

Effective number of codons used in each ORF plotted against the GC3s. The continuous curve plots the relationship between GC3s and ENC in the absence of selection. All of the points lie below the expected curve.


Over time, a growing number of features have been found to be unique to HAV within the family Picornaviridae, including its tissue tropism, its virion morphogenesis, its genetic distance from other members of the family, important details of the processing of the viral polyprotein, and the interactions of the virus with host cells [24]. Our analysis of synonymous codon usage in HAV (Table 2) showed that, compared with other viruses of Picornaviridae, such as Coxsackievirus A9 (ENC = 55.6), Enterovirus 71 (ENC = 56.6), Poliovirus type 3 (ENC = 54.2), Rhinovirus type 89 (ENC = 45.9) [23] and Foot-and-Mouth Disease virus (mean ENC = 51.53) [21], the ENC values for HAV are low (mean ENC = 39.34). Although the ENC values for Coxsackievirus, Enterovirus, Poliovirus and Rhinovirus are not mean values, the comparison still suggests that the overall extent of codon usage bias in HAV genomes is rather high within Picornaviridae. Indeed, Sánchez et al. previously reported that HAV presents a higher codon usage bias than other members of the family, which is reflected in its adaptation in the use of abundant and rare codons [25]. As a result, HAV codon usage has evolved to be complementary to that of human cells, never adopting the codons that are abundant in the host cell, and in some instances even using these abundant codons as rare codons [26].

Since the variation and evolution of a virus generally appear as changes in the composition of its genome, compositional constraint was assumed to be closely correlated with the synonymous codon usage pattern [18, 19, 27-30]. The U content was the highest of the four nucleotides, and U3% was much higher than the other base compositions at the third codon position (Table 3), which explains why most of the preferentially used codons are U-ended (Table 2). Although U3% was the highest, the major compositional constraint shaping the synonymous codon usage pattern of the HAV genome came from the proportion of nucleotide A at the third codon position (Table 4). Moreover, the two principal axes (ƒ'1 and ƒ'2) are not correlated with the other base compositions except nucleotide A (Table 4). This finding differs from many reports suggesting that C+G compositional constraints are the major factor influencing codon usage bias in virus genomes [18, 29, 30]. We therefore suppose that the compositional constraint arises not only from C+G content but also from A and/or U content. In addition, we found that A3% is strongly correlated with (C+G)% (Table 3). Hence, we infer that A3% could influence the synonymous codon usage pattern by way of (C+G)%. Moreover, each composition was closely correlated with one of the other compositions, and the correlations between compositions were strikingly negative. The (C3+G3)% was correlated with all the base compositions, especially the U and C contents. All these data suggest that complex interrelations among the base compositions regulate codon usage bias. In brief, compositional constraint can indeed shape the variation of synonymous codon usage in the virus genome.

Mutational pressure and natural selection are generally thought to be the main factors accounting for codon usage variation between genes in different organisms [14-21]. We wished to determine which is responsible for the extreme codon usage bias in HAV. In the present study, mutational pressure was found to be the more important factor for the codon usage bias in HAV: as shown in Figure 2, codon usage in the HAV genome is influenced by C+G content, which is usually assumed to result from mutational pressure. Indeed, mutation pressure rather than natural selection has previously been reported to be the most important determinant of codon bias in human RNA viruses [23]. Since mutation rates in RNA viruses are much higher than those in DNA viruses [31], it is understandable that mutational pressure is the major factor shaping the codon usage pattern in the 21 HAV strains included in our study. Despite this, HAV does not appear to undergo the rapid accumulation of genetic changes seen in many RNA viruses. Because HAV exploits a very low translation rate and a very low replication rate to promote and ensure its survival [26, 32], it shows a considerably lower mutation rate than other members of the family Picornaviridae [24, 33].

Since the mutation rate of HAV is much lower than that of other members of the family Picornaviridae, how did it develop a higher codon usage bias than those members? Furthermore, how did different trends in codon usage variation arise among different strains (Figure 1) despite their similar nucleotide contents (Table 2)? This could be ascribed to the distinct endemicity of HAV, as suggested by the COA results. Early comparative studies of the nucleotide sequences of different human HAV strains suggested that sequence relatedness could be correlated with the geographical origin of the viruses [34, 35]. Quasispecies dynamics is characterized by continuous generation of variant viral genomes, competition among them, and selection of the fittest mutant distributions in any given environment. Like other RNA viruses, HAV exists in vivo as distributions of closely related variants referred to as quasispecies [25, 32]. HAV strains have maintained their low rate of accumulating mutations over a long period of time, so that they have developed specific ecological niches [33]. Through survival in different geographical areas, in different human populations and over different numbers of replication rounds, the extreme codon usage bias of HAV was established over a long time. Moreover, in the context of a very low mutation rate, this extreme codon usage bias was conserved, so that a distinct endemicity was generated.


HAV presents a higher codon usage bias than other members of Picornaviridae. The most important determinant of the high codon bias in HAV is mutation pressure, which is also the main factor shaping the hyperendemic codon usage pattern of HAV even though the mutation rate of HAV is quite low. In addition, compositional constraint is another factor influencing synonymous codon usage in HAV. Although this study establishes basic knowledge of the codon usage patterns of HAV and the factors regulating synonymous codon usage, more comprehensive analysis is needed to reveal further characteristics of synonymous codon usage in the HAV genome.

Materials and methods


The 21 available complete RNA sequences of HAV were obtained randomly from GenBank in October 2010. The serial number (SN), GenBank accession number, genotype and other details are listed in Table 5.

Table 5 Information of hepatitis A virus genomes used in this study

Measures of relative synonymous codon usage

Relative synonymous codon usage values of each codon in a gene were calculated to investigate the characteristics of synonymous codon usage without the confounding influence of the amino acid composition of different gene samples [14]. The RSCU value of the i th codon for the j th amino acid was calculated as:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{k=1}^{n_j} g_{kj}} \, n_j$$

where $g_{ij}$ is the observed number of the i th codon for the j th amino acid, which has $n_j$ kinds of synonymous codons. An RSCU value close to 1.0 means that the codon is used equally and randomly relative to its synonyms.
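The RSCU definition above translates directly into code. The following is a minimal sketch under stated assumptions, not the software used in this study: the names `CODON_TABLE` and `rscu` are hypothetical, the standard genetic code is assumed, and amino acids that do not occur in the sequence are simply left out of the result.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ("*" marks stop codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """RSCU of each of the 59 synonymous sense codons in one coding sequence."""
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue                      # skip stop codons, Met and Trp
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                      # amino acid absent from this sequence
        for codon in family:
            # observed count divided by the mean count of the synonymous family
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy example: alanine encoded three times by GCU and once by GCC.
example = rscu("GCUGCUGCUGCC")
# example["GCU"] is 3.0, example["GCC"] is 1.0, example["GCA"] is 0.0
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which is a convenient sanity check on any implementation.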

The ENC was calculated to quantify the codon usage bias of an ORF [36]; it is regarded as the best estimator of absolute synonymous codon usage bias [37]. The greater the extent of codon preference in a gene, the smaller the ENC value. The index GC3s was used to measure the fraction of G+C nucleotides at the synonymous third codon position (excluding Met, Trp, and the termination codons).
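The two indices can be sketched as follows. This is a simplified illustration of the ENC of Wright (1990), not the program used in this study: the names `enc`, `gc3s` and `codon_counts` are hypothetical, the standard genetic code is assumed, amino acids observed fewer than twice are skipped, and the sketch raises an error instead of applying Wright's correction when a whole degeneracy class is missing.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def enc(cds):
    """Effective number of codons, ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6."""
    counts = codon_counts(cds)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    f_by_class = defaultdict(list)        # degeneracy -> homozygosity values
    for family in families.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue                      # Met, Trp, or too few observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in family)
        f = (n * sum_p2 - 1.0) / (n - 1.0)
        if f > 0:
            f_by_class[k].append(f)

    total = 2.0                           # Met and Trp contribute one codon each
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[k]:
            raise ValueError("no usable amino acid with %d synonymous codons" % k)
        mean_f = sum(f_by_class[k]) / len(f_by_class[k])
        total += n_amino_acids / mean_f
    return min(total, 61.0)               # ENC is capped at 61 sense codons


def gc3s(cds):
    """G+C fraction at the third position of synonymous codons."""
    counts = codon_counts(cds)
    synonymous = [c for c, aa in CODON_TABLE.items() if aa not in "*MW"]
    total = sum(counts[c] for c in synonymous)
    return sum(counts[c] for c in synonymous if c[2] in "GC") / total
```

A sequence that uses a single codon for every amino acid gives the minimum value of 20, and perfectly even usage gives the maximum of 61, matching the inverse relationship between ENC and codon preference described above.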

Correspondence analysis

Multivariate statistical analysis can be used to explore the relationships between variables and samples. In this study, correspondence analysis was used to investigate the major trends in codon usage variation among genes. The complete coding region of each gene was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon (excluding Met, Trp, and the termination codons) [38].
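To make the procedure concrete, the sketch below performs a basic correspondence analysis on a table whose rows are genomes and whose columns are codons, and reports the share of total variation (inertia) carried by the leading axes. It is a hedged illustration rather than the software used in this study: the function name `correspondence_axes` is hypothetical, the leading eigenvectors are found by simple power iteration with deflation instead of a full decomposition, and every row and column of the table is assumed to have a positive sum.

```python
import math
import random


def correspondence_axes(table, n_axes=2, n_iter=500, seed=0):
    """Return (fractions of inertia, row coordinates) for the leading CA axes."""
    n_rows, n_cols = len(table), len(table[0])
    grand = float(sum(sum(row) for row in table))
    r = [sum(row) / grand for row in table]                          # row masses
    c = [sum(table[i][j] for i in range(n_rows)) / grand
         for j in range(n_cols)]                                     # column masses

    # Standardized residuals from the independence model.
    s = [[(table[i][j] / grand - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]

    # Row-by-row cross-product matrix; its eigenvalues are the principal inertias.
    a = [[sum(x * y for x, y in zip(s[i], s[k])) for k in range(n_rows)]
         for i in range(n_rows)]
    total_inertia = sum(a[i][i] for i in range(n_rows))

    rng = random.Random(seed)
    fractions, coords = [], []
    for _ in range(n_axes):
        v = [rng.random() + 0.5 for _ in range(n_rows)]
        for _ in range(n_iter):                                      # power iteration
            w = [sum(a[i][k] * v[k] for k in range(n_rows)) for i in range(n_rows)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][k] * v[k] for k in range(n_rows))
                  for i in range(n_rows))
        fractions.append(lam / total_inertia)
        scale = math.sqrt(max(lam, 0.0))
        coords.append([v[i] * scale / math.sqrt(r[i]) for i in range(n_rows)])
        # Deflate so the next pass finds the next axis.
        a = [[a[i][k] - lam * v[i] * v[k] for k in range(n_rows)]
             for i in range(n_rows)]
    return fractions, coords
```

With a 21 x 59 table of RSCU values, `fractions[0]` and `fractions[1]` would play the role of the percentages of total variation attributed to the first and second axes, and `coords` would give the positions of the genomes along those axes.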

Correlation analysis

Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern [39]. This analysis was based on Spearman's rank correlation.
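Spearman's rank correlation is the Pearson correlation of the ranks of two variables. The sketch below shows the calculation with tied values given their average rank; the function names `average_ranks` and `spearman_rho` are hypothetical, and the sketch returns only the coefficient, not the significance level that a statistics package would also report.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                  # extend the run of ties
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

In this setting `x` could hold one composition value per genome (for example A3%) and `y` the coordinates of the same genomes on one correspondence-analysis axis.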

All statistical analyses were carried out with the statistical software SPSS 11.5 for Windows.


  1. Feinstone SM, Kapikian AZ, Purcell RH: Hepatitis A: detection by immune electron microscopy of a virus-like antigen associated with acute illness. Science 1973, 182: 1026-1028. 10.1126/science.182.4116.1026
  2. Melnick JL: Classification of hepatitis A virus as enterovirus type 72 and of hepatitis B virus as hepadnavirus type I. Intervirology 1982, 18: 105-106. 10.1159/000149313
  3. Gust ID, Coulepis AG, Feinstone SM, Locarnini SA, Moritsugu Y, Najera R, Siegl G: Taxonomic classification of hepatitis A virus. Intervirology 1983, 20: 1-7. 10.1159/000149367
  4. Minor PD: Picornaviridae. In Classification and Nomenclature of Viruses: Fifth Report of the International Committee on Taxonomy of Viruses, Archives of Virology, Supplementum. Volume 2. Edited by: Francki RIB, Fauquet CM, Knudson DL, Brown F. Wien: Springer Verlag; 1991:320-326.
  5. Costa-Mattioli M, Ferre V, Casane D, Perez-Bercoff R, Coste-Burel M, Imbert-Marcille BM, Andre EC, Bressollette-Bodin C, Billaudel S, Cristina J: Evidence of recombination in natural populations of hepatitis A virus. Virology 2003, 311: 51-59. 10.1016/S0042-6822(03)00109-0
  6. Rakela J, Fay OH, Stevenson K, Gordon I, Mosley JW: Similarities of two hepatitis A virus strains. WHO Bulletin 1976, 54: 561-564.
  7. Lemon SM, Binn LN: Antigenic relatedness of two strains of hepatitis A virus determined by cross-neutralization. Infection and Immunity 1983, 42: 418-420.
  8. Martin A, Lemon SM: Hepatitis A virus: from discovery to vaccines. Hepatology 2006, 43: S164-S172. 10.1002/hep.21052
  9. Grantham R, Gautier C, Gouy M, Mercier R, Pave A: Codon catalog usage and the genome hypothesis. Nucleic Acids Res 1980, 8: 49-62.
  10. Martin A, Bertranpetit J, Oliver JL: Variation in G+C content and codon choice: differences among synonymous codon groups in vertebrate genes. Nucleic Acids Res 1989, 17: 6181-6189. 10.1093/nar/17.15.6181
  11. Lloyd AT, Sharp PM: Evolution of codon usage patterns: the extent and nature of divergence between Candida albicans and Saccharomyces cerevisiae. Nucleic Acids Res 1992, 20: 5289-5295. 10.1093/nar/20.20.5289
  12. Xie T, Ding D, Tao X, Dafu D: The relationship between synonymous codon usage and protein structure. FEBS Lett 1998, 434: 93-96. 10.1016/S0014-5793(98)00955-7
  13. Dittmar KA, Goodenbour JM, Pan J: Tissue-specific differences in human transfer RNA expression. PLoS Genet 2006, 2: 2107-2115. 10.1371/journal.pgen.0020221
  14. Sharp PM, Li WH: Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codon. Nucleic Acids Res 1986, 14: 7737-7749. 10.1093/nar/14.19.7737
  15. Sharp PM, Tuohy TM, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 1986, 14: 5125-5143. 10.1093/nar/14.13.5125
  16. Karlin S, Mrázek J: What drives codon choices in human genes? J Mol Biol 1996, 262: 459-472. 10.1006/jmbi.1996.0528
  17. Lesnik T, Solomovici J, Deana A, Ehrlich R, Reiss C: Ribosome traffic in E.coli and regulation of gene expression. J Theor Biol 2000, 202: 175-185. 10.1006/jtbi.1999.1047
  18. Gu WJ, Zhou T, Ma JM, Sun X, Lu ZH: Analysis of synonymous codon usage in SARS coronavirus and other viruses in the Nidovirales. Virus Res 2004, 101: 155-161. 10.1016/j.virusres.2004.01.006
  19. Zhou T, Gu WJ, Ma JM, Sun X, Lu ZH: Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 2005, 81: 77-86. 10.1016/j.biosystems.2005.03.002
  20. Zhou T, Sun X, Lu ZH: Synonymous codon usage in environmental Chlamydia UWE25 reflects an evolution divergence from pathogenic chlamydiae. Gene 2006, 368: 117-125. 10.1016/j.gene.2005.10.035
  21. Zhong JC, Li YM, Zhao S, Liu S, Zhang Z: Mutation pressure shapes codon usage in the GC-rich genome of foot-and-mouth disease virus. Virus Genes 2007, 35: 767-776. 10.1007/s11262-007-0159-z
  22. Levin DB, Whittome B: Codon usage in nucleopolyhedroviruses. J Gen Virol 2000, 81: 2313-2325.
  23. Jenkins GM, Holmes EC: The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 2003, 92: 1-7. 10.1016/S0168-1702(02)00309-X
  24. Cristina J, Costa-Mattioli M: Genetic variability and molecular evolution of Hepatitis A virus. Virus Res 2007, 127: 151-157. 10.1016/j.virusres.2007.01.005
  25. Sánchez G, Bosch A, Pintó RM: Genome variability and capsid structural constraints of hepatitis A virus. J Virol 2003, 77: 452-459.
  26. Pintó RM, Aragonès L, Costafreda MI, Ribes E, Bosch A: Codon usage and replicative strategies of hepatitis A virus. Virus Res 2007, 127: 158-163.
  27. Bernardi G, Bernardi G: Codon Usage and Genome Composition. J Mol Evol 1985, 22: 363-365. 10.1007/BF02115693
  28. Bernardi G, Bernardi G: Compositional Constraints and Genome Evolution. J Mol Evol 1986, 24: 1-11. 10.1007/BF02099946
  29. Das S, Paul S, Dutta C: Synonymous codon usage in adenoviruses: Influence of mutation, selection and protein hydropathy. Virus Res 2006, 117: 227-236. 10.1016/j.virusres.2005.10.007
  30. Zhao S, Zhang Q, Liu XL, Wang XM, Zhang HL, Wu Y, Jiang F: Analysis of synonymous codon usage in 11 Human Bocavirus isolates. BioSystems 2008, 92: 207-214. 10.1016/j.biosystems.2008.01.006
  31. Drake JW, Holland JJ: Mutation rates among RNA viruses. Proc Natl Acad Sci USA 1999, 96: 13910-13913. 10.1073/pnas.96.24.13910
  32. Costa-Mattioli M, Domingo E, Cristina J: Analysis of sequential hepatitis A virus strains reveals coexistence of distinct viral subpopulations. J Gen Virol 2006, 87: 115-118. 10.1099/vir.0.81286-0
  33. Robertson BH, Jansen RW, Khanna B, Totsuka A, Nainan OV, Siegl G, Widell A, Margolis HS, Isomura S, Ito K, Ishizu T, Moritsugu Y, Lemon SM: Genetic relatedness of hepatitis A virus strains recovered from different geographical regions. J Gen Virol 1992, 73: 1365-1377. 10.1099/0022-1317-73-6-1365
  34. Jansen RW, Siegl G, Lemon SM: Molecular epidemiology of human hepatitis A virus defined by an antigen-capture/polymerase chain reaction method. Proc Natl Acad Sci USA 1990, 87: 2867-2871. 10.1073/pnas.87.8.2867
  35. Robertson BH, Khanna B, Nainan OV, Margolis HS: Epidemiologic patterns of wild-type hepatitis A virus determined by genetic variation. J Infect Dis 1991, 163: 286-292. 10.1093/infdis/163.2.286
  36. Wright F: The 'effective number of codons' used in a gene. Gene 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9
  37. Comeron JM, Aguade M: An evaluation of measures of synonymous codon usage bias. J Mol Evol 1998, 47: 268-274. 10.1007/PL00006384
  38. Mardia KV, Kent JT, Bibby JM: Multivariate analysis. New York, Academic Press; 1979.
  39. Ewens WJ, Grant GR: Statistical Methods in Bioinformatics. New York, Springer; 2001.


This work was supported in part by grants from the National Key Technologies R&D Program (2009ZX08007-006B), the International Science & Technology Cooperation Program of China (No. 2010DFA32640) and the Science and Technology Key Project of Gansu Province (No. 0801NKDA034). This study was also supported by the National Natural Science Foundation of China (No. 30700597 and No. 31072143).

Author information

Correspondence to Jie Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YQZ conceived of the study, downloaded these sequences, calculated the data, analyzed the results and drafted the manuscript; YSL conceived of the study, supervised the research, analyzed the results and helped draft the manuscript; JHZ calculated and visualized the data; WQL, HTC, YW, LNM and YZD assisted with data analysis; JZ supervised the research and helped draft the manuscript. All authors read and approved the final manuscript.

Yiqiang Zhang, Yongsheng Liu contributed equally to this work.

About this article


  • Codon Usage
  • Synonymous Codon
  • Codon Usage Bias
  • Synonymous Codon Usage
  • Relative Synonymous Codon Usage