Compare the differences of synonymous codon usage between the two species within cardiovirus
Virology Journal volume 8, Article number: 325 (2011)
Cardioviruses are positive-strand RNA viruses in the family Picornaviridae that can cause enteric infection in rodents and have also been detected at lower frequencies in other mammals such as pigs and human beings. The Cardiovirus genus consists of two distinct species: Encephalomyocarditis virus (EMCV) and Theilovirus (ThV). There are many differences between the two species. In this study, the differences in codon usage between EMCV and ThV were compared.
The mean ENC values of EMCV and ThV are 54.86 and 51.08, respectively, both higher than 40. There are correlations between (C+G)12% and (C+G)3% for both EMCV and ThV (r = -0.736 and r = 0.986, respectively; P < 0.01). For ThV, but not for EMCV, (C+G)12%, (C+G)3%, axis f'1 and axis f'2 were significantly correlated. According to the RSCU values, the EMCV species tends to prefer U-, G- and C-ending codons, while the ThV species tends to prefer U- and A-ending codons. However, in both species AGA for Arg, AUU for Ile, UCU for Ser, and GGA for Gly were chosen preferentially. Correspondence analysis detected one major trend in the first axis (f'1), which accounted for 22.89% of the total variation, and another major trend in the second axis (f'2), which accounted for 17.64% of the total variation. Strains of the same serotype tended to plot in the same region of the coordinate plane.
The overall extent of codon usage bias in both EMCV and ThV is low. Mutational pressure is the main factor that determines the codon usage bias, but (C+G) content plays a more important role in codon usage bias for ThV than for EMCV. The synonymous codon usage pattern in both EMCV and ThV genes is gene function and geography specific, but not host specific. Serotype may be one factor affecting the codon bias of ThV, and location has no significant effect on the variation of synonymous codon usage in these virus genes.
Synonymous codon usage is biased, and the bias differs between organisms [1, 2]. Many factors have been proposed to explain this bias, such as the degree and timing of gene expression, codon-anticodon interactions, transcription and translation rate and fidelity, codon context, and global and local (C+G) content [3, 4]. Understanding the extent and causes of biases in codon usage is essential to the understanding of viral evolution, particularly the interplay between viruses and the immune response. More recent studies have revealed that patterns of codon usage bias and nucleotide composition within many cellular genomes are far more complex than previously imagined, and the factors shaping their evolution are still not entirely understood. In general, natural selection and/or mutation pressure for accurate and efficient translation in various organisms are the main reasons for this bias. In addition, compared with natural selection, mutation pressure plays an important role in the synonymous codon usage pattern of some RNA viruses [6–10].
Picornaviruses are positive single-stranded RNA viruses that cause a variety of important diseases in humans and animals, such as foot-and-mouth disease. The Cardiovirus genus of the family Picornaviridae consists of two distinct species: Encephalomyocarditis virus (EMCV) and Theilovirus (ThV). The EMCVs comprise a single serotype and have a wide host range [11–21], while the ThV species probably includes four serotypes: Theiler's murine encephalomyelitis virus (TMEV), Vilyuisk human encephalomyelitis virus (VHEV), Thera virus (TRV; isolated from rats) and Saffold virus (SAFV; isolated from humans), which appear to have much narrower host ranges than EMCV. Like other viruses in the family Picornaviridae, Cardiovirus genomes consist of an open reading frame (ORF), a 5'-untranslated region (5'-UTR) and a 3'-untranslated region (3'-UTR). However, complete nucleotide sequences have not yet been reported for many of these viruses, SAFV in particular, so much work remains to be done on this group.
Nevertheless, little information is available about the codon usage pattern of Cardiovirus genomes, including the relative synonymous codon usage (RSCU) and codon usage bias (CUB) over the course of their evolution. In this study, the key genetic determinants of the codon usage indices in the Cardiovirus genus were examined.
The characteristics of synonymous codon usage in EMCV and ThV
In order to investigate the extent of codon usage bias in Cardiovirus, the RSCU values of all codons in 39 Cardiovirus strains were calculated. As shown in Table 1, the EMCV strains tend to prefer U-, G- and C-ending codons, while the ThV strains tend to prefer U- and A-ending codons. The ENC (effective number of codons) values (Table 2) of the EMCV strains examined are very similar, varying from 54.40 to 56.11 with a mean of 54.86 and S.D. of 0.36, while the ENC values of the Theilovirus strains vary more widely, from 48.24 to 54.94 with a mean of 51.08 and S.D. of 2.17. Because the ENC values of both the EMCV and the Theilovirus strains are high (ENC > 40), codon usage bias in the Cardiovirus genome is weak. However, there is a marked variation in codon usage pattern among different Theilovirus genes (S.D. = 6.41) compared with the EMCV genes (S.D. = 0.36). This is further supported by the (C+G)3% values: those of the EMCV strains range from 46.47% to 52.11%, with a mean of 48.90 and S.D. of 2.18, while those of the ThV strains range from 37.35 to 51.00, with a mean of 43.72 and S.D. of 5.77.
Compositional properties of coding sequences of both EMCV and ThV
As shown in Table 3, (C+G)% has a highly significant correlation with each of A3%, C3%, G3% and U3%. (C+G)3% has a highly significant correlation with each of A%, U%, C% and G% among the ThV strains but not among the EMCV strains. This indicates that (C+G)% and (C+G)3% may reflect more important characteristics of the codon usage pattern in ThV than in EMCV. The C+G content at the first and second codon positions ((C+G)12%) was then compared with that at synonymous third codon positions ((C+G)3%) for EMCV and ThV separately. A highly significant positive correlation is observed in ThV (r = 0.986, P < 0.01) (Figure 1A, Table 4), whereas a highly significant negative correlation is observed in EMCV (r = -0.736, P < 0.01) (Figure 1B, Table 4). The (C+G)12% and (C+G)3% of both EMCV and ThV were then compared with axis f'1 and axis f'2. The results (Table 4) show that for EMCV no significant correlations are observed between (C+G)3%, (C+G)12%, axis f'1 and axis f'2, while for ThV the opposite is true. The ENC-plot [ENC plotted against (G+C)3%] was used as part of a general strategy to investigate patterns of synonymous codon usage, and all of the points lie below the expected curve (Figure 2). All these results imply that the codon bias of Cardiovirus, especially of ThV, can be explained mainly by an uneven base composition, in other words by mutation pressure rather than natural selection, and that (C+G) content has a more significant effect in ThV than in EMCV.
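The "expected curve" of an ENC-plot can be sketched as follows. This is a minimal illustration that assumes the standard null expectation from Wright's ENC paper cited in the reference list, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the G+C fraction at synonymous third positions. The function names are hypothetical, and the two points used are the mean values reported in this study, not the per-strain data behind Figure 2.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon choice is constrained only by GC3s.

    gc3s is a fraction between 0 and 1, not a percentage.
    """
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def below_expected_curve(enc: float, gc3s: float) -> bool:
    """True when an observed (GC3s, ENC) point lies under the expected curve."""
    return enc < expected_enc(gc3s)


# Mean ENC and mean (C+G)3% from the text, with the percentage turned into a fraction.
for label, enc, gc3s in [("EMCV", 54.86, 0.4890), ("ThV", 51.08, 0.4372)]:
    print(label, round(expected_enc(gc3s), 2), below_expected_curve(enc, gc3s))
```

A point below the curve has a lower ENC than base composition alone would predict, which is how the plot is read in this section.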
Correspondence analysis (COA) for all the strains
To investigate the major trend in codon usage variation among Cardiovirus strains, COA was applied to all 39 Cardiovirus complete coding regions selected for this study. COA detected one major trend in the first axis (f'1), which accounted for 22.89% of the total variation, and another major trend in the second axis (f'2), which accounted for 17.64% of the total variation. The coordinates of the complete coding region of each gene on the first and second principal axes are plotted in Figure 3. The f'1 values of all EMCV strains are positive, while those of the ThV strains are negative, and strains of the same serotype plot in the same region. Furthermore, the EMCV strains cluster tightly, while the different serotypes of ThV are dispersed. These findings imply that different serotypes may have different codon usage patterns. Interestingly, EMC-30 plots a little apart from the other EMCV strains, but this does not indicate that location is an element that could dramatically influence the codon usage pattern.
Qualitative evaluation of codon usage bias in EMCV and ThV
There was a seemingly random variation in RSCU between amino acids and gene groups. In each species, several synonymous codons were used much more often than their alternatives. In EMCV these were AGA for Arg, GGA for Gly, CAU for His, AUU for Ile, CCA for Pro, UCU for Ser and GUG for Val. There are some differences in the global pattern of codon usage between EMCV and Theilovirus. However, in both species AGA for Arg, AUU for Ile, UCU for Ser, and GGA for Gly were chosen preferentially (Figure 4).
Studies of synonymous codon usage in viruses can reveal much about viral genomes. In this study, we used RSCU, ENC, COA, and GC3S to measure synonymous codon usage bias and to compare EMCV and ThV, the two species within Cardiovirus. The synonymous codon usage bias in the coding regions of both EMCV and ThV is low, as the mean ENC values are 54.86 and 51.08, respectively (both higher than 40). This is in agreement with previous reports on some other RNA viruses, for example BVDV (mean ENC = 51.42), H5N1 (mean ENC = 50.91) and SARS-CoV (mean ENC = 48.99) [6, 7, 21]. A low codon usage bias is advantageous for efficient replication in vertebrate host cells with potentially distinct codon preferences. However, there is a marked variation in codon usage pattern among different Theilovirus genes (S.D. = 6.41) compared with the EMCV genes (S.D. = 0.36). One explanation for this is that ThV probably has four serotypes while EMCV has only one, and serotype might affect codon choice.
A general mutational pressure, which affects the whole genome, would certainly account for the majority of the codon usage variation. In this study, the general association between codon usage bias and base composition suggests that mutational pressure, rather than natural selection, is the main determinant. This is supported by the highly significant correlation between (C+G)12% and (C+G)3% (r = -0.736 for EMCV; r = 0.986 for ThV, P < 0.01), since the effects are present at all codon positions. (G+C) content was another factor found to be strongly correlated with codon usage bias: the results indicate that (C+G) content plays an important role in codon usage bias for ThV (Table 3), but not for EMCV. The picture is more complex for EMCV, and further work on this species, covering individual nucleotide composition, gene structure and so on, is needed to find the main factor behind its codon bias. Nevertheless, we still consider mutational pressure, rather than natural selection, to be one of the main factors responsible for the variation of synonymous codon usage among ORF coding sequences in the Cardiovirus genus.
Previous reports indicate that many viruses, including foot-and-mouth disease virus, influenza A virus subtype H5N1, severe acute respiratory syndrome coronavirus (SARS-CoV) and human bocavirus, preferentially use C- and G-ending codons [2, 7, 9, 10]. In this study we found that the EMCV strains tend to prefer U-, G- and C-ending codons, while the ThV strains tend to prefer U- and A-ending codons. There was also a seemingly random variation in RSCU between amino acids and gene groups. This may be because using codons with different endings could be an advantage for efficient replication in host cells with potentially distinct codon preferences, for both EMCV and ThV.
Serotype may be one factor in codon bias in Cardiovirus, as Figure 3 shows. There was no evidence that location is a factor in codon bias, because EMC-30, which was isolated in the USA, plots a little apart from the other EMCV strains isolated in the USA.
The overall extent of codon usage bias in both EMCV and ThV is low (mean ENC = 54.86 and 51.08, respectively, both higher than 40). Mutational pressure, rather than natural selection, is the main factor that determines the codon usage bias, as supported by the highly significant correlation between (C+G)12% and (C+G)3% (r = -0.736 for EMCV; r = 0.986 for ThV, P < 0.01), but (C+G) content plays a more important role in codon usage bias for ThV than for EMCV. The synonymous codon usage pattern in both EMCV and ThV genes is gene function and geography specific, but not host specific. Serotype may be one factor affecting the codon bias of ThV, and location has no significant effect on the variation of synonymous codon usage in these virus genes.
Materials and methods
A total of 39 Cardiovirus genomes were used in this study, including 18 EMCV genomes and 21 ThV genomes. The coding sequences (CDS) of these viruses were obtained at random from NCBI GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) in December 2010. The serial number (SN), GenBank number, genotype and other details are listed in Table 5.
Measures of relative synonymous codon usage
Relative synonymous codon usage (RSCU) values of each codon in each ORF were used to measure synonymous codon usage. RSCU values are largely independent of amino acid composition and are particularly useful for comparing codon usage between genes, or sets of genes, that differ in size and amino acid composition. The RSCU value of the i-th codon for the j-th amino acid was calculated as:
$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_j}\sum_{k=1}^{n_j} g_{kj}}$$
where g_ij is the observed number of the i-th codon for the j-th amino acid, which has n_j synonymous codons. An RSCU value close to 1.0 means that the codon is used as often as expected if all synonymous codons for that amino acid were chosen equally and at random. RSCU values were obtained with the CodonW program.
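The RSCU definition can be illustrated with a short Python sketch. It is not the CodonW implementation used in this study: the function name `rscu`, the toy sequence and the choice to report codons in the DNA alphabet (U read as T) are assumptions made for illustration, and the standard genetic code is assumed.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """Return {codon: RSCU} for the codons of every amino acid present in cds."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[p:p + 3] for p in range(0, len(seq) - len(seq) % 3, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are not scored
            families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU is undefined
        mean_count = total / len(codons)  # count expected under equal usage
        for c in codons:
            values[c] = counts[c] / mean_count
    return values


# Toy example: glycine encoded three times by GGA and once by GGC.
print(rscu("GGAGGAGGAGGC")["GGA"])
```

The RSCU values of one amino acid always sum to its number of synonymous codons, so a value above 1.0 marks a codon used more often than expected under equal usage.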
The effective number of codons (ENC), which is the best estimator of absolute synonymous codon usage bias, was calculated to quantify the codon usage bias of an ORF. The stronger the codon preference in a gene, the smaller the ENC value. In an extremely biased gene where only one codon is used for each amino acid, this value would be 20; if all codons are used equally, it would be 61; and an ENC value greater than 40 was regarded as indicating low codon usage bias. ENC values were obtained with the CodonW program.
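How ENC moves between 20 and 61 can be seen in the following sketch of Wright's estimator. It is an illustration under stated assumptions, not the CodonW code: the function name `enc` is hypothetical, the standard genetic code is assumed, and a degeneracy class with no usable amino acid raises an error instead of applying the corrections described in the original ENC paper.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def enc(cds: str) -> float:
    """Effective number of codons of an in-frame sequence, capped at 61."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[p:p + 3] for p in range(0, len(seq) - len(seq) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    # Homozygosity F of each amino acid, grouped by number of synonymous codons.
    f_by_class = defaultdict(list)
    for aa, codons in families.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met and Trp offer no choice; F needs two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)
    total = 2.0  # Met and Trp contribute one codon each
    # The standard code has 9 two-fold, 1 three-fold, 5 four-fold, 3 six-fold amino acids.
    for k, n_amino in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[k]:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        total += n_amino * len(f_by_class[k]) / sum(f_by_class[k])
    return min(total, 61.0)
```

When every amino acid uses a single codon, each F equals 1 and the sum is 2 + 9 + 1 + 5 + 3 = 20, the lower limit quoted above. The cap at 61 covers short sequences in which the raw estimate exceeds the number of sense codons.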
Composition analysis of coding region
In order to better understand the synonymous codon usage variation among different Cardiovirus isolates, the (C+G) content at the first and second codon positions [(C+G)12%] and that at the synonymous third position [(C+G)3%] were calculated with the CodonW program [26, 27]. The (C+G) content at the different positions was then compared with the other compositional values.
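The two positional measures can be sketched as below. This is an approximation for illustration rather than the CodonW calculation: the function name `gc_by_position` is hypothetical, "synonymous third positions" are taken to mean every codon except Met (ATG), Trp (TGG) and the three stop codons, and codons containing ambiguous bases are skipped.

```python
def gc_by_position(cds: str):
    """Return ((C+G)12%, (C+G)3%) for an in-frame coding sequence.

    Raises ZeroDivisionError if the sequence has no usable codons.
    """
    seq = cds.upper().replace("U", "T")
    skip_third = {"ATG", "TGG", "TAA", "TAG", "TGA"}  # no synonymous choice
    gc12 = n12 = gc3 = n3 = 0
    for p in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[p:p + 3]
        if set(codon) - set("ACGT"):
            continue  # ignore codons with ambiguous bases
        gc12 += sum(base in "GC" for base in codon[:2])
        n12 += 2
        if codon not in skip_third:
            gc3 += codon[2] in "GC"
            n3 += 1
    return 100.0 * gc12 / n12, 100.0 * gc3 / n3


# Toy ORF: Met-Ala-Ala-Lys-Trp
gc12, gc3 = gc_by_position("AUGGCCGCAAAAUGG")
print(round(gc12, 2), round(gc3, 2))
```

Computing one pair of values per strain gives the paired observations that are correlated in the Results.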
Correspondence analysis (COA)
Multivariate statistical analysis can be used to explore the relationships between variables and samples. In this study, correspondence analysis was used to investigate the major trend in codon usage variation among genes. The complete coding region of each gene was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon (excluding the codons for Met and Trp and the termination codons).
Correlation analysis was used to identify the relationship between nucleotide composition and synonymous codon usage pattern. This analysis was based on Spearman's rank correlation.
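Spearman's rank correlation can be sketched in plain Python as the Pearson correlation of the two rank vectors, with tied values given their average rank. The study itself used SPSS; the function names and the made-up numbers in the usage line are illustrative assumptions, and no P value is computed here.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rho for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either input is constant


# Made-up values standing in for per-strain (C+G)12% and (C+G)3%.
print(spearman_r([40.1, 41.3, 42.0, 43.8], [37.4, 44.0, 43.1, 51.0]))
```

Because only ranks are used, the coefficient measures monotonic association and is insensitive to the scale of the percentages.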
All statistical analyses were carried out with the statistical software SPSS 11.5 for Windows.
Andersson SGE, Kurland CG: Codon preferences in free living microorganisms. Microbiol Rev 1990, 54: 198-210.
Grantham R, Gautier C, Gouy M, Mercier R, Pave A: Codon catalog usage and the genome hypothesis. Nucleic Acids Res 1980, 8: 49-62.
Martin A, Bertranpetit J, Oliver JL: Variation in G+C content and codon choice: differences among synonymous codon groups in vertebrate genes. Nucleic Acids Res 1989, 17: 6181-6189. 10.1093/nar/17.15.6181
Xie T, Ding D, Tao X, Dafu D: The relationship between synonymous codon usage and protein structure. FEBS Lett 1998, 434: 93-96. 10.1016/S0014-5793(98)00955-7
Shackelton LA, Parrish CR, Holmes EC: Evolutionary Basis of Codon Usage and Nucleotide Composition Bias in Vertebrate DNA Viruses. J Mol Biol 1996, 262: 459-472. 10.1006/jmbi.1996.0528
Gu WJ, Zhou T, Ma JM, Sun X, Lu ZH: Analysis of synonymous codon usage in SARS coronavirus and other viruses in the Nidovirales. Virus Res 2004, 101: 155-161. 10.1016/j.virusres.2004.01.006
Zhou T, Gu WJ, Ma JM, Sun X, Lu ZH: Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 2005, 81: 77-86. 10.1016/j.biosystems.2005.03.002
Zhou T, Sun X, Lu ZH: Synonymous codon usage in environmental Chlamydia UWE25 reflects an evolution divergence from pathogenic chlamydiae. Gene 2006, 368: 117-125.
Zhong JC, Li YM, Zhao S, Liu S, Zhang Z: Mutation pressure shapes codon usage in the GC-rich genome of foot-and-mouth disease virus. Virus Genes 2007, 35: 767-776. 10.1007/s11262-007-0159-z
Jenkins GM, Holmes EC: The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 2003, 92: 1-7. 10.1016/S0168-1702(02)00309-X
Stanway G, Brown F, Christian P, Hovi T, Hyypia T, King AMQ, Knowles NJ, Lemon SM, Minor PD, Pallansch MA, Palmenberg AC, Skern T: Eighth report of the International Committee on Taxonomy of Viruses. In Family Picornaviridae. Edited by: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA. Elsevier Academic Press, London, United Kingdom; 2005:757-778.
Zimmerman JJ, ed.: Encephalomyocarditis. 2nd edition. CRC Press, Boca Raton, FL; 1994.
Theiler M: Spontaneous encephalomyelitis of mice--a new virus disease. Science 1934, 80: 122.
Abed Y, Boivin G: New Saffold cardiovirus in 3 children in Canada. Emerg Infect Dis 2008, 14: 834-836. 10.3201/eid1405.071675
Casals J: Immunological characterization of Vilyuisk human encephalomyelitis virus. Nature 1963, 200: 339-341. 10.1038/200339a0
Pritchard AE, Lipton HL: Nucleotide sequence identifies Vilyuisk virus as a divergent Theiler's virus. Virology 1992, 191: 469-472. 10.1016/0042-6822(92)90212-8
Ohsawa K, Watanabe Y, Miyata H, Sato H: Genetic analysis of a Theiler-like virus isolated from rats. Comp Med 2003, 53: 191-196.
Vladimirtsev VA, et al.: Family clustering of Viliuisk encephalomyelitis in traditional and new geographic regions. Emerg Infect Dis 2007, 13: 1321-1326.
Lipton HL, Friedmann A, Sethi P, Crowther JR: Characterization of Vilyuisk virus as a picornavirus. J Med Virol 1983, 12: 195-203. 10.1002/jmv.1890120305
Pritchard AE, Strom T, Lipton HL: Nucleotide sequence identifies Vilyuisk virus as a divergent Theiler's virus. Virology 1992, 191: 469-472. 10.1016/0042-6822(92)90212-8
Wang M, Zhang J, Liu Y: Analysis of codon usage in bovine viral diarrhea virus. Arch Virol 2011, 156: 153-160. 10.1007/s00705-010-0848-0
Sharp PM, Li WH: Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codon. Nucleic Acids Res 1986, 14: 7737-7749. 10.1093/nar/14.19.7737
Wright F: The 'effective number of codons' used in a gene. Gene 1990, 87: 23-29. 10.1016/0378-1119(90)90491-9
Comeron JM, Aguade M: An evaluation of measures of synonymous codon usage bias. J Mol Evol 1998, 47: 268-274. 10.1007/PL00006384
Fuglsang A: The 'effective number of codons' revisited. Biochem Biophys Res Commun 2004, 317: 957-964. 10.1016/j.bbrc.2004.03.138
Novembre JA: Accounting for Background Nucleotide Composition When Measuring Codon Usage Bias. Mol Biol Evol 2002, 19: 1390-1394.
Comeron JM, Aguade M: An evaluation of measures of synonymous codon usage bias. J Mol Evol 1998, 47: 268-274. 10.1007/PL00006384
Mardia KV, Kent JT, Bibby JM: Multivariate analysis. New York, Academic Press; 1979.
Ewens WJ, Grant GR: Statistical Methods in Bioinformatics. New York, Springer; 2001.
This work was supported in part by grants from the International Science & Technology Cooperation Program of China (No. 2010DFA32640) and the Science and Technology Key Project of Gansu Province (No. 0801NKDA034). This study was also supported by the National Natural Science Foundation of China (No. 30700597, 31072143).
The authors declare that they have no competing interests.
WQL and JZ conceived of the study. WQL downloaded these sequences, calculated the data, analyzed the results and drafted the manuscript; JZ supervised the research, analyzed the results and helped draft the manuscript; JHZ calculated and visualized the data; YQZ, HTC, LNM and YZD assisted with data analysis; YSL supervised the research and helped draft the manuscript.
Wen-qian Liu and Jie Zhang contributed equally to this work.
Liu, W., Zhang, J., Zhang, Y. et al. Compare the differences of synonymous codon usage between the two species within cardiovirus. Virol J 8, 325 (2011). https://doi.org/10.1186/1743-422X-8-325
- Codon Usage
- Codon Bias
- Codon Usage Bias
- Severe Acute Respiratory Syndrome
- Synonymous Codon Usage