Evidence for a novel coding sequence overlapping the 5'-terminal ~90 codons of the Gill-associated and Yellow head okavirus envelope glycoprotein gene
© Firth and Atkins; licensee BioMed Central Ltd. 2009
Received: 1 October 2009
Accepted: 17 December 2009
Published: 17 December 2009
The genus Okavirus (order Nidovirales) includes a number of viruses that infect crustaceans, causing major losses in the shrimp industry. These viruses have a linear positive-sense ssRNA genome of ~26-27 kb, encoding a large replicase polyprotein that is expressed from the genomic RNA, and several additional proteins that are expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the envelope glycoprotein encoding sequence, ORF3, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF3. We propose that translation of the new ORF initiates at a conserved AUG codon separated by just 2 nt from the ORF3 AUG initiation codon, resulting in a novel 86 amino acid protein.
Overlapping genes are common in RNA viruses, where they serve as a mechanism to optimize the coding potential of compact genomes. However, annotation of overlapping genes can be difficult using conventional gene-finding software. Recently we have been using a number of complementary approaches to systematically identify new overlapping genes in virus genomes [11–15]. When we applied these methods to the okaviruses, we found strong evidence for a new coding sequence (hereafter ORFX) overlapping the 5'-terminal 88 codons of ORF3 in the +2 reading frame (Figure 1). Here we describe the bioinformatic analyses.
Okavirus sequences in GenBank with full-length coverage of ORF3 were identified by applying tblastn [16] to the ORF3 amino acid sequence from one Gill-associated virus (GAV) isolate [GenBank:AF227196] and one Yellow head virus (YHV) isolate [GenBank:EU487200], which yielded the additional sequences [GenBank:EF156405], [GenBank:FJ848673], [GenBank:FJ194949], [GenBank:EU785042] and [GenBank:EU785043]. The ORF3 regions were extracted, translated, aligned with CLUSTALW [17], and back-translated to give a nucleotide sequence alignment for ORF3.
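The back-translation step can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `back_translate` and the toy sequences are hypothetical, and the sketch assumes that each coding sequence contains exactly three nucleotides per aligned residue (no stop codon, no frameshift).

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an ungapped coding sequence onto one gapped protein alignment row.

    Each residue is replaced by its own codon and each gap '-' by '---',
    so the codon alignment inherits the column structure of the protein alignment.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3x the number of aligned residues")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


# Toy data (hypothetical, not okavirus sequences).
protein_rows = {"seqA": "MK-L", "seqB": "MKTL"}
cds_rows = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTT"}

codon_alignment = {
    name: back_translate(protein_rows[name], cds_rows[name]) for name in protein_rows
}
for name, row in codon_alignment.items():
    print(name, row)
```

Aligning at the amino acid level and then threading the codons back keeps every gap a multiple of three nucleotides, so the ORF3 reading frame is preserved in every column of the nucleotide alignment.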
The alignment was analysed for conservation at synonymous sites, as described previously. The procedure takes into account whether synonymous site codons are 1-, 2-, 3-, 4- or 6-fold degenerate and the differing probabilities of transitions and transversions. The analysis revealed a striking and highly statistically significant (p < 10^-21 for the total conservation within ORFX) peak in synonymous site conservation at the 5' end of ORF3 (Figure 1B, panels 5-6). Such conservation peaks are indicative of overlapping functional elements, though such elements may be either coding or non-coding. In this case, however, the conserved region coincided with a conserved absence of stop codons in the +2 reading frame (Figure 1B, panel 4), suggesting an overlapping coding sequence in the +2 frame as a possible explanation for the enhanced conservation at ORF3-frame synonymous sites.
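The stop-codon check described above can be sketched as follows. This is an illustrative Python sketch rather than the procedure used to produce Figure 1B: the function names and toy sequences are hypothetical, sequences are assumed to be ungapped, and the +2 frame is assumed to mean codons starting at 0-based offset 2 relative to the ORF3 frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq: str, frame: int) -> list[int]:
    """Return 0-based start positions of stop codons read at the given frame offset."""
    seq = seq.upper().replace("U", "T")
    return [
        i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS
    ]


def frame_open_in_all(seqs: list[str], frame: int) -> bool:
    """True if no sequence contains a stop codon in the given frame."""
    return all(not stop_positions(s, frame) for s in seqs)


# Toy sequences (hypothetical): both are open in frame 0,
# but only the second is open in frame +2.
with_stop = "ATGGCATGGCTTAAGC"
without_stop = "ATGGCATGGCTTCAGC"

print(stop_positions(with_stop, 2))
print(frame_open_in_all([with_stop, without_stop], 0))
print(frame_open_in_all([without_stop], 2))
```

An open frame in one sequence is weak evidence on its own; the argument in the text rests on the frame being open in every aligned sequence across the same region.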
Inspection of an additional 33 sequences (FJ438530-FJ438532, FJ428584-FJ428613) with only partial coverage of ORF3, but nearly complete coverage of the ORFX region, again revealed the complete absence of +2 frame stop codons in ORFX. One further partial sequence, [GenBank:DQ978360], differed in having a single nucleotide deletion ~15 codons into ORFX. The effect of this deletion is to fuse the 5' end of ORFX with the downstream ORF3: ribosomes initiating at the ORF3 initiation codon terminate early, while ribosomes initiating at the ORFX initiation codon (see below) go on to translate an ORFX-ORF3 fusion. Except for this deletion, however, the 717 nt DQ978360 sequence is identical to the corresponding region of FJ848673, so it is possible that the deletion simply reflects a sequencing error or a defective sequence.
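The frame bookkeeping behind the ORFX-ORF3 fusion can be made explicit with a short Python sketch. The function name is hypothetical, and the frame labels assume that ORF3 is frame 0 and ORFX is frame 2, counted as 0-based offsets in the original (undeleted) coordinates.

```python
def source_frame_after_deletion(ribosome_frame: int, deleted_nt: int = 1) -> int:
    """Original-coordinate frame that a ribosome reads downstream of a deletion.

    Downstream of the deletion, position q in the mutated sequence corresponds
    to position q + deleted_nt in the original sequence, so a ribosome that
    keeps reading triplets in its own frame is now reading the original
    frame shifted by deleted_nt (modulo 3).
    """
    return (ribosome_frame + deleted_nt) % 3


ORF3_FRAME, ORFX_FRAME = 0, 2

# A ribosome that initiated in the ORFX frame continues in the original ORF3 frame.
print(source_frame_after_deletion(ORFX_FRAME))
# A ribosome that initiated in the ORF3 frame continues in the remaining (+1) frame.
print(source_frame_after_deletion(ORF3_FRAME))
```

Under these assumptions, a single-nucleotide deletion moves ORFX-initiating ribosomes into the ORF3 frame, which matches the fusion described in the text, while ORF3-initiating ribosomes move into the third frame, where early termination would be expected if that frame contains stop codons.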
Given the short length of ORFX, conservation alone is perhaps not sufficient evidence for a coding assignment. Therefore, the alignment was also analysed with MLOGD, a gene-finding program that was designed specifically for identifying overlapping coding sequences and that includes explicit models for sequence evolution in multiply-coding regions [11, 12]. In contrast to the synonymous site conservation index above, MLOGD, when applied in the 'sliding window' mode, does not depend on the degree of conservation per se (the sequence divergence parameter is fitted independently for each window). When applied to the ORF3 alignment, MLOGD detected a strong coding signature for ORFX, with positively scoring windows throughout the ORFX region in the +2 frame, indicating directly that ORFX is indeed a coding sequence (Figure 1B, panel 9). In fact, the MLOGD score in the +2/ORFX frame within the ORFX region was significantly greater than the score in the +0/ORF3 frame, indicating that the ORFX product is subject to stronger functional constraints than the product of the overlapping region of ORF3 (which indeed has a negative MLOGD score towards the 5'-terminal half of the ORFX region; Figure 1B, panel 7). Consistent with this, a comparison of GAV AF227196 with YHV EU487200 showed that, in the region where ORFX and ORF3 overlap, ORFX has higher amino acid conservation than ORF3 (71/86 identities for ORFX, 62/86 identities for ORF3).
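For readers who wish to reproduce identity counts of this kind, a minimal Python sketch is given below. The function name and the toy peptides are hypothetical; only the two ratios at the end (71/86 and 62/86) are taken from the text, and the sketch assumes that columns containing a gap in either sequence are excluded from the comparison.

```python
def count_identities(row_a: str, row_b: str) -> tuple[int, int]:
    """Return (identical, compared) over the gap-free columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = compared = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += a == b
    return identical, compared


# Toy aligned peptides (hypothetical, not the ORFX or ORF3 products).
print(count_identities("MKTLLV", "MKSLLV"))

# Percent identities implied by the counts reported in the text.
orfx_pct = round(100 * 71 / 86, 1)
orf3_pct = round(100 * 62 / 86, 1)
print(orfx_pct, orf3_pct)
```

Expressed as percentages, the reported counts correspond to roughly 82.6% identity for ORFX and 72.1% for the overlapping part of ORF3.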
When MLOGD was applied in the 'test query ORF' mode (Figure 1B, panel 10), the number of independent base variations across the alignment within the ORFX region was calculated to be Nvar ~ 57, and the total MLOGD score was log(LR) ~ 33.0 (see  for details). Although MLOGD has a significant false negative rate (i.e. there are known overlapping genes that it fails to detect, particularly ones that are less conserved than the genes they overlap), the false positive rate (with appropriate thresholds) is low. In particular, extensive tests with known single-coding and double-coding virus sequence alignments indicate that 'Nvar ≥ 20' and 'log(LR) ≥ × Nvar' together signal robust detection (<1% false positive rate) of an overlapping same-strand CDS (and unpublished data).
Initiation at the proposed AUG codon would give ORFX the nucleotide coordinates 20680..20937 in AF227196 (GAV) and 20990..21247 in EU487200 (YHV), resulting in an 86 amino acid product with a molecular mass of 10 kDa. The full predicted amino acid sequences are shown in Figure 2B. Application of blastp [16] to the amino acid sequences revealed no similar sequences in GenBank, as expected for a gene created de novo via out-of-frame 'overprinting' of a preexisting gene [20, 21]. Application of InterProScan [22] predicted an N-terminal signal peptide/transmembrane region, and an additional transmembrane region comprising amino acids 34 to 54. Application of SignalP 3.0 [23] predicted an N-terminal signal anchor or signal peptide. If the latter, cleavage of the signal peptide, most probably between residues 25 and 26, would leave a 61 amino acid, 7.3 kDa ORFX product.
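The lengths quoted above follow directly from the coordinates, as the short Python check below shows. The function name is hypothetical, and the coordinates are assumed to be 1-based and inclusive, as in GenBank feature locations; the text does not state whether the range includes the stop codon.

```python
def codons_in_range(start: int, end: int) -> int:
    """Number of codons spanned by a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range length is not a multiple of three")
    return length // 3


gav_codons = codons_in_range(20680, 20937)  # AF227196 (GAV)
yhv_codons = codons_in_range(20990, 21247)  # EU487200 (YHV)

# Residues remaining if a signal peptide is cleaved between residues 25 and 26.
cleaved_residues = gav_codons - 25

print(gav_codons, yhv_codons, cleaved_residues)
```

Both ranges span 258 nt, i.e. 86 codons, and removing a 25-residue signal peptide leaves the 61 residues quoted for the cleaved product.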
Overlapping genes are difficult to identify and are often overlooked. However, it is important to be aware of such genes as early as possible in order to avoid confusion (otherwise functions of the overlapping gene may be wrongly ascribed to the gene it overlaps), and also so that the functions of the overlapping gene may be investigated in their own right. We hope that presentation of this bioinformatic analysis will help fulfil these goals. Initial verification of the ORFX product could be by means of immunoblotting with ORFX-specific antibodies.
This work was supported by National Institutes of Health Grant R01 GM079523 and an award from Science Foundation Ireland, both to JFA.
1. Cowley JA, Dimmock CM, Walker PJ: Gill-associated nidovirus of Penaeus monodon prawns transcribes 3'-coterminal subgenomic mRNAs that do not possess 5'-leader sequences. J Gen Virol 2002, 83: 927-935.
2. Gorbalenya AE, Enjuanes L, Ziebuhr J, Snijder EJ: Nidovirales: evolving the largest RNA virus genome. Virus Res 2006, 117: 17-37. doi:10.1016/j.virusres.2006.01.017
3. Pasternak AO, Spaan WJ, Snijder EJ: Nidovirus transcription: how to make sense...? J Gen Virol 2006, 87: 1403-1421. doi:10.1099/vir.0.81611-0
4. Sittidilokratna N, Dangtip S, Cowley JA, Walker PJ: RNA transcription analysis and completion of the genome sequence of yellow head nidovirus. Virus Res 2008, 136: 157-165. doi:10.1016/j.virusres.2008.05.008
5. Cowley JA, Walker PJ: The complete genome sequence of gill-associated virus of Penaeus monodon prawns indicates a gene organisation unique among nidoviruses. Arch Virol 2002, 147: 1977-1987. doi:10.1007/s00705-002-0847-x
6. Jitrapakdee S, Unajak S, Sittidilokratna N, Hodgson RA, Cowley JA, Walker PJ, Panyim S, Boonsaeng V: Identification and analysis of gp116 and gp64 structural glycoproteins of yellow head nidovirus of Penaeus monodon shrimp. J Gen Virol 2003, 84: 863-873. doi:10.1099/vir.0.18811-0
7. Cowley JA, Cadogan LC, Spann KM, Sittidilokratna N, Walker PJ: The gene encoding the nucleocapsid protein of Gill-associated nidovirus of Penaeus monodon prawns is located upstream of the glycoprotein gene. J Virol 2004, 78: 8935-8941. doi:10.1128/JVI.78.16.8935-8941.2004
8. Wijegoonawardane PK, Cowley JA, Phan T, Hodgson RA, Nielsen L, Kiatpathomchai W, Walker PJ: Genetic diversity in the yellow head nidovirus complex. Virology 2008, 380: 213-225. doi:10.1016/j.virol.2008.07.005
9. Sittidilokratna N, Chotwiwatthanakun C, Wijegoonawardane PK, Unajak S, Boonnad A, Wangnai W, Jitrapakdee S, Cowley JA, Walker PJ: A virulent isolate of yellow head nidovirus contains a deformed envelope glycoprotein gp116. Virology 2009, 384: 192-200. doi:10.1016/j.virol.2008.10.042
10. Gangnonngiw W, Anantasomboon G, Sang-oum W, Sriurairatana S, Sritunyalucksana K, Flegel TW: Non-virulence of a recombinant shrimp nidovirus is associated with its non structural gene sequence and not a large structural gene deletion. Virology 2009, 385: 161-168. doi:10.1016/j.virol.2008.10.044
11. Firth AE, Brown CM: Detecting overlapping coding sequences with pairwise alignments. Bioinformatics 2005, 21: 282-292. doi:10.1093/bioinformatics/bti007
12. Firth AE, Brown CM: Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics 2006, 7: 75. doi:10.1186/1471-2105-7-75
13. Chung BYW, Miller WA, Atkins JF, Firth AE: An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA 2008, 105: 5897-5902. doi:10.1073/pnas.0800468105
14. Firth AE, Chung BY, Fleeton MN, Atkins JF: Discovery of frameshifting in Alphavirus 6K resolves a 20-year enigma. Virol J 2008, 5: 108. doi:10.1186/1743-422X-5-108
15. Firth AE, Atkins JF: A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting. Virol J 2009, 6: 14. doi:10.1186/1743-422X-6-14
16. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403-410.
17. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673-4680. doi:10.1093/nar/22.22.4673
18. Wijegoonawardane PK, Sittidilokratna N, Petchampai N, Cowley JA, Gudkovs N, Walker PJ: Homologous genetic recombination in the yellow head complex of nidoviruses infecting Penaeus monodon shrimp. Virology 2009, 390: 79-88. doi:10.1016/j.virol.2009.04.015
19. Matsuda D, Dreher TW: Close spacing of AUG initiation codons confers dicistronic character on a eukaryotic mRNA. RNA 2006, 12: 1338-1349. doi:10.1261/rna.67906
20. Belshaw R, Pybus OG, Rambaut A: The evolution of genome compression and genomic novelty in RNA viruses. Genome Res 2007, 17: 1496-1504. doi:10.1101/gr.6305707
21. Rancurel C, Khosravi M, Dunker KA, Romero PR, Karlin D: Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation. J Virol 2009, 83: 10719-10736. doi:10.1128/JVI.00595-09
22. Zdobnov EM, Apweiler R: InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 2001, 17: 847-848. doi:10.1093/bioinformatics/17.9.847
23. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004, 340: 783-795. doi:10.1016/j.jmb.2004.05.028
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.