- Open Access
A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives
© Sabath et al; licensee BioMed Central Ltd. 2009
- Received: 2 July 2009
- Accepted: 17 September 2009
- Published: 17 September 2009
The Israeli acute paralysis virus (IAPV) is a honeybee-infecting virus that was found to be associated with colony collapse disorder. The IAPV genome contains two genes encoding a structural and a nonstructural polyprotein. We applied a recently developed method for the estimation of selection in overlapping genes to detect purifying selection and, hence, functionality. We provide evolutionary evidence for the existence of a functional overlapping gene, which is translated in the +1 reading frame of the structural polyprotein gene. Conserved orthologs of this putative gene, which we provisionally call pog (predicted overlapping gene), were also found in the genomes of a monophyletic clade of dicistroviruses that includes IAPV, acute bee paralysis virus, Kashmir bee virus, and Solenopsis invicta (red imported fire ant) virus 1.
- Monophyletic Clade
- Negative Strand
- Orthologous Cluster
- Colony Collapse Disorder
- Israeli Acute Paralysis Virus
Colony collapse disorder (CCD) is a syndrome characterized by the mass disappearance of honeybees from hives. CCD imperils a global resource estimated at approximately $200 billion. For example, it has been estimated that up to 35% of hives in the US may have been affected. Many culprits have been suggested as causal factors of CCD, among them fungal, bacterial, and protozoan diseases, external and internal parasites, in-hive chemicals, agricultural insecticides, genetically modified crops, climatic factors, changed cultural practices, and the spread of cellular phones. The Israeli acute paralysis virus (IAPV), a positive-strand RNA virus belonging to the family Dicistroviridae, was found to be strongly correlated with CCD. It was first isolated in Israel, but was later found to have a worldwide distribution [4, 6, 7].
The genome of IAPV contains two long open reading frames (ORFs) separated by an intergenic region. The 5' ORF encodes a non-structural polyprotein; the 3' ORF encodes a structural polyprotein. The non-structural polyprotein contains several signature sequences for helicase, protease, and RNA-dependent RNA polymerase. The structural polyprotein, which is located downstream of the non-structural polyprotein, encodes two (and possibly more) capsid proteins.
Overlapping genes are easily missed by annotation programs, as evidenced by the fact that several overlapping genes were only detected by using the signatures of purifying selection [9–13]. Here, we apply a recently developed method for the detection of selection in overlapping reading frames to the genome of IAPV and its relatives.
Table 1. A list of completely sequenced dicistroviruses used in this study
Israeli acute paralysis virus (IAPV)
Acute bee paralysis virus (ABPV)
Kashmir bee virus (KBV)
Solenopsis invicta virus (SINV-1)
Black queen cell virus (BQCV)
Cricket paralysis virus (CrPV)
Homalodisca coagulata virus-1 (HoCV-1)
Drosophila C virus (DCV)
Aphid lethal paralysis virus (ALPV)
Himetobi P virus (HiPV)
Taura syndrome virus (TSV)
Plautia stali intestine virus (PSIV)
Triatoma virus (TrV)
Rhopalosiphum padi virus (RhPV)
Table 2. Clusters of orthologous overlapping ORFs on the positive strand (columns: Start of ORF; End of ORF)
Table 3. Sequence conservation in comparisons of known orthologous proteins and orthologous products of overlapping ORFs (columns: Identity of known proteins (%); Identity of hypothetical product of overlapping ORFs (%))
An additional indication of selection on these ORFs was obtained by comparing the degree of conservation of the hypothetical protein sequences of the overlapping ORFs with that of the protein sequences of the known genes (structural and nonstructural polyproteins, Table 3). The degree of amino-acid conservation and, hence, sequence identity between orthologous protein-coding genes is influenced, ceteris paribus, by the intensity of purifying selection. If both overlapping genes are under similar strengths of selection, the amino-acid sequence identity of one pair of homologous genes would be similar to that of the overlapping pair. On the other hand, if a functional gene overlaps a non-functional ORF, the amino-acid identity between the hypothetical protein sequences of the non-functional ORFs would be much lower than that between the two homologous functional genes. We found that the amino-acid sequence identity between pairs of overlapping ORFs in cluster A is only slightly lower than that between the corresponding known genes (maximum difference of 12%, between IAPV and SINV-1, Table 3). In contrast, the amino-acid sequence identity between ORF pairs in clusters B and C is much lower than that between the pairs of known genes (maximum difference of 44%, between CrPV and DCV in cluster C, Table 3).
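The identity comparison above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length and that columns containing a gap are skipped; the text does not state how gaps were treated, so that choice, like the function name `percent_identity`, is an assumption made for illustration.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity between two aligned amino-acid sequences.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned pair (not real viral sequence): 4 matches over 5 ungapped columns.
identity = percent_identity("MKT-LV", "MRTALV")
```

Computing this value once for the known gene and once for the overlapping ORF of the same virus pair, then taking the difference, gives the kind of comparison summarized in Table 3.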
The signature of purifying selection on the ORFs in cluster A suggests that they may encode functional proteins. We provisionally term this gene pog (predicted overlapping gene). pog is found in the genomes of four viruses that constitute a monophyletic clade, but not in any other dicistrovirid genome (Figure 1A). Its phylogenetic distribution suggests that pog originated before the divergence of SINV-1 from the three bee viruses. The phylogenetic distributions of the ORFs in clusters B and C (Figure 1B) are patchy. This patchiness is an additional indication that the overlapping ORFs in clusters B and C are spurious, i.e., non-functional.
A protein motif search resulted in several matches, all with weak scores. Two patterns were found in all four proteins: (1) a signature of rhodopsin-like GPCRs (G protein-coupled receptors), and (2) a protein kinase C phosphorylation site (Figure 3). Prediction of the secondary structures suggests that the proteins contain two conserved helix domains, separated by 3-5 residues (except for SINV-1, in which one long domain is predicted), at the C-terminus (Figure 3). A search for transmembrane topology indicates that the longer helix may be a transmembrane segment (Figure 3). Although viruses often use GPCRs to exploit the host immune system through molecular mimicry [22–25], the proteins encoded by pog are shorter than the average virus-encoded GPCR. Therefore, these proteins may have a different function.
In this note, we provide evolutionary evidence (purifying selection) for the existence of a functional overlapping gene, pog, in the genomes of IAPV, ABPV, KBV, and SINV-1. To our knowledge, this putative gene, whose coding region overlaps the structural polyprotein, has not been described in the literature before.
Sequence Data, Processing, and Analysis
Fourteen completely sequenced dicistrovirid genomes were obtained from NCBI (Table 1). Each genome was scanned for the presence of overlapping ORFs. We used BLASTP with the protein sequences of the known genes to identify matches of orthologous overlapping ORFs (E value < 10^-6). Matching overlapping ORFs were assigned into clusters. Within each cluster, we aligned the amino-acid orthologs by using the sequences of the known genes as references. If the alignment length of the overlapping sequence exceeded 60 amino acids, and if the amino-acid sequence identity among the hypothetical genes within a cluster was higher than 65%, we tested for selection on the hypothetical gene (see below).
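The scanning step can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: an ORF is taken here to be a stop-free stretch of codons (the text does not say whether a start codon was required), coordinates are 0-based and end-exclusive, the frame is the start coordinate modulo 3, and the names `stop_free_orfs` and `overlapping_orfs` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_orfs(seq, min_codons=60):
    """Return (start, end, frame) for stop-free stretches on the given strand.

    Assumption: an ORF is any run of at least `min_codons` codons without a
    stop codon; no start codon is required.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    orfs.append((run_start, i, frame))
                run_start = i + 3
        # Close the final run at the end of the last complete codon.
        last = frame + ((len(seq) - frame) // 3) * 3
        if (last - run_start) // 3 >= min_codons:
            orfs.append((run_start, last, frame))
    return orfs


def overlapping_orfs(orfs, gene_start, gene_end):
    """Keep ORFs that overlap a known gene but lie in a different frame."""
    gene_frame = gene_start % 3
    return [
        (s, e, f) for (s, e, f) in orfs
        if f != gene_frame and s < gene_end and e > gene_start
    ]


# Toy sequence (not viral): one short gene in frame 0, followed by two bases.
toy = "ATG" + "GCT" * 5 + "TAA" + "GC"
candidates = overlapping_orfs(stop_free_orfs(toy, min_codons=5), 0, 21)
```

Candidates found this way would still need the orthology, length, and identity filters described above before any test for selection.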
We aligned the protein sequences of the two polyproteins with ClustalW as implemented in the MEGA package. Alignment quality was confirmed using HoT. We reconstructed two phylogenetic trees (one for each polyprotein) by applying the neighbor-joining method, as implemented in the MEGA package. Trees were rooted by the mid-point rooting method, and the confidence of each branch was estimated by bootstrap with 1000 replications.
Detection of Selection in Overlapping Genes
We used the method of Sabath et al. for the simultaneous estimation of selection intensities in overlapping genes. This method uses a maximum-likelihood framework to fit a Markov model of codon substitution to data from two aligned homologous overlapping sequences. To predict functionality of an ORF that overlaps a known gene, we modified an existing approach for predicting functionality in non-overlapping genes. Given two aligned orthologous overlapping sequences, we estimate the likelihood of two hierarchical models. In model 1, there is no selection on the ORF. In model 2, the ORF is assumed to be under selection. The likelihood-ratio test is used to test whether model 2 fits the data significantly better than model 1, in which case the ORF is predicted to be under selection and most probably functional.
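The final comparison step can be sketched as a generic likelihood-ratio test. The sketch assumes that the two models differ by a single free parameter, so that twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom; the text does not state the degrees of freedom or the significance level, so both are assumptions here, as are the function name and the log-likelihood values in the usage line.

```python
import math


def likelihood_ratio_test(lnl_model1, lnl_model2, alpha=0.05):
    """Compare two nested models given their maximized log-likelihoods.

    Assumption: model 2 has exactly one more free parameter than model 1,
    so the statistic is referred to a chi-square distribution with 1 df.
    """
    stat = max(2.0 * (lnl_model2 - lnl_model1), 0.0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods, for illustration only.
stat, p_value, under_selection = likelihood_ratio_test(-1520.4, -1512.9)
```

A small p-value would indicate that the model with selection on the ORF fits significantly better than the model without it.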
We looked for motifs within the inferred protein sequences encoded by the overlapping ORF by using the motif search server http://motif.genome.jp/ and the My-Hits server http://hits.isb-sib.ch/cgi-bin/PFSCAN with the following motif databases: PRINTS, PROSITE, and Pfam. We used PSIPRED to predict secondary structure, and MEMSAT to predict transmembrane protein topology.
Overlapping ORFs on the negative strand
In the fourteen completely sequenced dicistrovirus genomes (Table 1), we identified 240 overlapping ORFs of length equal to or greater than 60 codons on the negative strand. Of the 240 ORFs, 113 were found in concordant genomic locations in two or more genomes. The concordant overlapping ORFs were assigned into 29 clusters (Additional file 1). There are 9, 1, and 19 clusters in phase 0, 1, and 2, respectively. The cluster size ranges from 2 to 9. In two clusters, 5 and 10, both in phase 2, there is a weak signature of selection. However, this signature seems to be a false positive, driven by the unique structure of opposite-strand phase-2 overlap (Additional file 2). In this structure, codon positions one and two of one gene match codon positions two and one of the overlapping gene. This structure leads to a situation in which most changes are either synonymous or nonsynonymous in both overlapping genes and, occasionally, to a false signal of purifying selection on the overlapping ORF. In addition, one of the clusters (cluster 10) does not constitute a monophyletic clade and is, therefore, unlikely to be functional. We therefore conclude that dicistroviruses most probably do not encode proteins on the negative strand.
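The codon-position correspondence described above can be checked with a short sketch. It assumes the sense gene is read left to right in frame 0 and that an opposite-strand codon covering coordinates offset to offset+2 is read right to left. The `offset` numbering is chosen for illustration and need not coincide with the phase numbering used in Additional file 2.

```python
def antisense_position_pairs(offset):
    """Pair codon positions of an opposite-strand codon with sense positions.

    Returns a list of (antisense_position, sense_position) tuples, with
    positions numbered 1 to 3. The sense gene is assumed to be in frame 0,
    so coordinate x has sense codon position (x % 3) + 1.
    """
    # The antisense codon is read from its highest coordinate to its lowest.
    coords = [offset + 2, offset + 1, offset]
    return [(anti_pos, (x % 3) + 1) for anti_pos, x in enumerate(coords, start=1)]


# Enumerate the three possible opposite-strand arrangements.
arrangements = {offset: antisense_position_pairs(offset) for offset in range(3)}
```

With `offset = 2`, positions one and two of the opposite-strand codon fall on positions two and one of the sense codon, and the third positions coincide, which is the arrangement described in the paragraph above.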
We thank Dr. Ilan Sela and an anonymous reviewer for their comments. This work was supported in part by US National Library of Medicine Grant LM010009-01 to Dan Graur and Giddy Landan and by the Small Grants Program of the University of Houston.
- Oldroyd BP: What's killing American honey bees? PLoS Biol 2007, 5: e168. 10.1371/journal.pbio.0050168
- Gallai N, Salles J-M, Settele J, Vaissière BE: Economic valuation of the vulnerability of world agriculture confronted with pollinator decline. Ecological Economics 2009, 68: 810-821. 10.1016/j.ecolecon.2008.06.014
- van Engelsdorp D, Hayes J Jr, Underwood RM, Pettis J: A survey of honey bee colony losses in the U.S., fall 2007 to spring 2008. PLoS ONE 2008, 3: e4071. 10.1371/journal.pone.0004071
- Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD, Moran NA, Quan PL, Briese T, Hornig M, Geiser DM, et al.: A metagenomic survey of microbes in honey bee colony collapse disorder. Science 2007, 318: 283-287. 10.1126/science.1146498
- Maori E, Lavi S, Mozes-Koch R, Gantman Y, Peretz Y, Edelbaum O, Tanne E, Sela I: Isolation and characterization of Israeli acute paralysis virus, a dicistrovirus affecting honeybees in Israel: evidence for diversity due to intra- and inter-species recombination. J Gen Virol 2007, 88: 3428-3438. 10.1099/vir.0.83284-0
- Blanchard P, Schurr F, Celle O, Cougoule N, Drajnudel P, Thiery R, Faucon JP, Ribiere M: First detection of Israeli acute paralysis virus (IAPV) in France, a dicistrovirus affecting honeybees (Apis mellifera). J Invertebr Pathol 2008, 99: 348-350. 10.1016/j.jip.2008.07.006
- Palacios G, Hui J, Quan PL, Kalkstein A, Honkavuori KS, Bussetti AV, Conlan S, Evans J, Chen YP, vanEngelsdorp D, et al.: Genetic analysis of Israel acute paralysis virus: distinct clusters are circulating in the United States. J Virol 2008, 82: 6209-6217. 10.1128/JVI.00251-08
- Chen W, Calvo PA, Malide D, Gibbs J, Schubert U, Bacik I, Basta S, O'Neill R, Schickli J, Palese P, et al.: A novel influenza A virus mitochondrial protein that induces cell death. Nat Med 2001, 7: 1306-1312. 10.1038/nm1201-1306
- Chung BY, Miller WA, Atkins JF, Firth AE: An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA 2008, 105: 5897-5902. 10.1073/pnas.0800468105
- Firth AE: Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene. Virol J 2008, 5: 48. 10.1186/1743-422X-5-48
- Firth AE, Atkins JF: Bioinformatic analysis suggests that the Cypovirus 1 major core protein cistron harbours an overlapping gene. Virol J 2008, 5: 62. 10.1186/1743-422X-5-62
- Firth AE, Atkins JF: Bioinformatic analysis suggests that a conserved ORF in the waikaviruses encodes an overlapping gene. Arch Virol 2008, 153: 1379-1383. 10.1007/s00705-008-0119-5
- Firth AE, Atkins JF: Analysis of the coding potential of the partially overlapping 3' ORF in segment 5 of the plant fijiviruses. Virol J 2009, 6: 32. 10.1186/1743-422X-6-32
- Sabath N, Landan G, Graur D: A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS ONE 2008, 3: e3996. 10.1371/journal.pone.0003996
- Nguyen M, Haenni AL: Expression strategies of ambisense viruses. Virus Res 2003, 93: 141-150. 10.1016/S0168-1702(03)00094-7
- de Miranda JR, Drebot M, Tyler S, Shen M, Cameron CE, Stoltz DB, Camazine SM: Complete nucleotide sequence of Kashmir bee virus and comparison with acute bee paralysis virus. J Gen Virol 2004, 85: 2263-2270. 10.1099/vir.0.79990-0
- Govan VA, Leat N, Allsopp M, Davison S: Analysis of the complete genome sequence of acute bee paralysis virus shows that it belongs to the novel group of insect-infecting RNA viruses. Virology 2000, 277: 457-463. 10.1006/viro.2000.0616
- Valles SM, Strong CA, Dang PM, Hunter WB, Pereira RM, Oi DH, Shapiro AM, Williams DF: A picorna-like virus from the red imported fire ant, Solenopsis invicta: initial discovery, genome sequence, and characterization. Virology 2004, 328: 151-157. 10.1016/j.virol.2004.07.016
- Kozak M: Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles. Microbiol Rev 1983, 47: 1-45.
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16: 404-405. 10.1093/bioinformatics/16.4.404
- Jones DT: Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics 2007, 23: 538-544. 10.1093/bioinformatics/btl677
- Murphy PM: Viral exploitation and subversion of the immune system through chemokine mimicry. Nat Immunol 2001, 2: 116-122. 10.1038/84214
- Lalani AS, McFadden G: Evasion and exploitation of chemokines by viruses. Cytokine Growth Factor Rev 1999, 10: 219-233. 10.1016/S1359-6101(99)00018-0
- McLysaght A, Baldi PF, Gaut BS: Extensive gene gain associated with adaptive evolution of poxviruses. Proc Natl Acad Sci USA 2003, 100: 15655-15660. 10.1073/pnas.2136653100
- Hughes AL, Friedman R: Genome-wide survey for genes horizontally transferred from cellular organisms to baculoviruses. Mol Biol Evol 2003, 20: 979-987. 10.1093/molbev/msg107
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403-410.
- Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics 2002, Chapter 2(Unit 2): 3.
- Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 2008, 9: 299-306. 10.1093/bib/bbn017
- Landan G, Graur D: Heads or tails: a simple reliability check for multiple sequence alignments. Mol Biol Evol 2007, 24: 1380-1383. 10.1093/molbev/msm060
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4: 406-425.
- Farris JS: Estimating phylogenetic trees from distance matrices. Am Nat 1972, 106: 645-668. 10.1086/282802
- Nekrutenko A, Makova KD, Li WH: The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res 2002, 12: 198-202. 10.1101/gr.200901
- Attwood TK, Blythe MJ, Flower DR, Gaulton A, Mabey JE, Maudling N, McGregor L, Mitchell AL, Moulton G, Paine K, Scordis P: PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Res 2002, 30: 239-241. 10.1093/nar/30.1.239
- Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic Acids Res 2006, 34: D227-230. 10.1093/nar/gkj063
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res 2008, 36: D281-288. 10.1093/nar/gkm960
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.