Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes

Background: Members of the family Iridoviridae can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their hosts. In the present study, we describe the re-analysis of the Iridoviridae family of complex DNA viruses using a variety of comparative genomic tools to yield a greater consensus among the annotated sequences of its members.
Results: A series of genomic sequence comparisons were made within and between the Ranavirus and Megalocytivirus genera in order to identify novel conserved ORFs. Of these two genera, the Megalocytivirus genomes required the greatest number of altered annotations. Prior to our re-analysis, the Megalocytivirus species orange-spotted grouper iridovirus and rock bream iridovirus shared 99% sequence identity, but only 82 out of 118 potential ORFs were annotated as shared between them; in contrast, we predict that these species share an identical complement of genes. These annotation changes allowed the redefinition of the group of core genes shared by all iridoviruses. Seven new core genes were identified, bringing the total number to 26.
Conclusion: Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.

Iridoviruses have been found to infect invertebrates and poikilothermic vertebrates, including amphibians, reptiles, and fish [4]. Iridovirus infections produce symptoms that range from subclinical to very severe, which may also result in significant mortality [5][6][7][8][9]. The high pathogenicity associated with some members of the iridovirus family has had a significant impact on modern aquaculture, fish farming, and wildlife conservation. For example, systemic iridovirus infections have been found in economically important freshwater and marine fish species worldwide. In addition, iridovirus infections have been implicated in amphibian population declines, representing a set of emerging infectious diseases whose spread has been accelerated by human activities [10][11][12][13][14].
Despite the economic and ecological significance of iridoviruses, very little is currently known about their molecular biology. One approach towards gaining a deeper understanding of iridoviral pathogenesis is to investigate the core set of essential genes conserved among all members of the family. The genomes of twelve iridoviruses, including at least one from each genus, have been completely sequenced (Table 1). According to the previously published annotations, these genomes contained only 19 core genes associated with a variety of viral activities: transcriptional regulation, DNA metabolism, protein modification, and viral structure. Definition of this core set of genes also highlights those genes that are conserved across some, but not all, genera, and unique genes found within a single species. These non-core genes may be involved in specific virus-host interactions, enhancement of virus replication, and augmented pathogenesis in certain species.
Despite the growing number of sequenced iridovirus genomes, no systematic comparative genomic analysis of the family has yet been performed. Thus, annotation of these genomes has been performed without standardization and has so far been guided primarily by the position of start/stop codons rather than the presence of homologous sequences. As a result, some long overlapping potential ORFs have been automatically designated as coding sequences, and smaller homologous ORFs have been overlooked. In this paper, we have taken a comparative genomics approach to re-examine the annotation of all twelve iridovirus genomes, using the Viral Orthologous Clusters (VOCs) [15] and Viral Genome Organizer (VGO) [16] software. These re-annotated genomes were then analysed further, both to define the core set of iridovirus genes more accurately, and to provide deeper insight into the phylogenetic relationships between individual iridovirus species.

Re-annotation of Iridovirus genomes
One objective of this project was to demonstrate the application of comparative genomics to annotating viral genomes, particularly those that have been poorly characterized experimentally. In an earlier study, we utilized comparative genomics to identify previously unannotated small viral ORFs in the Poxviridae [17]. Here, we focused our analysis on the Iridoviridae family, which represents a challenge in genome annotation since there is little experimental evidence available to confirm gene expression. Another problem is that iridovirus promoter elements have not been well characterized, and thus cannot be used as a reliable criterion for assigning ORFs. These combined factors made previous iridovirus gene annotation a somewhat arbitrary process, resulting in closely related iridovirus species with dramatic differences in their genomic annotations. Therefore, we decided to analyse all members of this family using a standardized comparative genomics approach, relying on the fact that ORFs that are conserved in more than one divergent species are likely to be functional genes.
We began with the Megalocytivirus genus. Its three sequenced members, rock bream iridovirus (RBIV), orange-spotted grouper iridovirus (OSGIV), and infectious spleen and kidney necrosis virus (ISKNV), display a co-linear arrangement of genes with an overall DNA sequence identity of greater than 90%. In the analysis of this genus, differences in gene content were examined in detail. Dotplots were used to determine the presence of orthologous DNA, and a variety of BLAST searches and the VGO genome visualization software were used to determine the reason (frameshifts, extra stop codons) behind the apparent absence of some ORFs.
Using this approach, a substantial number of ORFs were either added to, or deleted from, members of the Megalocytivirus genus (Table 2). OSGIV and RBIV share 99% DNA sequence identity, and thus are probably different strains of the same virus; however, the previous annotations described only 82 out of 118 total annotated ORFs as shared by the two genomes [18,19]. After our re-analysis, the RBIV and OSGIV genomes had an identical complement of annotated genes. Furthermore, the re-annotated ISKNV genome contained 110 ORFs orthologous with both RBIV and OSGIV, compared to 71 in the old annotation (Table 2) [18,20].
In the process of re-examining these genomes, we annotated a number of genes containing apparent frameshift mutations between species. In RBIV we annotated ten genes with potential frameshift mutations, while OSGIV had four such genes (Table 2). All of the genes containing potential frameshift mutations had orthologs in the other two members of the Megalocytivirus genus (Table 2). In some cases, these may be the result of natural mutations within the viruses; however, it is also possible that these apparent frameshift mutations are actually sequencing errors. For both RBIV and OSGIV, PCR primers based on the ISKNV sequence were used to amplify genomic fragments, which were subsequently sequenced [18,19]. It is possible that errors were introduced during the PCR process, leading to apparent frameshifts in the reported sequence. It is interesting to note that the genomic sequence of ISKNV (sequenced using subcloned fragments rather than PCR products) [20] had significantly fewer annotation changes made during our re-analysis. Though we have not experimentally proven that the frameshift mutations in OSGIV and RBIV are the result of sequencing errors, it would be useful to focus future sequencing efforts on these regions, to determine if the reported sequences are indeed correct.
After re-annotating the Megalocytivirus genus, we applied the same comparative genomic analysis to the Ranavirus genus. The genus contains five sequenced members divided into two groups, each with a high degree of sequence conservation and a co-linear arrangement of genes. The first group comprises frog virus 3 (FV3), tiger frog virus (TFV), and Ambystoma tigrinum virus (ATV). The second group contains Singapore grouper iridovirus (SGIV) and grouper iridovirus (GIV).
The first step in the re-annotation of the Ranavirus genus was a comparative genomic analysis of FV3, TFV, and ATV. This resulted in an increase in the number of conserved annotated genes from 76 to 87 (Table 3). Subsequent re-analysis of the second Ranavirus group, containing SGIV and GIV, resulted in an increase from 131 to 138 conserved annotated ORFs (Table 4). It should be noted that two of the newly annotated ORFs, SGIV 0.5L and GIV 120.5L, appear to "wrap around", beginning at one end of the genome with the remainder of the ORF located at the opposite end [21,22]. These apparent "split ORFs" arise because the circularly permuted iridovirus genome is represented as a linear genomic sequence, and the arbitrarily chosen start point happens to fall in the middle of an ORF [23].
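The "wrap around" effect can be illustrated with a minimal sketch. It assumes a forward-strand scan only, ATG start codons, and the three standard stop codons; the function name `find_orfs_circular` and the toy sequence are hypothetical and are not part of the VOCs or VGO software. The idea is simply to scan the sequence joined to a copy of itself, so that an ORF interrupted by the arbitrary linear start point is still read through to its stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs_circular(genome, min_len=30):
    """Scan the forward strand of a circular genome for ORFs.

    Returns a list of (start, length, wraps) tuples, where start is a
    0-based index into the linear representation, length is in
    nucleotides (stop codon included), and wraps is True when the ORF
    crosses the arbitrary end of the linear sequence.
    """
    n = len(genome)
    doubled = genome + genome  # lets a reading frame run past the linear end
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, at most once around the circle.
        for pos in range(start + 3, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if length >= min_len:
                    orfs.append((start, length, pos + 3 > n))
                break
    return orfs

# Toy example: a 15-nt ORF in a 21-nt circle, linearized mid-ORF.
circle = "ATGAAACCCGGGTAA" + "CTCTCT"
linear = circle[7:] + circle[:7]
print(find_orfs_circular(linear, min_len=12))
```

In the toy example the start codon lies near the end of the linear sequence and the stop codon near its beginning, which is the same situation as a split ORF in a linearized circularly permuted genome.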
As seen above, our comparative genomic approach was able to identify previously unannotated ORFs, homologous ORFs with potential frameshifts, and ORFs split between the two ends of a circular genome. Although this approach proved extremely successful for the Ranavirus and Megalocytivirus genera, we were unable to use it for the Chloriridovirus, Iridovirus, and Lymphocystivirus genera. This is due to the lack of co-linearity and the highly divergent sets of genes that exist between the members of these genera, as well as the low number of available genome sequences. However, we did modify the annotations of lymphocystis disease virus-China (LCDV-China) and invertebrate iridescent virus-6 (IIV-6). The previous annotations of both genomes contained a large number of overlapping ORFs [2,24], which we decided to exclude on several grounds. First, LCDV-China and IIV-6 are the only iridoviruses, out of the twelve so far sequenced, in which overlapping ORFs have been annotated. In addition, the original sequencing paper for IIV-6 [2] and a follow-up paper by the same group [25] did not include a number of the overlapping ORFs reported in the database sequence, presumably due to their small size and lack of similarity with other viral and cellular genes. Finally, there is no experimental or bioinformatics evidence to suggest that any of these ORFs encode proteins. Therefore, to improve the overall consistency of the Iridoviridae family annotations, we removed the small overlapping ORF annotations from the LCDV-China and IIV-6 genomic sequences (Table 5, Additional Files 1 & 2).
Identifying genes conserved between some, but not all, iridovirus species can give us important information when investigating evolutionary relationships within the family. A number of past phylogenetic analyses of Iridoviridae have used phylogenetic trees constructed from aligned protein sequences [1,18,19,20,22,24,27]. However, there are potential problems with phylogenetic analysis based on comparisons of single genes. This type of analysis is rarely consistent due to horizontal gene transfer [28] and variable rates of evolution [29]. Therefore, we decided to use a whole-genome comparative phylogenetic analysis to understand the relationships between iridoviruses. Our approach was to identify all the genes conserved between different genera, which yields an indication of how similar in gene content two genomes are. Our whole-genome comparative analysis grouped orthologous genes between genera (Figures 1 & 2 and Additional File 3), and was consistent with phylogenetic trees constructed from single protein sequences. Based on gene conservation, the Ranavirus and Lymphocystivirus genera appear to be most closely related to one another (Figure 2). In addition, the Iridovirus and Chloriridovirus genera are also closely related to one another based on the presence of orthologous genes (Figure 2). In contrast, the Megalocytivirus genus and the Iridovirus/Chloriridovirus genera are equally divergent from each other as well as from all other Iridoviridae family members (Figure 2).
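The gene-content comparison described above reduces to set operations on ortholog assignments. The following is a minimal sketch, assuming each genus (or genome) has already been reduced to a set of ortholog family identifiers; the function names, the genus labels, and the family IDs `f1` to `f5` are hypothetical toy values and do not correspond to any real annotation.

```python
from itertools import combinations

def shared_gene_counts(gene_families):
    """Count ortholog families shared by each pair of groups.

    gene_families maps a group name (genus or genome) to the set of
    ortholog family IDs annotated in that group.
    """
    return {
        (a, b): len(gene_families[a] & gene_families[b])
        for a, b in combinations(sorted(gene_families), 2)
    }

def core_genes(gene_families):
    """Return the families present in every group (the core set)."""
    return set.intersection(*gene_families.values())

# Toy input with made-up family IDs (illustration only).
toy = {
    "GenusA": {"f1", "f2", "f3", "f4"},
    "GenusB": {"f1", "f2", "f3"},
    "GenusC": {"f1", "f5"},
}
print(shared_gene_counts(toy))
print(core_genes(toy))
```

Pairwise counts of this kind are what the numbers on the connecting lines of a gene-content diagram represent, while the intersection over all groups corresponds to the core set.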
As the list of sequenced iridovirus genomes grows, the non-co-linearity between many of these genomes becomes more apparent. The Megalocytivirus and Ranavirus, but not the Chloriridovirus, Iridovirus, and Lymphocystivirus genera, show a co-linear arrangement of genes within each genus. However, comparisons of genomic sequences from different genera suggest no co-linearity. This trend may be the result of the high recombination rates [30] seen in some iridovirus members [31]. For example, within the Ranavirus genus, ATV has two inversions relative to the FV3 and TFV sequences [30], reducing the co-linearity of these genomes to some degree. Figure 3A shows how two recombination events could convert the FV3 arrangement of genes to that of ATV. In contrast, a comparison between the more distantly related members within the Ranavirus genus (such as FV3 and GIV) demonstrates a much more dramatic loss of co-linearity. No long stretches of co-linear genes exist between these sequences, although small sections of co-linearity remain, as seen in a dotplot analysis between FV3 and GIV (Figure 3B). The dotplot shows small regions of co-linearity scattered throughout the genomes of FV3 and GIV, visible as short diagonal lines (Figure 3B). A schematic representation of the co-linearity between FV3 and GIV demonstrates that co-linearity occurs in small clusters of genes, often only 2-4 genes in length (Figure 3C).

Conclusion
The Iridoviridae family can cause severe diseases resulting in significant economic and environmental losses. Very little is known about how iridoviruses cause disease in their host. Our re-analysis of genomes within the Iridoviridae family provides a unifying framework to understand the biology of these viruses. For example, the re-analysis of the Iridoviridae family has increased the consistency of annotated sequences from viruses within the same genus.
In addition, the re-analysis has helped create a much greater consensus among Iridoviridae family members and enhanced our understanding of this virus family as a whole. The updated annotations that we have produced for the iridovirus sequences can be found in the additional files to this paper; in addition, the databases and tools to analyse Iridoviridae genomes are available to all researchers [32]. This database will contain genomes from the original GenBank files and also the edited genomes. Further re-defining the core set of iridovirus genes will continue to lead us to a better understanding of the phylogenetic relationships between individual iridoviruses as well as giving us a much deeper understanding of iridovirus replication. In addition, this analysis will provide a better framework for characterizing and annotating currently unclassified iridoviruses.
[Table footnotes: The Iridoviridae core genes are shown. (a) ORFs that have been added or altered are highlighted in bold. (b) Potentially frameshifted ORF described in this paper.]

Re-annotation of the Iridoviridae
Annotated sequences for the twelve completely sequenced iridovirus genomes (Table 1) were obtained from GenBank files and imported into the Viral Orthologous Clusters (VOCs) database [15]. Species from the same genus were examined using VOCs to identify all of the orthologous genes. The analysis then focused on the differences found between genomes within the same genus. For those genomes that contained co-linear arrangements of genes (those in the Ranavirus and Megalocytivirus genera), we compared those regions containing annotated ORFs. If more than two sequenced genomes were available for a given genus, and the ORF was present in at least two of the genomes, then we set out to determine if that ORF was also present in the remainder of the genomes. By this method, we were able to re-annotate small segments of each genome without needing to re-analyse the entire genome. The Viral Genome Organizer (VGO) software [16] was used to visualize the annotated ORFs, as well as the start and stop codons found within each genome.
Figure 1 Conserved Iridovirus Genes. Every Iridoviridae gene that has an ortholog in at least 2 Iridoviridae genera is shown. Orthologs share the same row on the table. The genes within each genus are color-coded for easier identification. As long as at least one member of the genus contains an ortholog, the entire genus is highlighted. Where multiple ORFs are listed for a particular gene name, the ORFs represent multiple orthologs of the gene in that viral species. The remainder of the figure, showing just the genes conserved between the Iridovirus and Chloriridovirus genera, is included in Additional File 3. [Column headers as extracted: Gene Name, FV3, TFV, ATV, SGIV, GIV, LCDV-1, LCDV-C, IIV-6, IIV-3, ISKNV, RBIV, OGIV; genus group headers: Lymphocystivirus, Iridovirus, Chloriridovirus, Megalocytivirus.]
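The screening rule used in the re-annotation procedure (an ORF annotated in at least two genomes of a genus is then searched for in the remaining genomes) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the genome labels `virus_A` to `virus_C`, and the family IDs are hypothetical placeholders, and the actual search of the flagged genomes was done with VOCs, VGO, BLAST, and dotplots rather than with this code.

```python
def candidate_missing_orfs(annotated, genomes, min_support=2):
    """Flag ortholog families that warrant a targeted search.

    annotated maps an ortholog family ID to the set of genomes in which
    an ORF of that family is currently annotated. A family is flagged
    when it is annotated in at least min_support genomes but absent from
    one or more of the remaining genomes of the genus.
    """
    candidates = {}
    for family, present_in in annotated.items():
        missing = set(genomes) - set(present_in)
        if len(present_in) >= min_support and missing:
            candidates[family] = sorted(missing)
    return candidates

# Toy genus with three genomes; labels and presence data are made up.
genus = ["virus_A", "virus_B", "virus_C"]
toy_annotations = {
    "famA": {"virus_A", "virus_B", "virus_C"},  # already consistent
    "famB": {"virus_A", "virus_B"},             # check virus_C
    "famC": {"virus_C"},                        # too little support
}
print(candidate_missing_orfs(toy_annotations, genus))
```

Each flagged genome would then be inspected at the co-linear position for an unannotated ORF, a frameshift, or an extra stop codon.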

Analysis of orthologous genes
We used a combination of BLAST searches and queries using the VOCs software [32] to define orthologous genes between Iridoviridae genera. VOCs is a Java client-server application that accesses a Structured Query Language (SQL) database containing iridovirus genomes. This SQL database permits complex queries to be assembled in an easy-to-use graphical user interface. VOCs initially groups orthologous genes into families based on BLASTP scores; these groupings can be manually checked and altered if necessary.
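The text does not specify how VOCs converts BLASTP scores into families, so the sketch below should be read as one plausible scheme rather than as the VOCs method: single-linkage grouping, in which any two genes whose pairwise score meets a threshold end up in the same family. The function name, the threshold value, the gene labels, and the scores are all hypothetical.

```python
def group_by_score(pairs, threshold):
    """Single-linkage grouping of genes from pairwise similarity scores.

    pairs is an iterable of (gene_a, gene_b, score) tuples. Genes joined
    by a chain of pairs scoring >= threshold fall into one family.
    Returns a sorted list of sorted gene-name lists.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, score in pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())

# Made-up scores for five hypothetical genes.
toy_hits = [
    ("g1", "g2", 200),
    ("g2", "g3", 150),
    ("g3", "g4", 20),
    ("g5", "g4", 10),
]
print(group_by_score(toy_hits, threshold=100))
```

Single linkage is permissive, since one strong hit is enough to merge two groups, which is one reason a manual check of the resulting families remains useful.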

Dotplot analysis
Dotplots of FV3 and GIV were generated using JDotter [33]. JDotter provides an interactive input window that links it to the VOCs database, from which the FV3 and GIV sequences were obtained.
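For readers unfamiliar with dotplots, the principle can be shown with a minimal exact-match sketch. It is not the JDotter algorithm: it assumes exact word matches on the forward strand only, and the function name, word size, and toy sequences are hypothetical. Every position pair at which the two sequences share the same word becomes a dot, and runs of dots on one diagonal indicate a co-linear region.

```python
def dotplot_matches(seq_x, seq_y, word=4):
    """Return (i, j) pairs where seq_x[i:i+word] equals seq_y[j:j+word].

    seq_x is the horizontal sequence and seq_y the vertical sequence.
    Only exact, forward-strand matches are reported.
    """
    # Index every word of the vertical sequence by its start positions.
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)

    hits = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            hits.append((i, j))
    return hits

# Toy sequences sharing the segment ACGTTG at different offsets.
hits = dotplot_matches("ACGTTGCA", "TTACGTTG", word=4)
print(hits)
# Dots with the same value of j - i lie on the same diagonal.
print({j - i for i, j in hits})
```

In a plot of two genomes with little shared gene order, such diagonals are short and scattered, which is the pattern described for FV3 and GIV.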

Competing interests
The author(s) declare that there are no competing interests.

Authors' contributions
HEE, JM, EP, and CRB carried out the analysis of the Iridoviridae family and generated the tables and figures. VTJ and CU generated the databases and tools to carry out the analysis done in the manuscript. CRB and CU conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
Figure 2 Phylogenetic relationships between the five iridovirus genera based on gene content. Individual viral species were compared within a genus to identify the number of orthologous genes. Orthologous genes between viral genera were then determined. The numbers on each line identify the number of orthologous genes shared between viral species or genera, including the 26 core genes. The Iridovirus and Chloriridovirus genera have a high degree of gene conservation, and a combined genera box (Iridovirus/Chloriridovirus) was used to compare orthologous genes between genera. In addition, two subgroups of the Ranavirus genus are shown. Each subgroup contains a virtually identical complement of genes. However, a comparison of the FV3/TFV/ATV subgroup with the SGIV/GIV subgroup revealed 72 orthologous genes.
Figure 3 Co-linearity found within the Ranavirus genus. There is a limited amount of co-linearity found between FV3/TFV/ATV and SGIV/GIV. The co-linearity has been visualized using a dotplot analysis between FV3 (horizontal sequence) and GIV (vertical sequence). Genes are colored either red or blue, representing right- or left-ward transcription respectively. (C) The co-linearity between FV3 and GIV is generally composed of stretches of 2 or 3 co-linear orthologous genes. Orthologous genes in a co-linear arrangement are schematically shown as blocks of the same color on either the FV3 or GIV genomic sequence.