Mimiviridae: clusters of orthologous genes, reconstruction of gene repertoire evolution and proposed expansion of the giant virus family
© Yutin et al.; licensee BioMed Central Ltd. 2013
Received: 11 February 2013
Accepted: 27 March 2013
Published: 4 April 2013
The family Mimiviridae belongs to the large monophyletic group of Nucleo-Cytoplasmic Large DNA Viruses (NCLDV; proposed order Megavirales) and encompasses giant viruses infecting amoeba and probably other unicellular eukaryotes. The recent discovery of the Cafeteria roenbergensis virus (CroV), a distant relative of the prototype mimiviruses, led to a substantial expansion of the genetic variance within the family Mimiviridae. In the light of these findings, a reassessment of the relationships between the mimiviruses and other NCLDV and reconstruction of the evolution of giant virus genomes emerge as interesting and timely goals.
Database searches for the protein sequences encoded in the genomes of several viruses originally classified as members of the family Phycodnaviridae, in particular Organic Lake phycodnaviruses and Phaeocystis globosa viruses (OLPG), revealed a greater number of highly similar homologs in members of the Mimiviridae than in phycodnaviruses. We constructed a collection of 898 Clusters of Orthologous Genes for the putative expanded family Mimiviridae (mimiCOGs) and used these clusters for a comprehensive phylogenetic analysis of the genes that are conserved in most of the NCLDV. The topologies of the phylogenetic trees for these conserved viral genes strongly support the monophyly of the OLPG and the mimiviruses. The same tree topology was obtained by analysis of the phyletic patterns of conserved viral genes. We further employed the mimiCOGs to obtain a maximum likelihood reconstruction of the history of gene losses and gains among the giant viruses. The results reveal massive gene gain in the mimivirus branch and modest gene gain in the OLPG branch.
The phylogenomic results reported here suggest a substantial expansion of the family Mimiviridae. The proposed expanded family encompasses a greater diversity of viruses, including a group of viruses with much smaller genomes than those of the original members of the Mimiviridae. If the OLPG group is included in an expanded family Mimiviridae, it becomes the only family of giant viruses currently shown to host virophages. The mimiCOGs are expected to become a key resource for phylogenomics of giant viruses.
The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise a major, apparently monophyletic group of viruses that consists of 6 established virus families and a 7th putative family [1–3]. The NCLDV infect animals and diverse unicellular eukaryotes and either replicate exclusively within the so-called virus factories in the cytoplasm of the host cells [4, 5], or go through both cytoplasmic and nuclear stages in their reproduction cycle.
With the exception of some viruses in the Phycodnaviridae family that do not encode their own RNA polymerase subunits and hence depend on the host for transcription, the NCLDV do not show strong dependence on the host replication or transcription systems for completing their replication [6, 7]. This relative independence of the NCLDV from the host cells is consistent with the fact that these viruses encode many conserved proteins that mediate most of the processes essential for viral reproduction. These key proteins include DNA polymerases, primases, helicases, flap nucleases and DNA clamps that are responsible for DNA replication; Holliday junction resolvases and topoisomerases involved in genome DNA manipulation and processing; transcription factors that function in transcription initiation and elongation; ATPase pumps for DNA packaging; chaperones involved in the capsid assembly and the capsid proteins themselves [1–3, 8]. Although only 5 genes are conserved in all NCLDV (with sequenced genomes), evolutionary reconstruction using maximum parsimony or maximum likelihood approaches mapped between 40 and 50 genes to the putative common ancestor of the NCLDV. Given the compelling evidence in favor of the monophyly of the NCLDV, it has been recently proposed to formally recognize this group of viruses as a new taxon, the order Megavirales.
The best characterized family of the NCLDV is the Poxviridae, which includes numerous viruses infecting animals, among them smallpox virus, the causative agent of one of the most devastating human infectious diseases, and vaccinia virus, a classic model of molecular virology. Recently, however, the group of the NCLDV that has attracted the most attention has been the family Mimiviridae, which encompasses by far the largest known viruses [11–13]. The giant Mimivirus, the prototype of the family, was isolated from Acanthamoeba polyphaga and shown to possess a ~1.2 Mb genome that encompasses more than 1000 protein-coding genes. Subsequently, 3 more genomes of related viruses have been sequenced, 2 of these even slightly larger than the Mimivirus genome [11, 15–19]. In addition, approximately 20 mimiviruses have been detected through genomic and proteomic surveys but have not yet been characterized in detail. Most of the currently identified mimiviruses infect the freshwater protist (and an opportunistic human pathogen) Acanthamoeba, but the current genome size record holder, Megavirus chiliensis, was isolated from ocean water, although its specific host remains unknown. Recently a giant virus (albeit somewhat smaller than the previously isolated mimiviruses, with a 700 Kb genome) has been isolated from the marine flagellate Cafeteria roenbergensis (and accordingly designated CroV after Cafeteria roenbergensis virus) [22, 23]. Phylogenetic analysis of the core NCLDV genes indicated that, among the other NCLDV, CroV was the closest relative of the mimiviruses and could be classified as a distant member of the family Mimiviridae [22, 24]. Furthermore, numerous sequences homologous to mimivirus genes have been identified in marine metagenomic samples, indicating that mimiviruses are common in these habitats [25, 26]. Taken together, these findings indicate that Mimiviridae is an expansive family of giant viruses the true diversity of which remains largely untapped.
In addition to all the core NCLDV genes, members of the family Mimiviridae possess many genes the presence of which in viruses is unexpected, in particular genes encoding components of the translation systems such as aminoacyl-tRNA synthetases and translation factors [14, 21]. The discovery of these genes that comprise parts of the core molecular machinery of all cellular life forms but are uncharacteristic of viruses fueled the debate on the controversial possibility that mimiviruses represent a “fourth domain of life” [9, 14, 24, 27–29].
A notable feature of giant viruses is that they harbor their own mobilome, a collection of diverse selfish elements that depend on a giant virus for their reproduction. In addition to self-splicing introns and inteins, mimiviruses support the replication of transpovirons, a distinct type of linear plasmids, and virophages, small viruses that replicate within the intracellular factories of the host giant virus [30, 31]. The first discovered virophage, dubbed Sputnik, is a parasite of the Mamavirus and closely related mimiviruses, and is an icosahedral virus with an approximately 20 kilobase dsDNA genome. Subsequently, it has been shown that Sputnik can integrate into the genome of the host mimiviruses. Two distinct virophages have been shown to infect CroV and Organic Lake phycodnavirus; these virophages resemble Sputnik in terms of the overall virion and genome structure but substantially differ in their gene repertoires.
As part of an effort to understand the evolutionary history and ultimately the origin of the giant viruses, we constructed Clusters of Mimivirus Orthologous Genes (mimiCOGs) and reassessed the relationship of the family Mimiviridae with the other NCLDV. The result is a potential major expansion of the family Mimiviridae that is shown to include several viruses previously classified as members of Phycodnaviridae.
Results and discussion
Comparative genomics of the putative expanded family Mimiviridae
To further investigate the evolutionary provenance of these poorly characterized giant viruses (hereinafter OLPG, after Organic Lake and Phaeocystis globosa viruses), we conducted an in-depth phylogenomic analysis of the previously identified and putative new members of the family Mimiviridae. To this end, we constructed clusters of orthologous genes (COGs [36, 37]) from the genomes of 4 mimiviruses (Acanthamoeba castellanii mamavirus, Acanthamoeba polyphaga mimivirus, Megavirus chiliensis, and Moumouvirus), CroV, and 3 OLPG [Organic Lake phycodnavirus 1, Organic Lake phycodnavirus 2 (these two genomes are still incomplete) and Phaeocystis globosa virus 12 T]. The gene products encoded in these 8 genomes were retrieved from GenBank, yielding a total of 5,677 protein sequences. These viral proteins were grouped into clusters of likely orthologs using a modified COG procedure (see Methods for details). Clusters were manually edited and annotated using the results of RPS-BLAST and PSI-BLAST searches for the constituent proteins (Additional file 2 and see Methods). This procedure yielded 898 clusters of candidate orthologous genes from the putative expanded family Mimiviridae (hereinafter mimiCOGs). The mimiCOGs then were merged into the previously constructed clusters of orthologous genes for all NCLDV (NCVOGs) (see Methods for details).
Conserved proteins of the putative extended Mimiviridae family
Proteins present in all 8 Mimiviridae genomes
A1L transcription factor VLTF-2
Proliferating cell nuclear antigen
A2L transcription factor VLTF-3
protein disulfide isomerase/thioredoxin family
asnB, asparagine synthetase B
putative DNA-directed RNA polymerase II subunit N
replication factor C small subunit
replication factor C small subunit
DEAD/SNF2-like helicase or ATP-dependent RNA helicase
DNA directed RNA polymerase subunit L
DNA mismatch repair ATPase MutS
ribonucleoside diphosphate reductase large subunit
DNA polymerase elongation subunit family B
ribonucleoside diphosphate reductase small subunit
DNA topoisomerase IB
DNA topoisomerase II
Transcription factor S-II (TFIIS)-domain-containing protein
DNA-dependent RNA polymerase subunit Rpb9/M
transcription initiation factor IIB
DNA-directed RNA polymerase subunit 5 (RPB5)
ubiquitin-conjugating enzyme E2
DNA-directed RNA polymerase subunit 6
DNA-directed RNA polymerase subunit alpha
VV A18-like helicase
DNA-directed RNA polymerase subunit beta
VV A32 virion packaging ATPase
DNA-directed RNA polymerase subunit E’ (RPB7)
YqaJ-like viral recombinase
Erv1 / Alr family oxidoreductase
eukaryotic translation initiation factor 4E-like protein
Holliday junction resolvase
mRNA capping enzyme
poxvirus poly(A) polymerase catalytic subunit-like protein
probable ubiquitin carboxyl-terminal hydrolase
Genes missing in one or two OLPG genomes but present in all the other Mimiviridae genomes
AAA family ATPase
Lon domain protease
chaperone protein DnaJ
chaperone protein DnaJ
heat shock 70 kDa protein
XRN 5'-3' exonuclease
The Mimiviridae-OLPG clade in the phylogenetic trees of conserved NCLDV genes
We used the mimiCOGs to conduct a new phylogenomic analysis of the ancestral NCLDV genes in an attempt to elucidate the evolutionary affinity of the OLPG (Additional file 3: Table S2). Phylogenetic trees were constructed for all clusters of orthologous genes that included the mimiviruses, OLPG and phycodnaviruses and for which the number of informative sites in the multiple sequence alignment was sufficient for phylogenetic analysis.
Genes involved in DNA replication, recombination and repair
In the D5 helicase tree (Figure 4B), OLPG and mimiviruses are paraphyletic but form a well-supported clade with iridoviruses and Marseilleviruses, whereas phycodnaviruses group with bacteria and bacteriophages, probably as a result of xenologous gene displacement.
The phylogenetic tree of DNA topoisomerase II contains a strongly supported OLPG-Mimiviridae clade (Figure 4C); the topology of this tree is nearly identical to that of the DNA polymerase tree. The tree of the YqaJ-like recombinase also supports the OLPG-Mimiviridae clade (Figure 4D). By contrast, in the tree of RuvC-like Holliday junction resolvases, the OLPG fail to cluster with either phycodnaviruses or mimiviruses (Figure 4E).
Genes involved in transcription and RNA processing
The phylogenetic analysis of the Major Coat Protein (MCP) gene required a modified approach because the mimiviruses [18, 19] as well as the OLPG encompass multiple paralogous MCP genes, some of which are extremely diverged in sequence [18, 19], hampering the construction of robust phylogenetic trees. Therefore we first aligned all detected MCP sequences from Mimiviridae, OLPG, Phycodnaviridae, Iridoviridae and Marseilleviridae (the sequences from Asfarviridae and Poxviridae being in this case too distant) and constructed a preliminary phylogenetic tree. This tree was used to identify the fastest evolving MCP homologs (the longest branches), which were then removed from the sequence alignment that was then used to construct the final phylogenetic tree. In this MCP phylogeny, the OLPG-Mimiviridae clade was recovered with moderate statistical support (Figure 7B).
Reconstruction of the evolution of giant viruses
Taken together, the phylogenomic results presented here indicate that the OLPG are the sister group of the family Mimiviridae within the NCLDV phylogeny. This conclusion is supported by the topologies of the phylogenetic trees for most of the core NCLDV genes that show monophyly of OLPG and the mimiviruses (Figures 4, 5, 6, 7, 8 and Additional file 3: Table S2). Although some of the phylogenies are poorly resolved, none of them shows clustering of the OLPG with or within the phycodnaviruses. Moreover, for some of the core NCLDV genes, conservative statistical tests reject affiliation of OLPG with phycodnaviruses. Given that the OLPG, at least so far, are a group with limited diversity, it seems plausible that the family Mimiviridae will eventually be expanded to include these viruses. Alternatively, OLPG could become a new family within the proposed order Megavirales.
The OLPG encompass few genes encoding translation system components that are one of the signatures of the mimivirus genomes [14, 21] (the only translation-related gene that was apparently acquired by the common ancestor of the OLPG and the mimiviruses is the homolog of the translation initiation factor 4E), indicating that these genes largely were acquired by an ancestral mimivirus.
An Organic Lake “phycodnavirus” has been identified as a host to a distinct virophage (OLV) that is distantly related to the Sputnik virophage infecting mimiviruses [16, 31] and the Mavirus virophage infecting CroV. The findings described here indicate that so far only viruses within the (extended) family Mimiviridae support the reproduction of virophages. Recently, numerous sequences of putative virophages have been assembled from metagenomic sequences originating from diverse environments. In particular, 4 complete virophage genomes distantly related to the OLV have been assembled from Yellowstone Lake metagenomic data. The present results lead us to hypothesize that these novel virophages also infect members of the family Mimiviridae, in particular still unknown representatives of the OLPG group.
Finally, it is worth noting that the mimiCOGs developed in the course of this work are expected to become a key resource for a comprehensive phylogenomic study of the giant viruses, and in particular a full assessment of the fourth domain hypothesis.
For the construction of mimiCOGs, the following genomes were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/): Acanthamoeba polyphaga mimivirus (GI:311977355), Acanthamoeba castellanii mamavirus (GI:351737110), Megavirus chiliensis (GI:350610932), Cafeteria roenbergensis virus BV-PW1 (GI:310830989), Phaeocystis globosa virus 12 T (GI:357289534), Organic Lake phycodnavirus 1 (GI:322510471), Organic Lake phycodnavirus 2 (GI:322510873), Marseillevirus (GI:284504040), and Lausannevirus (GI:327409548). The complete dataset consisted of 6,548 protein sequences. The mimiCOGs were constructed as previously described. Briefly, the procedure included the following steps: 1) Initial clusters based on triangles of symmetrical best hits were constructed with a modified COG algorithm that takes as input the results of an all-against-all BLASTP comparison; 2) Multiple alignments of the initial cluster members were constructed using the MUSCLE program. The alignments were used to generate position-specific scoring matrices (PSSM) for a PSI-BLAST search against the original protein dataset. Significantly similar proteins were added to the corresponding clusters; 3) Clusters with nearly complementary phyletic patterns and high inter-cluster sequence similarity were manually examined and merged whenever appropriate; 4) The mimiCOGs were manually edited and annotated using the annotations of the Moumouvirus and Mamavirus proteins present in each cluster and the results of RPS-BLAST and PSI-BLAST searches for the other cluster members; 5) MimiCOG-NCVOG correspondence was established by PSI-BLAST searches initiated with PSSMs constructed from NCVOG alignments against the proteins included in the mimiCOGs. The mimiCOGs are available at ftp://ftp.ncbi.nih.gov/pub/koonin/mimivirus/mimiCOGs.
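Step 1 of the procedure can be illustrated with a toy example. The following Python sketch is not the modified COG algorithm used in this work; it only shows the underlying idea of symmetrical best hits and the triangles they form. The input format (a dictionary of BLASTP-like scores keyed by query and target, with protein identifiers written as `genome|protein`), the function names and the toy scores are all assumptions made for illustration.

```python
from itertools import combinations

def best_hits(scores):
    """For each (query protein, target genome), keep the best-scoring target.

    scores: dict {(query_id, target_id): score}; ids look like 'genome|protein'
    (a hypothetical format chosen for this sketch).
    """
    best = {}
    for (query, target), score in scores.items():
        query_genome = query.split("|")[0]
        target_genome = target.split("|")[0]
        if query_genome == target_genome:
            continue  # hits within the same genome are ignored here
        key = (query, target_genome)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit[0] for key, hit in best.items()}

def symmetric_pairs(scores):
    """Pairs of proteins that are each other's best hit between two genomes."""
    best = best_hits(scores)
    pairs = set()
    for (query, _target_genome), target in best.items():
        query_genome = query.split("|")[0]
        if best.get((target, query_genome)) == query:
            pairs.add(frozenset((query, target)))
    return pairs

def sbh_triangles(scores):
    """Triples of proteins from three genomes linked by symmetrical best hits."""
    pairs = symmetric_pairs(scores)
    proteins = sorted({p for pair in pairs for p in pair})
    return [
        (a, b, c)
        for a, b, c in combinations(proteins, 3)
        if frozenset((a, b)) in pairs
        and frozenset((a, c)) in pairs
        and frozenset((b, c)) in pairs
    ]

# Toy scores (invented): A|p1, B|p1 and C|p1 are mutual best hits,
# whereas A|p2 hits B|p1 but is not B|p1's best hit in genome A.
toy_scores = {
    ("A|p1", "B|p1"): 200, ("B|p1", "A|p1"): 200,
    ("A|p1", "C|p1"): 180, ("C|p1", "A|p1"): 180,
    ("B|p1", "C|p1"): 150, ("C|p1", "B|p1"): 150,
    ("A|p2", "B|p1"): 50, ("B|p1", "A|p2"): 50,
}
print(sbh_triangles(toy_scores))
```

In a full COG procedure, triangles that share an edge would then be merged into larger clusters; that merging step, and the handling of paralogs, are omitted here.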
Neighbor-Joining tree based on the phyletic patterns
Presence-absence matrices of mimiCOGs and corresponding NCVOGs were combined, whenever correspondence was established, and binarized, yielding 584 patterns (see Additional file 5). Nineteen NCVOG patterns were amended by adding OLPG proteins that have not been included in the mimiCOGs, based on the results of PSI-BLAST searches initiated by NCVOG PSSMs against the proteins used for mimiCOG construction. The remaining 727 NCVOGs and 393 mimiCOGs were considered non-overlapping and added to the pool, resulting in a total of 1,723 patterns. For each pair of species, the number of clusters in which each of them was present (N1 and N2) as well as the number of clusters in which both species were present (NU) were computed. The gene content similarity measure (s) was calculated as s = NU/sqrt(N1 × N2) and converted to a distance measure (d) as d = -ln(s). A neighbor-joining tree was constructed from the distance matrix using the NEIGHBOR program of Phylip 3.66. Bootstrap values were obtained by 1,000 resamplings of the 1,723 patterns.
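The similarity and distance measures defined above can be written out as a short Python sketch. The function name, the representation of each genome as a set of cluster identifiers, and the choice to return an infinite distance for genomes that share no clusters are assumptions made for illustration; the cluster identifiers in the example are invented.

```python
import math

def gene_content_distance(clusters_a, clusters_b):
    """Distance d = -ln(s), with s = NU / sqrt(N1 * N2).

    clusters_a, clusters_b: sets of cluster ids present in each genome.
    N1 and N2 are the numbers of clusters present in each genome and
    NU is the number of clusters present in both.
    """
    n1 = len(clusters_a)
    n2 = len(clusters_b)
    nu = len(clusters_a & clusters_b)
    if nu == 0:
        # No shared clusters: s = 0, so the distance is unbounded
        # (how to treat this case is an assumption of this sketch).
        return math.inf
    s = nu / math.sqrt(n1 * n2)
    return max(0.0, -math.log(s))

# Two genomes with 4 clusters each, 2 of them shared: s = 2/4 = 0.5.
genome_x = {"c1", "c2", "c3", "c4"}
genome_y = {"c3", "c4", "c5", "c6"}
print(gene_content_distance(genome_x, genome_y))  # ln(2), about 0.693
```

Computing this value for every pair of genomes gives the distance matrix that is passed to a neighbor-joining program.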
Multiple alignment and phylogenetic tree construction
The sequences for phylogenetic analysis were collected using (i) BLAST searches against nr and environmental (env_nr) databases initiated by distant mimiCOG members; (ii) the corresponding NCVOG sequences; and (iii) reference sequences used for the core NCVOG study. Nearly identical sequences were eliminated using BLASTCLUST (http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/doc/blast/blastclust.html). The sequences were aligned using MUSCLE. All alignments were manually checked for the conservation of domain architecture and the presence of diagnostic motifs. Positions including gaps in more than one-third of the sequences and positions with low information content were removed prior to tree computation. A preliminary maximum-likelihood tree was constructed using the FastTree program with default parameters (JTT evolutionary model, discrete gamma model with 20 rate categories). The preliminary tree and the alignment were then used to determine the best substitution matrix using ProtTest. Final maximum-likelihood trees were constructed using TreeFinder (1,000 replicates, Search Depth 2), with the substitution matrix found to be the best for a given alignment. The Expected-Likelihood Weights (ELW) of 1,000 local rearrangements were used as confidence values of TreeFinder tree branches. For topology testing, whenever applicable, alternative (constrained) topologies were constructed and compared to the initial trees using TreeFinder. An approximately unbiased (AU) test P value cutoff of 0.05 was used for rejecting tree topologies.
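The gap-based part of the column filter described above can be sketched in Python as follows. This is an illustration only: the function name, the gap characters and the toy alignment are assumptions, and the second criterion (removal of positions with low information content) is not implemented because its exact definition is not given here.

```python
from fractions import Fraction

def filter_gappy_columns(alignment, max_gap_fraction=Fraction(1, 3), gap_chars="-."):
    """Drop columns in which more than max_gap_fraction of sequences have a gap.

    alignment: list of equal-length aligned sequences (strings).
    Returns the alignment restricted to the retained columns.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = []
    for col in range(length):
        gaps = sum(seq[col] in gap_chars for seq in alignment)
        # Exact rational comparison avoids floating-point edge cases at 1/3.
        if Fraction(gaps, n_seqs) <= max_gap_fraction:
            keep.append(col)
    return ["".join(seq[col] for col in keep) for seq in alignment]

# Toy alignment (invented): the third column has gaps in 2 of 3 sequences.
toy_alignment = ["MK-LV", "MKALV", "M--LV"]
print(filter_gappy_columns(toy_alignment))
```

A column with gaps in exactly one-third of the sequences is kept, because the criterion removes only positions with gaps in more than one-third of the sequences.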
Reconstruction of gene losses and gains
The Neighbor-Joining gene content tree of the NCLDV and the gene presence-absence matrix for the mimiCOGs and NCVOGs were used to reconstruct the gene loss and gain events in the evolution of the NCLDV using the COUNT program, as previously described.
NY and EVK are supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine).
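To make the idea of mapping gains and losses onto a tree concrete, the sketch below uses Dollo parsimony, a much simpler method than the maximum likelihood model applied with COUNT in this work, so it should not be read as the procedure actually used. Under Dollo parsimony a gene is gained once, at the last common ancestor of all genomes that carry it, and every absence below that node is explained by loss. The tree encoding (nested tuples with string leaves), the function names and the toy tree are assumptions made for illustration.

```python
def leaves(node):
    """Set of leaf names under a node; a leaf is a string, a clade is a tuple."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def dollo_events(tree, present):
    """Place one gain and the implied losses for a single gene.

    Returns (gain, losses): gain is the frozenset of leaves under the node
    where the gene is gained (None if no leaf has the gene); losses is a
    list of frozensets, one per subtree in which the gene was lost.
    """
    present = set(present) & leaves(tree)
    if not present:
        return None, []
    # Walk down to the last common ancestor of all leaves carrying the gene.
    node = tree
    while not isinstance(node, str):
        holders = [child for child in node if leaves(child) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    gain = frozenset(leaves(node))
    losses = []

    def walk(current):
        if isinstance(current, str):
            return
        for child in current:
            if leaves(child) & present:
                walk(child)
            else:
                # The whole subtree lacks the gene: one loss on its branch.
                losses.append(frozenset(leaves(child)))

    walk(node)
    return gain, losses

# Toy tree with generic leaf names (not a result of this study).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
print(dollo_events(toy_tree, {"A", "C", "D"}))
```

In this example the gene is gained at the root and lost independently on the branches leading to B and to E. A likelihood approach instead estimates gain and loss rates and assigns probabilities to such events, which allows for more than one gain of the same gene.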
- Iyer LM, Aravind L, Koonin EV: Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001,75(23):11720–11734.
- Iyer LM, Balaji S, Koonin EV, Aravind L: Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res 2006,117(1):156–184.
- Koonin EV, Yutin N: Origin and evolution of eukaryotic large nucleo-cytoplasmic DNA viruses. Intervirology 2010,53(5):284–292.
- Netherton CL, Wileman T: Virus factories, double membrane vesicles and viroplasm generated in animal cells. Curr Opin Virol 2011,1(5):381–387.
- de Castro IF, Volonte L, Risco C: Virus factories: biogenesis and structural design. Cell Microbiol 2013,15(1):24–34.
- Van Etten JL, Dunigan DD: Chloroviruses: not your everyday plant virus. Trends Plant Sci 2012,17(1):1–8.
- Van Etten JL: Unusual life style of giant chlorella viruses. Annu Rev Genet 2003, 37:153–195.
- Yutin N, Wolf YI, Raoult D, Koonin EV: Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J 2009, 6:223.
- Colson P, de Lamballerie X, Fournous G, Raoult D: Reclassification of giant viruses composing a fourth domain of life in the new order Megavirales. Intervirology 2012,55(5):321–332.
- Moss B: Poxviridae: the viruses and their replication. In Fields Virology. 5th edition. Edited by: Knipe DM, Howley PM. Philadelphia: Lippincott Williams & Wilkins; 2007:2905–2946.
- Claverie JM, Abergel C: Mimivirus: the emerging paradox of quasi-autonomous viruses. Trends Genet 2010,26(10):431–437.
- Claverie JM, Ogata H, Audic S, Abergel C, Suhre K, Fournier PE: Mimivirus and the emerging concept of “giant” virus. Virus Res 2006,117(1):133–144.
- Raoult D, Forterre P: Redefining viruses: lessons from Mimivirus. Nat Rev Microbiol 2008, 6:315–319.
- Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM: The 1.2-megabase genome sequence of Mimivirus. Science 2004,306(5700):1344–1350.
- Suzan-Monti M, La Scola B, Raoult D: Genomic and evolutionary aspects of Mimivirus. Virus Res 2006,117(1):145–155.
- La Scola B, Desnues C, Pagnier I, Robert C, Barrassi L, Fournous G, Merchat M, Suzan-Monti M, Forterre P, Koonin E: The virophage as a unique parasite of the giant mimivirus. Nature 2008,455(7209):100–104.
- Claverie JM, Abergel C, Ogata H: Mimivirus. Curr Top Microbiol Immunol 2009, 328:89–121.
- Colson P, Yutin N, Shabalina SA, Robert C, Fournous G, La Scola B, Raoult D, Koonin EV: Viruses with more than 1,000 genes: Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, and reannotation of Mimivirus genes. Genome Biol Evol 2011, 3:737–742.
- Yoosuf N, Yutin N, Colson P, Shabalina SA, Pagnier I, Robert C, Azza S, Klose T, Wong J, Rossmann MG: Related giant viruses in distant locations and different habitats: Acanthamoeba polyphaga moumouvirus represents a third lineage of the Mimiviridae that is close to the megavirus lineage. Genome Biol Evol 2012,4(12):1324–1330.
- La Scola B, Campocasso A, N’Dong R, Fournous G, Barrassi L, Flaudrops C, Raoult D: Tentative characterization of new environmental giant viruses by MALDI-TOF mass spectrometry. Intervirology 2010,53(5):344–353.
- Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM: Distant Mimivirus relative with a larger genome highlights the fundamental features of Megaviridae. Proc Natl Acad Sci USA 2011,108(42):17486–17491.
- Fischer MG, Allen MJ, Wilson WH, Suttle CA: Giant virus with a remarkable complement of genes infects marine zooplankton. Proc Natl Acad Sci USA 2010,107(45):19508–19513.
- Van Etten JL: Another really, really big virus. Viruses 2011,3(1):32–46.
- Colson P, Gimenez G, Boyer M, Fournous G, Raoult D: The giant Cafeteria roenbergensis virus that infects a widespread marine phagocytic protist is a new member of the fourth domain of Life. PLoS One 2011,6(4):e18935.
- Monier A, Claverie JM, Ogata H: Taxonomic distribution of large DNA viruses in the sea. Genome Biol 2008,9(7):R106.
- Monier A, Larsen JB, Sandaa RA, Bratbak G, Claverie JM, Ogata H: Marine mimivirus relatives are probably large algal viruses. Virol J 2008, 5:12.
- Williams TA, Embley TM, Heinz E: Informational gene phylogenies do not support a fourth domain of life for nucleocytoplasmic large DNA viruses. PLoS One 2011,6(6):e21080.
- Legendre M, Arslan D, Abergel C, Claverie JM: Genomics of Megavirus and the elusive fourth domain of Life. Commun Integr Biol 2012,5(1):102–106.
- Nasir A, Kim KM, Caetano-Anolles G: Giant viruses coexisted with the cellular ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria and Eukarya. BMC Evol Biol 2012,12(1):156.
- Desnues C, La Scola B, Yutin N, Fournous G, Robert C, Azza S, Jardot P, Monteil S, Campocasso A, Koonin EV: Provirophages and transpovirons as the diverse mobilome of giant viruses. Proc Natl Acad Sci USA 2012,109(44):18078–18083.
- Desnues C, Boyer M, Raoult D: Sputnik, a virophage infecting the viral domain of life. Adv Virus Res 2012, 82:63–89.
- Fischer MG, Suttle CA: A virophage at the origin of large DNA transposons. Science 2011,332(6026):231–234.
- Yau S, Lauro FM, DeMaere MZ, Brown MV, Thomas T, Raftery MJ, Andrews-Pfannkoch C, Lewis M, Hoffman JM, Gibson JA: Virophage control of antarctic algal host-virus dynamics. Proc Natl Acad Sci USA 2011,108(15):6163–6168.
- Baudoux AC, Brussaard CP: Characterization of different viruses infecting the marine harmful algal bloom species Phaeocystis globosa. Virology 2005,341(1):80–90.
- Brussaard CPG, Bratbak G, Baudoux AC, Ruardij P: Phaeocystis and its interaction with viruses. Biogeochemistry 2007, 83:201–215.
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997,278(5338):631–637.
- Kristensen DM, Wolf YI, Mushegian AR, Koonin EV: Computational methods for Gene Orthology inference. Brief Bioinform 2011,12(5):379–391.
- Kristensen DM, Kannan L, Coleman MK, Wolf YI, Sorokin A, Koonin EV, Mushegian A: A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches. Bioinformatics 2010,26(12):1481–1487.
- Yutin N, Koonin EV: Proteorhodopsin genes in giant viruses. Biol Direct 2012, 7:34.
- Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 2001, 1:8.
- Yutin N, Koonin EV: Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virol J 2012,9(1):161.
- Csuros M: Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 2010,26(15):1910–1912.
- Colson P, Raoult D: Gene repertoire of amoeba-associated giant viruses. Intervirology 2010,53(5):330–343.
- Greub G, Raoult D: Microorganisms resistant to free-living amoebae. Clin Microbiol Rev 2004,17(2):413–433.
- Raoult D, Boyer M: Amoebae as genitors and reservoirs of giant viruses. Intervirology 2010,53(5):321–329.
- Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C, Azza S, Sun S, Rossmann MG: Giant Marseillevirus highlights the role of amoebae as a melting pot in emergence of chimeric microorganisms. Proc Natl Acad Sci USA 2009,106(51):21848–21853.
- Zhou J, Zhang W, Yan S, Xiao J, Zhang Y, Li B, Pan Y, Wang Y: Diversity of virophages in metagenomic datasets. J Virol 2013,87(8):4225–4236.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997,25(17):3389–3402.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004,32(5):1792–1797.
- Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004,32(Web Server issue):W327-W331.
- Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 1996, 266:418–427.
- Yutin N, Makarova KS, Mekhedov SL, Wolf YI, Koonin EV: The deep archaeal roots of eukaryotes. Mol Biol Evol 2008,25(8):1619–1630.
- Price MN, Dehal PS, Arkin AP: FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One 2010,5(3):e9490.
- Darriba D, Taboada GL, Doallo R, Posada D: ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 2011,27(8):1164–1165.
- Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 2004, 4:18.
- Shimodaira H: An approximately unbiased test of phylogenetic tree selection. Syst Biol 2002,51(3):492–508.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.