Mimivirus relatives in the Sargasso sea
© Ghedin and Claverie; licensee BioMed Central Ltd. 2005
Received: 01 May 2005
Accepted: 16 August 2005
Published: 16 August 2005
The discovery and genome analysis of Acanthamoeba polyphaga Mimivirus, the largest known DNA virus, challenged much of the accepted dogma regarding viruses. Its particle size (>400 nm), genome length (1.2 million bp) and huge gene repertoire (911 protein coding genes) all contribute to blur the established boundaries between viruses and the smallest parasitic cellular organisms. Phylogenetic analyses also suggested that the Mimivirus lineage could have emerged prior to the individualization of cellular organisms from the three established domains, triggering a debate that can only be resolved by generating and analyzing more data. The next step is then to seek some evidence that Mimivirus is not the only representative of its kind and determine where to look for new Mimiviridae. An exhaustive similarity search of all Mimivirus predicted proteins against all publicly available sequences identified many of their closest homologues among the Sargasso Sea environmental sequences. Subsequent phylogenetic analyses suggested that unknown large viruses evolutionarily closer to Mimivirus than to any presently characterized species exist in abundance in the Sargasso Sea. Their isolation and genome sequencing could prove invaluable in understanding the origin and diversity of large DNA viruses, and shed some light on the role they eventually played in the emergence of eukaryotes.
The discovery and genome sequence analysis of Mimivirus [1, 2], the largest of the Nucleo-cytoplasmic Large DNA Viruses (NCLDV), challenged much of the accepted dogma regarding viruses. Its particle size (>400 nm), genome length (1.2 million bp) and extensive gene repertoire (911 protein coding genes) all contribute to blur the established boundaries between viruses and the smallest parasitic cellular organisms such as Mycoplasma or Nanoarchea . In the universal tree of life, the Mimivirus lineage appears to define a new branch, predating the emergence of all established eukaryotic kingdoms . Although this result is compatible with various hypotheses implicating ancestral DNA viruses in the emergence of eukaryotes [3–5], it requires confirmation from additional data. An urgent task is thus to convince ourselves that Mimivirus is not the sole representative of its kind (i.e. a viral counterpart to the platypus) and to provide some rational guidance as to where to begin the search for eventual new Mimiviridae.
Mimivirus was serendipitously discovered within Acanthamoeba polyphaga, a free-living ubiquitous amoeba, prevalent in aquatic environments. Phylogenetic analysis of the most conserved genes common to all nucleo-cytoplasmic large double-stranded DNA viruses (NCLDV)  positions Mimivirus as an independent lineage, roughly equidistant from the Phycodnaviridae (algal viruses) and Iridoviridae (predominantly fish viruses). Given the ecological affinity of these virus families for the marine environment, we have examined the sequence data set gathered through environmental microbial DNA sampling in the Sargasso Sea  to look for possible Mimivirus relatives.
Matching Status of Mimivirus core genes (type 1 to 4).
Best score in nr
Best score in DNA viruses
Best score in Sargasso Sea
Reciprocal Best match
Helicase III / VV D5
DNA pol (B family) extein
A32 virion packaging ATPase
VV A18 helicase
VV D6R – helicase
F10L – prot. Kinase
MuT-like NTP PP-hydrolase
Myristoylated virion prot. A
Small Ribonucl. reduct
Large Ribonucl. reduct
RNA pol. sub 2 (Rbp2)
RNA pol. sub.1 (Rpb1)
CIV193R BIR domain
Virion memb. prot
Topoisomerase I bacterial
Topoisomerase I pox-like
N1R/P28 DNA binding prot
Pre-mRNA helicase – splicing
RNA pol subunit5
GDP mannose 4,6-dehydratase
Ankyrin repeats (66 paralogs)
NAD-dependent DNA ligase
Our results predict that DNA viruses of 0.1 to 0.8 microns in size exist in the Sargasso Sea that are evolutionarily closer to Mimivirus than to any presently characterized species. These viruses are abundant enough to have been collected by environmental sampling. It must be noticed that a similar approach attempting to find relatives to two other unique NCLDVs, the African swine fever virus (the unique member of Asfarviridae) and the White spot syndrome virus, a major shrimp pathogen (the sole Nimaviridae), failed to provide convincing results (Claverie, data not shown). The identification of numerous Mimivirus-like sequences in the Sargasso Sea data is thus not simply the result of a large number of sequences been compared, but truly suggests that viruses from this clade are specifically abundant in the sampled marine environment. It is actually expected that many novel viruses will be encountered in natural waters in which they constitute the most abundant microrganisms [11, 12]. There might be as many as 10 billion virus particles per litre of ocean surface waters . Interestingly, the specialized literature abounds of descriptions of large virus-like particle associated with algae [e.g. [14–16]], or various marine protists [17, 18]. With the exception of Phycodnaviruses [19–21], the genomic characterization of these viruses has not been attempted. Guided by the results presented here, their isolation and genome sequencing could prove invaluable in understanding the diversity of DNA viruses and the role they eventually played in the evolution of eukaryotes.
Materials and methods
The protocols used to collect Sargasso Sea environmental micro-organisms and generate DNA sequences from these samples has been described elsewhere ). The data analyzed here correspond to "bacteria-sized" organisms that have passed through 3 μm filters and been retained by 0.8 μm to 0.1 μm filters. Mimivirus-like particles (0.8–0.4 μm) belong in this range.
Database similarity searches were performed using the Blast suite of programs  (default options) as implemented on the http://www.giantvirus.org web server and as implemented at The Institute for Genomic Research. Final similarity searches were performed on the non-redundant peptide sequence databases (nr) and environmental data (env-nr) downloaded from the National Institute for Biotechnology Information ftp server ftp://ftp.ncbi.nlm.nih.gov/blast/db/ on March 14, 2005. To avoid missing potential better matches with annotated virus ORFs, all Mimivirus ORFs exhibiting a best match (blosum62 scoring scheme) in env-nr were also searched against all DNA virus genomes using TblastN (peptide query against translated nucleotide sequence). The comprehensive list of Mimivirus ORFs exhibiting a best match in the env-nr database is given in Additional file: 1. Phylogenetic analyses were conducted using MEGA version 3.0  (option: Neighbor joining, 250 pseudo-replicates, and gaps handled by pairwise deletion). Tree branches were condensed for bootstrap values <50%.
Only Mimivirus ORFs with best matching homologues in DNA viruses and belonging to the nucleo-cytoplasmic large DNA virus core gene set (2, 6) were analyzed in detail. These ORFs (and matching status) are listed in Table 1. Phylogenetic analyses were limited to viral homologues and environmental sequences exhibiting a reciprocal best match relationship with the corresponding Mimivirus ORF (putative orthologues) (YES in the rightmost column). The three cases (red lines in Table 1) exhibiting the best bootstrap values are shown in Figure 1. Cases of complex relationships, for instance due to the presence of many paralogues (e.g. capsid proteins), are also indicated. These cases of non-reciprocal best matches are frequent (i.e. the closest homologue of a Mimivirus ORFs being an environmental sequence, but the latter sequence exhibiting a better match with a different ORF in the nr database).
Two environmental sampling contigs – contig IBEA_CTG_1979672 (AACY01022731, GI:44566181) and contig IBEA_CTG_1979673 (AACY01022732, GI:44566179) – are linked in a 4,465 bp scaffold (scaffold IBEA_SCF = 2208413) found to contain four ORFs with strong matches to Mimivirus peptides (R368, L377, L375, and L687). The three colinear ORFs (R368, L377, L375) are found on one contig while the orthologue to Mimivirus ORF L687 is found in the second contig. It is conceivable that the lack of colinearity for this fourth ORF is due to an assembly error.
We are indebted to James van Etten for pointing out some ancient observations of very large virus-like particles in algae and marine protists. We thank Stéphane Audic for his help with the http://www.giantvirus.org server and Hiroyuki Ogata and Vish Nene for reading the manuscript. This work was supported by internal funding from TIGR, CNRS, and the French National Genopole Network.
- La Scola B, Audic S, Robert C, Jungang L, de Lamballerie X, et al.: A giant virus in amoebae. Science 2003, 299: 2033. 10.1126/science.1081867View ArticlePubMed
- Raoult D, Audic S, Robert C, Abergel C, Renesto P, et al.: The 1.2-megabase genome sequence of Mimivirus. Science 2004, 306: 1344-1350. 10.1126/science.1101485View ArticlePubMed
- Villarreal LP, DeFilippis VR: A hypothesis for DNA viruses as the origin of eukaryotic replication proteins. J Virol 2000, 74: 7079-7084. 10.1128/JVI.74.15.7079-7084.2000PubMed CentralView ArticlePubMed
- Bell PJ: Viral eukaryogenesis: was the ancestor of the nucleus a complex DNA virus? J Mol Evol 2001, 53: 251-256. 10.1007/s002390010215View ArticlePubMed
- Takemura M: Poxviruses and the origin of the eukaryotic nucleus. J Mol Evol 2001, 52: 419-425.PubMed
- Iyer LM, Aravind L, Koonin EV: Common origin of four diverse families of large eukaryotic DNA viruses. J Virol 2001, 75: 11720-11734. 10.1128/JVI.75.23.11720-11734.2001PubMed CentralView ArticlePubMed
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al.: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304: 66-74. 10.1126/science.1093857View ArticlePubMed
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389-3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMed
- Van Etten JL: Unusual life style of giant chlorella viruses. Annu Rev Genet 2003, 37: 153-195. 10.1146/annurev.genet.37.110801.143915View ArticlePubMed
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 2004, 5: 150-163. 10.1186/1471-2105-5-150View ArticlePubMed
- Wommack KE, Colwell RR: Virioplankton: viruses in aquatic ecosytems. Microbiol Mol Biol Rev 2000, 64: 69-114. 10.1128/MMBR.64.1.69-114.2000PubMed CentralView ArticlePubMed
- Ghedin E, Fraser C: A virus with big ambitions. Trends Microbiol 2005, 13: 56-57. 10.1016/j.tim.2004.12.008View ArticlePubMed
- Fuhrman JA: Marine viruses and their biogeochemical and ecological effects. Nature 1999, 399: 541-548. 10.1038/21119View ArticlePubMed
- Mattox KR, Stewart KD: Probable virus infections in four genera of green algae. Can J Microbiol 1972, 18: 1620-1621.View ArticlePubMed
- Dodds JA, Cole A: Microscopy and biology of Uronema gigas, a filamentous eukaryotic green alga, and its associated tailed virus-like particle. Virology 1980, 100: 156-165.View ArticlePubMed
- Van Etten JL, Lane LC, Meints RH: Viruses and virus-like particles of eukaryotic algae. Microbiol Rev 1991, 55: 586-620.PubMed CentralPubMed
- Shin W, Boo SM: Virus-like particles in both nucleus and cytoplasm of Euglena viridis. Algological Studies 1999, 95: 125-131.
- Growing MM: Large VLPs from vacuoles of phaeodarian radiolarians and from other marine samples. Marine Ecology progress Series 1993, 101: 33-43.View Article
- Van Etten JL, Graves MV, Muller DG, Boland W, Delaroque N: Phycodnaviridae – large DNA algal viruses. Arch Virol 2002, 147: 1479-516. 10.1007/s00705-002-0822-6View ArticlePubMed
- Claverie JM: Giant viruses in the oceans: the 4th algal virus workshop. Virol J 2005, 2: 52. 10.1186/1743-422X-2-52PubMed CentralView ArticlePubMed
- Wilson WH, Schroeder DC, Allen MJ, Holden MTG, Parkhill J, et al.: Complete Genome Sequence and Lytic Phase Transcription Profile of a coccolithovirus. Science 2005, 309: 1090-1092. 10.1126/science.1113109View ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.