The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise a major, apparently monophyletic group of viruses that consists of 6 established virus families and a 7th putative family [1–3]. The NCLDV infect animals and diverse unicellular eukaryotes and either replicate exclusively within so-called virus factories in the cytoplasm of the host cells [4, 5] or go through both cytoplasmic and nuclear stages in their reproduction cycle.
With the exception of some viruses in the Phycodnaviridae family that do not encode their own RNA polymerase subunits and hence depend on the host for transcription, the NCLDV do not show strong dependence on the host replication or transcription systems for completing their replication [6, 7]. This relative independence of the NCLDV from the host cells is consistent with the fact that these viruses encode many conserved proteins that mediate most of the processes essential for viral reproduction. These key proteins include DNA polymerases, primases, helicases, flap nucleases and DNA clamps that are responsible for DNA replication; Holliday junction resolvases and topoisomerases involved in genome DNA manipulation and processing; transcription factors that function in transcription initiation and elongation; ATPase pumps for DNA packaging; chaperones involved in capsid assembly; and the capsid proteins themselves [1–3, 8]. Although only 5 genes are conserved in all NCLDV with sequenced genomes, evolutionary reconstruction using maximum parsimony or maximum likelihood approaches mapped between 40 and 50 genes to the putative common ancestor of the NCLDV. Given the compelling evidence in favor of the monophyly of the NCLDV, it has recently been proposed to formally recognize this group of viruses as a new taxon, the order Megavirales.
The best characterized family of the NCLDV is the Poxviridae, which includes numerous viruses infecting animals, among them smallpox virus, the causative agent of one of the most devastating human infectious diseases, and vaccinia virus, a classic model of molecular virology. Recently, however, the group of the NCLDV that has attracted the most attention has been the family Mimiviridae, which encompasses by far the largest known viruses [11–13]. The giant Mimivirus, the prototype of the family, was isolated from Acanthamoeba polyphaga and shown to possess a ~1.2 Mb genome that encompasses more than 1000 protein-coding genes. Subsequently, 3 more genomes of related viruses have been sequenced, 2 of them even slightly larger than the Mimivirus genome [11, 15–19]. In addition, approximately 20 mimiviruses have been detected through genomic and proteomic surveys but have not yet been characterized in detail. Most of the currently identified mimiviruses infect the freshwater protist (and opportunistic human pathogen) Acanthamoeba, but the current genome size record holder, Megavirus chiliensis, was isolated from ocean water, and its specific host remains unknown. Recently, a giant virus (albeit somewhat smaller than the previously isolated mimiviruses, with a 700 kb genome) has been isolated from the marine flagellate Cafeteria roenbergensis and accordingly designated CroV, for Cafeteria roenbergensis virus [22, 23]. Phylogenetic analysis of the core NCLDV genes indicated that, among the other NCLDV, CroV was the closest relative of the mimiviruses and could be classified as a distant member of the family Mimiviridae [22, 24]. Furthermore, numerous sequences homologous to mimivirus genes have been identified in marine metagenomic samples, indicating that mimiviruses are common in these habitats [25, 26]. Taken together, these findings indicate that Mimiviridae is an expansive family of giant viruses, the true diversity of which remains largely untapped.
In addition to all the core NCLDV genes, members of the family Mimiviridae possess many genes whose presence in viruses is unexpected, in particular genes encoding components of the translation system such as aminoacyl-tRNA synthetases and translation factors [14, 21]. The discovery of these genes, which comprise parts of the core molecular machinery of all cellular life forms but are uncharacteristic of viruses, fueled the debate on the controversial possibility that mimiviruses represent a “fourth domain of life” [9, 14, 24, 27–29].
A notable feature of giant viruses is that they harbor their own mobilome, a collection of diverse selfish elements that depend on a giant virus for their reproduction. In addition to self-splicing introns and inteins, mimiviruses support the replication of transpovirons, a distinct type of linear plasmid, and virophages, small viruses that replicate within the intracellular factories of the host giant virus [30, 31]. The first discovered virophage, dubbed Sputnik, is a parasite of the Mamavirus and closely related mimiviruses, and is an icosahedral virus with an approximately 20 kilobase dsDNA genome. Subsequently, it has been shown that Sputnik can integrate into the genome of the host mimiviruses. Two distinct virophages have been shown to infect CroV and Organic Lake phycodnavirus; these virophages resemble Sputnik in terms of the overall virion and genome structure but differ substantially in their gene repertoires.
As part of an effort to understand the evolutionary history and ultimately the origin of the giant viruses, we constructed Clusters of Mimivirus Orthologous Genes (mimiCOGs) and reassessed the relationship of the family Mimiviridae with the other NCLDV. The result is a potential major expansion of the family Mimiviridae, which is shown to include several viruses previously classified as members of Phycodnaviridae.