- Open Access
Genome comparison of two Coccolithoviruses
© Allen et al; licensee BioMed Central Ltd. 2006
- Received: 25 October 2005
- Accepted: 22 March 2006
- Published: 22 March 2006
The Coccolithoviridae is a recently discovered family of viruses that infect the marine coccolithophorid Emiliania huxleyi. Following on from the sequencing of the type strain EhV-86, we have sequenced a second strain, EhV-163.
We have sequenced approximately 80% of the EhV-163 genome, equating to more than 200 full length CDSs. Conserved and variable CDSs and a gene replacement have been identified in the EhV-86 and EhV-163 genomes.
The sequencing of EhV-163 has provided a wealth of information which will aid the re-annotating of the EhV-86 genome and identified a gene insertion in EhV-163.
- Basic Local Alignment Search Tool
- Homing Endonuclease
- High Selection Pressure
- Algal Virus
- Putative Membrane Protein
We recently determined the whole genome sequence of the Coccolithoviridae strain EhV-86, a giant dsDNA algal virus from the family Phycodnaviridae that infects the marine coccolithophorid Emiliania huxleyi. Core genes common to nuclear-cytoplasmic large DNA virus (NCLDV) genomes were identified and eight of these genes were used to create a phylogenetic tree in which EhV-86 was placed at the root of the Phycodnaviridae. Due to the placement of EhV-86 on a branch distinct from other Phycodnaviridae and the presence of six RNA polymerase subunits (unique among the Phycodnaviridae) we suggested this genus would eventually be renamed as a subfamily of the Phycodnaviridae termed Coccolithovirinae.
Strain EhV-86 was originally isolated, along with many others, in 1999 from an Emiliania huxleyi bloom in the English Channel [3, 4]. In contrast, EhV-163 was isolated from the geographically distinct area of Western Norway during a mesocosm experiment in 2000. Both virus genomes were initially estimated to be approximately 410 kbp in size. We have subsequently sequenced the entire EhV-86 genome and shown it to be 407,933 base pairs (bp). Phylogenetic analysis based on the DNA polymerase gene has previously shown that EhV-163 is distinct from all English Channel strains isolated thus far. In order to gain further insight into both the common and unique relationship these two viruses have with their host, Emiliania huxleyi, and their possible placement within a putative subfamily, we have undertaken to sequence a second coccolithovirus genome, EhV-163.
The sequencing of EhV-86 was hindered by the highly repetitive nature of the genome (three different types of repeat family were identified), which suggested that the elucidation of a second closely related strain, in a much smaller-scale project, would be difficult. However, by using a random shotgun approach at first, followed by a second directed approach to fill in missing sequence based on an EhV-86 backbone, we have managed to sequence approximately 322 kbp of the EhV-163 genome in 267 contigs, equating to around 80% of the estimated genome size. This has provided enough genetic information to perform an analysis of the two coccolithovirus genomes. Of the 472 CDSs predicted in the EhV-86 genome, full sequence was obtained from the EhV-163 contigs for 202 CDSs and partial sequence was obtained for a further 182 CDSs. Contigs from EhV-163 were typically 95–100% identical to EhV-86 sequence (Additional file 1). Regardless of contig size and content (intergenic or genic), EhV-163 contigs aligned with perfect colinearity (except in one case, discussed below) to the EhV-86 genome sequence.
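As a rough check on the figures quoted above, the following sketch recomputes the coverage fractions. The variable names are illustrative only, and the 410 kbp genome size for EhV-163 is the initial estimate given earlier, not a measured value.

```python
# Figures quoted in the text; all names here are illustrative.
EHV163_SEQUENCED_BP = 322_000   # approx. 322 kbp assembled in 267 contigs
EHV163_ESTIMATED_BP = 410_000   # initial size estimate (approx. 410 kbp)
EHV86_CDS_TOTAL = 472           # CDSs predicted in the EhV-86 genome
CDS_FULL = 202                  # CDSs with full sequence in EhV-163 contigs
CDS_PARTIAL = 182               # CDSs with partial sequence only


def fraction(part, whole):
    """Return part/whole as a float."""
    return part / whole


genome_fraction = fraction(EHV163_SEQUENCED_BP, EHV163_ESTIMATED_BP)
cds_full_fraction = fraction(CDS_FULL, EHV86_CDS_TOTAL)
cds_any_fraction = fraction(CDS_FULL + CDS_PARTIAL, EHV86_CDS_TOTAL)
cds_missing = EHV86_CDS_TOTAL - CDS_FULL - CDS_PARTIAL

print(f"Genome sequenced:         {genome_fraction:.1%}")
print(f"CDSs with full sequence:  {cds_full_fraction:.1%}")
print(f"CDSs with any sequence:   {cds_any_fraction:.1%}")
print(f"CDSs with no sequence:    {cds_missing}")
```

Under these assumptions the sequenced fraction comes out slightly below 80%, consistent with the "around 80%" stated in the text, and 88 of the 472 EhV-86 CDSs have no counterpart sequence in the EhV-163 contigs.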
Highly conserved CDSs
Of the 202 CDSs that had complete sequence, 20 were identical at the DNA level and a further 17 were identical at the amino acid level (Additional file 1). These 37 conserved CDSs are distributed throughout the genome; however, some appear to be clustered together in four regions. CDSs ehv027 (unknown function), ehv028 (putative ligase) and ehv029 (putative membrane protein); ehv135 (putative membrane protein) and ehv136 (unknown function); ehv165 (putative membrane protein), ehv166 (putative RING finger containing protein), ehv167 (RNA polymerase subunit 10) and ehv168 (putative membrane protein); and ehv260 (unknown function), ehv261 (unknown function) and ehv263 (unknown function) are found in these four clusters. The high degree of conservation among these 37 CDSs implies they are under high selection pressure or were recently acquired by the last common ancestor of EhV-86 and EhV-163. Since it has been shown previously that RNA polymerase was present in the ancestral NCLDV prior to the divergence of the Poxviridae, Iridoviridae, Asfarviridae, Phycodnaviridae and Mimiviridae families, it is likely that for ehv167, at least, the high degree of conservation is due to a high selection pressure [2, 5, 6]. This also implies that RNA polymerase function is crucial to the infection strategy of coccolithoviruses, providing further evidence for a lifestyle distinct from the other previously sequenced Phycodnaviridae (PBCV-1 and ESV-1).
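The distinction drawn above between CDSs identical at the DNA level and those identical only at the amino acid level can be illustrated with a minimal sketch. It assumes the standard genetic code, and the three short sequences are invented for illustration; they are not EhV sequences, and this is not the procedure used in the study.

```python
from itertools import product

# Standard genetic code (an assumption), built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify(cds_a, cds_b):
    """Classify a pair of CDSs by the level at which they are identical."""
    if cds_a.upper() == cds_b.upper():
        return "identical DNA"
    if translate(cds_a) == translate(cds_b):
        return "identical protein"
    return "different"


# Hypothetical toy sequences.
reference = "ATGGCTAAATAA"   # M A K stop
synonymous = "ATGGCCAAATAA"  # GCT -> GCC, still alanine
changed = "ATGGATAAATAA"     # GCT -> GAT, alanine -> aspartate

for name, seq in [("synonymous", synonymous), ("changed", changed)]:
    print(name, "->", classify(reference, seq))
```

A CDS pair falls into the second class when every nucleotide difference is synonymous, which is why the 17 additional CDSs count as conserved despite differing in DNA sequence.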
Variation in CDSs
Table 1: Examples of genetic changes in the predicted CDSs of EhV-163 in comparison with EhV-86.
3' variable region
21 bp deletion
7 amino acid insertion
27 bp variable region containing a 3 bp insertion
9 amino acid variable region
24 bp and 12 bp insertions
8 and 4 amino acid inserts
1 bp insert
Numerous point mutations
Highly variable protein sequence
1 bp insertion
12 bp deletion, 1 bp insertion
Two 3 bp deletions, 3 bp and 21 bp insertions
Variable protein sequence
24 bp insertion, 15 bp insertion, point mutation creating stop codon
Inserts of 8 and 5 amino acids. Truncated protein.
9 bp and 18 bp insertions
Inserts of 3 and 6 amino acids
Point mutation in stop codon
3 bp insertion, 11 bp deletion, numerous small deletions
Point mutation creating stop codon
Point mutation in stop codon
1 bp insert
14 bp insert
16 bp insert
Point mutation creating stop codon
Point mutation in stop codon
21 bp deletion
7 amino acid deletion
Point mutation in start codon, 1 bp deletion
Altered Start of translation
Six 1 bp deletions
Variable protein sequence
When annotating a genome, it is often necessary to predict where the start codons for translation lie. The advantage of having two related genomes is that the annotation of one can be checked against the other. This is particularly important in the coccolithoviruses, since the majority of CDSs have no database homologues, which makes gene prediction difficult. The vast majority of CDSs in EhV-86 appear to be very similar to their EhV-163 equivalents. However, some CDSs appear to need re-annotating in the light of the sequence data from EhV-163 (Table 1, Additional file 1).
For example, although overlapping CDSs are common in some virus genomes, this is not a common occurrence in the EhV-86 genome. However, an overlap occurs in EhV-86 between ehv380 and ehv381. This overlap does not occur in EhV-163, owing to a change in the predicted methionine start codon (ATG to ATA) and a 1 bp deletion that would otherwise cause a frameshift. It therefore appears likely that, in EhV-163 at least, translation starts from the ATG that lies 36 bp downstream of the currently predicted ATG start codon of ehv381 in EhV-86.
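The kind of check described for ehv381 can be sketched as a search for the next start codon downstream of an annotated one. The sequences below are invented toy examples, not the real ehv381 sequence; the sketch assumes the downstream ATG is in frame with the annotated start and ignores the 1 bp deletion mentioned above.

```python
def next_in_frame_start(seq, annotated_start, start_codon="ATG"):
    """Return the offset in bp from annotated_start to the first in-frame
    start codon at or after that position, or None if there is none."""
    seq = seq.upper()
    for pos in range(annotated_start, len(seq) - 2, 3):
        if seq[pos:pos + 3] == start_codon:
            return pos - annotated_start
    return None


# Hypothetical toy sequences: 11 alanine codons separate the two ATGs.
ehv86_like = "ATG" + "GCT" * 11 + "ATG" + "AAATAA"
ehv163_like = "ATA" + "GCT" * 11 + "ATG" + "AAATAA"  # ATG -> ATA at the start

print(next_in_frame_start(ehv86_like, 0))   # annotated start is intact
print(next_in_frame_start(ehv163_like, 0))  # falls through to the next ATG
```

In the toy EhV-163-like sequence the annotated start is lost, so the search returns the ATG 36 bp downstream, mirroring the re-annotation suggested in the text.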
EhV-86 and EhV-163 belong to a unique family of algal viruses whose genomes contain a high proportion of genes of unknown function. The sequencing of EhV-163 has provided a wealth of information which will aid the re-annotating of parts of the EhV-86 genome and identified an intriguing gene replacement and a highly divergent CDS in the two genomes. Furthermore, the discovery of highly conserved non-core genes of unknown function in these strains suggests their importance to these viruses, adding further credence to the hypothesis that the Coccolithovirus genus has a lifestyle distinct from other members of the Phycodnaviridae.
Preparation of EhV-163 concentrate
Six 1 L cultures of exponentially growing E. huxleyi CCMP1516, at a cell concentration of 1.2 × 10⁶ cells/ml, were each inoculated with 1 ml of EhV-163 (~2 × 10⁵ pfu/ml). Growth was monitored by cell counts in a Reichert haemocytometer under a light microscope. Four days post-inoculation, the decimated cultures were subjected to a filtration, concentration and purification regime [3, 11].
Virus DNA extraction
DNA was extracted from CsCl-purified EhV-163 by initially treating the sample with proteinase K (5 mg/ml) in a lysis buffer containing 20 mM EDTA, pH 8.0 and 0.5% SDS (w/v) at 65°C for 1 h. 0.1 × volume aliquots of phenol were added to the samples, after which the DNA was extracted with an equal volume of chloroform:isoamyl alcohol (24:1). The DNA was precipitated with the addition of 0.5 × volume 7.5 M ammonium acetate, pH 7.5 and 2.5 × volume absolute ethanol. Virus DNA was stored in molecular grade water (Sigma) prior to genome sequencing.
Genomic DNA was sheared by sonication, ligated into pCR-Blunt (Invitrogen) and sequenced using M13 forward and reverse primers. After 2700 reads, the sequence was assembled into contigs and analysed using SeqMan (DNAstar). Following alignment to the backbone of EhV-86, 229 primer pairs were designed, specific to the EhV-163 gDNA sequence, to attempt to amplify the missing gaps. The sequence, annealing temperature and genomic location (in relation to EhV-86) of the primers designed can be found in the NERC environmental genomic data catalogue at http://envgen.nox.ac.uk under EnvBase accession number egcat:00010. When a PCR product was obtained, it was sequenced directly using both primers and the resulting sequence added to the contig library. The depth of sequence coverage varied across the genome due to the random nature of the initial sequencing strategy, from just one sequence read for some regions to up to 18 for others, with an average coverage of approximately 3. In areas of low coverage, sequence reads containing ambiguous results were removed from the analysis. In total, 267 contigs were generated, covering approximately 80% of the EhV-163 genome. These contigs have been submitted to GenBank under the accession numbers DQ127552-DQ127818. These data are also available from http://envgen.nox.ac.uk, EnvBase accession number egcat:00010.
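Depth of coverage of the kind reported above can be computed from read placements as in the following sketch. The read intervals are invented, and averaging over covered positions only is an assumption, since the text does not say how the average of approximately 3 was calculated. The second helper simply counts the accessions in the quoted GenBank range.

```python
def depth_profile(genome_length, reads):
    """Per-position read depth from (start, end) half-open, 0-based intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1   # a read begins covering here
        diff[end] -= 1     # and stops covering here
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def summarise(depth):
    """Summary statistics; assumes at least one position is covered."""
    covered = [d for d in depth if d > 0]
    return {
        "fraction_covered": len(covered) / len(depth),
        "min_depth": min(covered),
        "max_depth": max(covered),
        "mean_depth": sum(covered) / len(covered),
    }


def accession_count(first, last, prefix="DQ"):
    """Number of accessions in an inclusive range such as DQ127552-DQ127818."""
    return int(last[len(prefix):]) - int(first[len(prefix):]) + 1


# Hypothetical 10 bp "genome" with three overlapping reads.
toy_depth = depth_profile(10, [(0, 4), (2, 6), (3, 8)])
toy_summary = summarise(toy_depth)
print(toy_depth)
print(toy_summary)
print(accession_count("DQ127552", "DQ127818"))
```

The accession range spans 267 entries, which matches the number of contigs reported.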
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences by comparing nucleotide or protein sequences to sequence databases and calculating the statistical significance of matches. Protein-protein BLAST (BLAST-P) and Position-specific iterated BLAST (PSI-BLAST) were performed on CDSs of interest online at http://www.ncbi.nlm.nih.gov/BLAST/. Artemis Comparison Tool (ACT) (http://www.sanger.ac.uk/Software/ACT/) was used to compare the EhV-163 contigs against the EhV-86 genome.
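To illustrate what "local similarity" means, the sketch below scores the best local alignment between two short sequences using the Smith-Waterman dynamic programming algorithm. This is not BLAST: BLAST is a faster heuristic that also reports statistical significance, which this example does not attempt. The scoring parameters and sequences are arbitrary choices for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences sharing only the short region "ACG".
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
print(smith_waterman_score("ACGT", "ACGT"))
```

Because the score is reset to zero wherever an alignment stops paying off, dissimilar flanking sequence does not penalise a well-matched internal region.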
This research was supported by grants awarded to WHW from the Natural Environment Research Council (NERC) Environmental Genomics thematic program (ref. NE/A509332/1) and from Marine Genomics Europe, through framework programme FP6 of the European Commission. DCS is a Marine Biological Association of the UK (MBA) Research Fellow funded by grant in aid from the NERC. WHW is supported through the NERC-funded core strategic research programme of the Plymouth Marine Laboratory. We would like to acknowledge support from NERC Environmental Bioinformatics Centre, Centre for Ecology and Hydrology, Oxford for help with data storage and administration.
- Wilson WH, Schroeder DC, Allen MJ, Holden MTG, Parkhill J, Barrell BG, Churcher C, Hamlin N, Mungall K, Norbertczak H, Quail MA, Price C, Rabbinowitsch E, Walker D, Craigon M, Roy D, Ghazal P: Complete Genome Sequence and Lytic Phase Transcription Profile of a Coccolithovirus. Science 2005, 309(5737):1090-1092. doi:10.1126/science.1113109
- Allen MJ, Schroeder DC, Holden MT, Wilson WH: Evolutionary History of the Coccolithoviridae. Mol Biol Evol 2006, 23(1):86-92. doi:10.1093/molbev/msj010
- Schroeder DC, Oke J, Malin G, Wilson WH: Coccolithovirus (Phycodnaviridae): Characterisation of a new large dsDNA algal virus that infects Emiliania huxleyi. Arch Virol 2002, 147(9):1685-1698. doi:10.1007/s00705-002-0841-3
- Wilson WH, Tarran GA, Schroeder D, Cox M, Oke J, Malin G: Isolation of viruses responsible for the demise of an Emiliania huxleyi bloom in the English Channel. J Mar Biol Ass UK 2002, 82:369-377. doi:10.1017/S002531540200560X
- Allen MJ, Schroeder DC, Wilson WH: Preliminary characterisation of repeat families in the genome of EhV-86, a giant algal virus that infects the marine microalga Emiliania huxleyi. Arch Virol 2006, 151:525-535. doi:10.1007/s00705-005-0647-1
- Iyer LM, Aravind L, Koonin EV: Common Origin of Four Diverse Families of Large Eukaryotic DNA Viruses. J Virol 2001, 75(23):11720-11734. doi:10.1128/JVI.75.23.11720-11734.2001
- Firth AE, Brown CM: Detecting overlapping coding sequences with pairwise alignments. Bioinformatics 2005, 21(3):282-292. doi:10.1093/bioinformatics/bti007
- Pires de Miranda M, Reading PC, Tscharke DC, Murphy BJ, Smith GL: The vaccinia virus kelch-like protein C2L affects calcium-independent adhesion to the extracellular matrix and inflammation in a murine intradermal model. J Gen Virol 2003, 84(Pt 9):2459-2471. doi:10.1099/vir.0.19292-0
- Kochneva G, Kolosova I, Maksyutova T, Ryabchikova E, Shchelkunov S: Effects of deletions of kelch-like genes on cowpox virus biological properties. Arch Virol 2005.
- Tulman ER, Afonso CL, Lu Z, Zsak L, Sur JH, Sandybaev NT, Kerembekova UZ, Zaitsev VL, Kutish GF, Rock DL: The genomes of sheeppox and goatpox viruses. J Virol 2002, 76(12):6054-6061. doi:10.1128/JVI.76.12.6054-6061.2002
- Schroeder DC, Oke J, Hall M, Malin G, Wilson WH: Virus Succession Observed during an Emiliania huxleyi Bloom. Appl Environ Microbiol 2003, 69(5):2484-2490. doi:10.1128/AEM.69.5.2484-2490.2003
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.