Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing
© Jiang et al; licensee BioMed Central Ltd. 2011
Received: 16 February 2011
Accepted: 27 April 2011
Published: 27 April 2011
T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, which are generated by a DNA terminase in a sequence-independent manner.
Genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed.
We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence and the other end generated randomly. The sequence-preferred cleavage may produce sticky ends, with the two ends being packaged with different efficiencies.
This study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.
T4-like bacteriophages share several characteristics with other double-stranded DNA viruses, including lambda-like phages and herpes viruses[1–3]. One striking feature is the DNA replication and packaging mechanisms that involve the coordinated actions of numerous proteins[4–6]. There is evidence that T4 phage DNA replication and recombination processes generate a highly branched concatemeric DNA, which is then cut and packaged into an empty protein shell (prehead)[2, 7]. Terminases (gp16 and gp17) are thought to carry out the digestion, whereby gp16 recognizes a DNA substrate and directs the larger gp17 subunit to the cleavage site, which is then cleaved by the gp17-associated endonucleolytic activity[8, 9].
Two general packaging mechanisms have been described for double-stranded DNA bacteriophages. In phages lambda, T7 and T3, DNA ends with unique sequences are generated by terminases that recognize and cleave the cos sites. In phages P22, SPP1 and P1, by contrast, the pre-genome DNA is cleaved by a strictly headful mechanism, in which DNA packaging starts in the vicinity of a packaging site, pac, and terminates at a length a little longer than the genome, with both termini generated by sequence-independent cleavage. The mechanism of T4-like phage DNA packaging remains obscure. The T4-like phage genome does not contain a cos site. Although the T4-like phage genome is also packaged in a headful mode, the terminal ends appear to be generated at random sequences across the genome[12, 13].
High throughput sequencing is a novel tool for molecular biology studies and has found application in a variety of fields. It can serve as a powerful tool for in-depth genome sequence and gene expression analysis. We have isolated a novel Enterobacteria bacteriophage, IME08, from local hospital sewage. A preliminary study with Sanger sequencing of random PCR clones revealed that IME08 is a T4-like phage. Since the T4-like bacteriophages have genomes of 150 to 230 kilobases, we adopted a high throughput sequencing strategy to sequence its genome. Following genome sequence assembly and gene annotation, we observed several interesting characteristics of the genome termini that would be difficult to reveal using conventional techniques. Here we demonstrate with high throughput sequencing that the termini of IME08 carry consensus sequences, indicating that the phage may adopt a mechanism of sequence-preferred cleavage, contrary to previous assertions. This study also demonstrates the use of high throughput sequencing techniques to study virus DNA replication and packaging.
DNA sequencing and sequence assembly
Phage IME08 genomic DNA was extracted as previously described, and sequenced using the Solexa Genome Analyser (Illumina, San Diego, CA, USA) at BGI (formerly known as the Beijing Genomics Institute). The sample preparation, library construction and sequencing by synthesis were performed according to Illumina's paired-end sequencing protocols. Briefly, the phage IME08 genomic DNA sample was sheared to about 500 bp using a compressed-air nebulizer. After the ends of the sheared DNA were blunted using Illumina's Blunting Enzyme Mix, an A base was added to the 3' termini to generate double-stranded DNA molecules with 3' overhangs. A Y-structure adapter (formed by Oligo1: 5' ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA TCT 3', and Oligo2: 5' p-GAT CGG AAG AGC GGT TCA GCA GGA ATG CCG AG 3') was ligated to these 3'-overhang DNA fragments. The ligated fragments of about 550 bp were isolated via gel extraction and amplified by 15 cycles of PCR using the primers PE1 (5' CAA GCA GAA GAC GGC ATA CGA GAT CGG TCT CGG CAT TCC TGC TGA ACC GCT CTT CCG ATC T 3') and PE2 (5' AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T 3') to generate fragments carrying different adaptor sequences. The PCR products were loaded onto the flowcell of the Illumina Solexa Genome Analyser, where the DNA molecules hybridized with flowcell-bound single-stranded oligonucleotides complementary to the sequences of the abovementioned PCR primers. After PCR-based "cluster generation" by "bridge amplification", "sequencing by synthesis" was performed sequentially using the read 1 sequencing primer (5' TGT GAG AAA GGG ATG TGC TGC GAG AAG GCT AGA 3') and the read 2 sequencing primer (5' CGG TCT CGG CAT TCC TGC TGA ACC GCT CTT CCG ATC T 3') to generate two raw sequencing read files (1.fq and 2.fq) with read lengths of 73 bp and 75 bp, respectively. The complete genomic sequence of phage IME08 was then assembled with Velvet and verified with other assembly software, including ABySS and SOAPdenovo.
The potential coding regions of the IME08 genome were predicted using the software Kodon (Applied Maths, Sint-Martens-Latem, Belgium) with a minimum open reading frame (ORF) size of 50 amino acids, and with the "Bacterial and Plant Plastid Code" as the translation table. The putative coding regions were then searched with BLAST against the bacteriophage genome database downloaded from the European Molecular Biology Laboratory (EMBL). The best matches were used to annotate the IME08 ORFs. A total of 253 ORFs were identified and annotated. tRNA genes were predicted using tRNAscan-SE (v.1.21).
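The ORF criteria stated above (minimum size of 50 amino acids, bacterial translation table) can be illustrated with a minimal six-frame scan. This is only a sketch and not a description of how Kodon works: the function names are hypothetical, and the start codon set is restricted to ATG, GTG and TTG as a simplifying assumption.

```python
# Minimal six-frame ORF scan (illustrative sketch, not Kodon's algorithm).
# Assumption: only ATG/GTG/TTG are treated as start codons.
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_aa=50):
    """Yield (start, end) half-open coordinates of ORFs on one strand.

    An ORF runs from the first start codon in a frame to the next
    in-frame stop codon and must encode at least min_aa amino acids.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:
                    yield (start, i + 3)
                start = None


def find_orfs(seq, min_aa=50):
    """Return (strand, start, end) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [("+", s, e) for s, e in orfs_in_strand(seq, min_aa)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_aa):
        found.append(("-", n - e, n - s))
    return found


# Toy sequence: one 61-codon ORF on the forward strand.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(toy))
```

Predicted ORFs from such a scan would still need the homology search described above before they could be annotated.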
Nucleotide sequence accession number
IME08 sequence data were deposited at GenBank under the accession number NC_014260.
Results and Discussion
High throughput sequencing, contig assembly and gene annotation
High throughput sequencing of the IME08 genome by the Solexa Genome Analyser generated 5,011,480 pairs of reads (73 bp and 75 bp, respectively), which amounts to about 742 Mbp, or roughly 4,300-fold coverage. To assemble the genomic sequence using Velvet, we first extracted a small fraction of the paired-end data (about 20-fold coverage) from the original data files to test the k-mer value, and determined that k-mers of 27 to 31 gave optimal results. Different amounts of paired-end data were then tested to assemble the IME08 genome, and the results showed that random subsets giving 20- to 100-fold coverage were sufficient to assemble the full-length genome sequence as a single contig, without any gaps or unresolved nucleotides.
The assembled full-length contigs varied in length and contained duplicated sequences at the ends, suggesting a circular genome. This is consistent with the characteristic T4-like phage genome, which is circularly permuted and terminally redundant. The accuracy of the Velvet assembly was further verified with the assembly tools ABySS and SOAPdenovo, and by aligning the assembled sequence with homologous T4-like phage genomes.
The genome of IME08 consists of 172,253 bp, with an average GC content of 39%. About 90% of the genome is coding, with a total of 253 predicted protein-coding genes (CDSs) and three tRNA genes. Homology analysis indicated that IME08 is closely related to the T4-like phage JS98[21, 22]. A detailed genomic analysis and evolutionary study of IME08 is in preparation.
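The coverage figures in this paragraph follow from simple arithmetic on the read count, the read lengths and the genome size. The sketch below reproduces that calculation and estimates how many read pairs a given target coverage would require; the function names are hypothetical and the numbers are taken from the text.

```python
import math

# Values reported in the text.
READ_PAIRS = 5_011_480
READ1_LEN = 73
READ2_LEN = 75
GENOME_LEN = 172_253


def total_bases(pairs, len1, len2):
    """Total sequenced bases for paired-end data."""
    return pairs * (len1 + len2)


def fold_coverage(bases, genome_len):
    """Average depth of coverage over the genome."""
    return bases / genome_len


def pairs_for_coverage(target_fold, genome_len, len1, len2):
    """Read pairs needed to reach roughly target_fold coverage."""
    return math.ceil(target_fold * genome_len / (len1 + len2))


bases = total_bases(READ_PAIRS, READ1_LEN, READ2_LEN)
print(f"{bases / 1e6:.0f} Mbp, {fold_coverage(bases, GENOME_LEN):.0f}-fold")
print(pairs_for_coverage(20, GENOME_LEN, READ1_LEN, READ2_LEN), "pairs for 20-fold")
```

Under these assumptions, a 20-fold subset corresponds to a little over 23,000 read pairs, which is a small fraction of the full data set.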
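Duplicated sequence at both ends of a contig can be detected by looking for the longest suffix that is identical to a prefix. The following is a minimal sketch of that check, assuming exact matches only; the function names and the `min_len`/`max_len` parameters are hypothetical, and this is not necessarily how the authors inspected their contigs.

```python
def terminal_overlap(contig, min_len=10, max_len=5000):
    """Length of the longest prefix that also occurs as a suffix.

    Only exact overlaps between min_len and max_len bases are
    considered; returns 0 when none is found.
    """
    longest = min(max_len, len(contig) - 1)
    for k in range(longest, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_redundant_end(contig, min_len=10, max_len=5000):
    """Drop the duplicated 3' copy so the contig represents one circle."""
    k = terminal_overlap(contig, min_len, max_len)
    return contig[:-k] if k else contig


# Toy contig whose first 8 bases are repeated at the end.
toy = "ACGTTGCA" + "GGGTTTCCCAAA" + "ACGTTGCA"
print(terminal_overlap(toy, min_len=4))
print(trim_redundant_end(toy, min_len=4))
```

Real assemblies may contain sequencing errors near contig ends, so an alignment that tolerates mismatches would be more robust than this exact comparison.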
High frequency read sequences suggest that T4-like phage genome termini are located at hot-spots
Table 1. Top 20 high frequency sequences in raw sequencing data
The phage IME08 genome is relatively small compared with the genomes of cellular organisms (prokaryotic or eukaryotic species). Since no obvious repeat sequences were observed, the highly elevated occurrence of particular unique sequences suggests that they were not produced by random shearing during library construction, but are original terminal sequences that were already present in large amounts in the phage genomic DNA sample before shearing. This result indicates that the termini of the T4-like phage IME08 genome were generated by terminase cleavage at particular hot spots.
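Reads that begin at a pre-existing genome terminus all start at the same position, so they show up as identical sequences with unusually high counts. One simple way to find them is to tally exact read sequences in a FASTQ file. The sketch below assumes standard four-line FASTQ records and uses hypothetical function names; the counting procedure actually used in the study is not specified in this text.

```python
from collections import Counter
import io


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def top_sequences(fastq_lines, n=20):
    """Return the n most frequent exact read sequences with counts."""
    return Counter(read_sequences(fastq_lines)).most_common(n)


# Toy input standing in for a real file such as 1.fq.
toy_fastq = io.StringIO(
    "@r1\nGAGTTC\n+\nIIIIII\n"
    "@r2\nGAGTTC\n+\nIIIIII\n"
    "@r3\nACCTGA\n+\nIIIIII\n"
)
for seq, count in top_sequences(toy_fastq, n=2):
    print(seq, count)
```

For a real data set the same function could be applied to an open file handle, for example `with open("1.fq") as fh: top_sequences(fh)`.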
T4-like phage genomes start with G or A bases
Consensus sequences of HFSs reveal sequence-preferred cleavage by T4-like phage terminase
All the above analyses demonstrate that the mechanism for generating the T4-like phage IME08 termini involves a highly sequence-preferred cleavage. This is inconsistent with previous reports by Bhattacharyya et al that T4 terminase (gp16/gp17) cleaves genomic DNA in a sequence-independent manner[12, 13]. The most likely explanation for this inconsistency is that the terminase is only sequence-preferred, not strictly sequence-specific, and can cut both in a sequence-preferred manner (the first cut of a particular genome) and in a nearly random manner (the second cut of a particular genome, see discussion below). When the terminase was expressed from a strong promoter such as the phage T7 or λ pL promoters used by Bhattacharyya et al, it was produced in large quantity, which led to excessive random digestion of the plasmid DNA and the host bacterial chromosomal DNA[12, 13]. Since the plasmid was too small and might not contain any sequence suitable for sequence-preferred digestion, and the host bacterial DNA was too large and contained a large number of terminase-preferred sequences, the resultant cleavage product was a mixture of DNA molecules with both consensus sequence termini and nearly random sequence termini. These termini cannot be characterized by conventional sequencing techniques because of the large amount of random terminal sequences. In this context, high throughput sequencing is particularly well suited to the study.
Unbalanced occurrence of forward and reverse HFSs indicates that packaging of the genome is asymmetric and that 5' termini are less permuted than 3' termini
To further characterize the high frequency sequences (HFSs), the top 20 HFSs (Table 1) as well as the top 200 HFSs (Additional file 1) were analyzed for their orientation relative to the genome. This analysis showed that the forward HFSs dominate in the two raw sequencing data files, with both having a forward vs. reverse ratio of roughly 1.5:1. Among the top 10 HFSs there was only one reverse sequence (Table 1). The higher occurrence of forward HFSs suggests that the 5' terminus (relative to genome orientation) was less permuted than the 3' terminus. This phenomenon also indicates that the genomic DNA packaging is asymmetric. This difference may be explained by either of the following mechanisms. The first is that recombination-dependent DNA replication is asymmetric and forward replication dominates the process, resulting in more genomes with forward HFSs being generated and packaged. Alternatively, the forward strand may contain sequences that are more strongly preferred by the terminase.
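A consensus such as TTGGA|G can be derived by aligning short genome windows around each breakpoint and tallying the bases in every column. The sketch below illustrates that idea on a toy sequence. The function names, the window size (5 bases upstream, 1 downstream) and the toy genome are assumptions made for illustration, and the genome is treated as circular so that windows near the ends wrap around.

```python
from collections import Counter


def breakpoint_windows(genome, starts, upstream=5, downstream=1):
    """Yield the bases from `upstream` before to `downstream` after
    each breakpoint; `starts` are 0-based indices of the first base
    of a high frequency read on the forward strand."""
    n = len(genome)
    for s in starts:
        yield "".join(genome[(s + o) % n] for o in range(-upstream, downstream))


def position_counts(windows):
    """Per-column base counts for equally long, aligned windows."""
    return [Counter(column) for column in zip(*windows)]


def consensus(counts):
    """Most frequent base at every column."""
    return "".join(c.most_common(1)[0][0] for c in counts)


# Toy genome with three breakpoints (indices 7, 17 and 27).
toy_genome = "AATTGGAGCC" + "CATTGGAGTA" + "GCTTGGAACT"
wins = list(breakpoint_windows(toy_genome, [7, 17, 27]))
print(wins)
print(consensus(position_counts(wins)))
```

The per-column counts returned by `position_counts` are the kind of input that a sequence logo generator summarizes graphically.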
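The orientation of each HFS can be assigned by checking whether the sequence, or its reverse complement, occurs in the assembled genome, and the forward vs. reverse ratio then follows from the summed read counts. The sketch below assumes exact matching against a circular genome; the function names and the toy data are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation(read, genome):
    """Classify a read as 'forward', 'reverse' or 'unmapped'."""
    # Extend the genome so that reads spanning the circular junction match.
    circular = genome + genome[:len(read) - 1]
    if read in circular:
        return "forward"
    if reverse_complement(read) in circular:
        return "reverse"
    return "unmapped"


def forward_reverse_ratio(hfs_counts, genome):
    """Ratio of summed counts of forward HFSs to reverse HFSs.

    hfs_counts is a list of (sequence, occurrence) pairs.
    """
    totals = {"forward": 0, "reverse": 0, "unmapped": 0}
    for seq, count in hfs_counts:
        totals[orientation(seq, genome)] += count
    return totals["forward"] / totals["reverse"]


toy_genome = "ATGGCCTTAAGACCAGT"
toy_hfs = [("GGCCTT", 30), ("CTGGTC", 20)]
print(forward_reverse_ratio(toy_hfs, toy_genome))
```

The ratio is undefined when no reverse HFS is present, so a real analysis would need to handle that case explicitly.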
Sequence-preferred cleavage may produce sticky-ends and both ends may be packaged but with different efficiency
Since the forward HFSs and their reverse counterparts R+2 occur at different frequencies (some of these differences are very large), the two ends generated by the same cleavage may not be simultaneously packaged into two virions (i.e., one is packaged and the other is lost), and there may be a packaging preference between the two ends derived from the same cleavage.
Unpaired distribution of HFSs on the genome suggests distinct processes for upstream and downstream genome ends, with one end generated in a sequence-preferred manner and the other randomly generated
The genome distribution of the top 50 forward and top 50 reverse HFSs also indicates that the HFSs are not evenly distributed along the genome, which may reflect the activities of genome replication initiation or transcription. The overall frequencies of forward HFSs are greater than those of the reverse HFSs (Figure 7 and Additional file 3), again suggesting that the replication and packaging of the genome is asymmetric and that the forward ends (5' relative to the genome) are less permuted than the reverse ends (3' relative to the genome).
This work was supported by the Hi-Tech Research and Development (863) Program of China (No. 2009AA02Z111, http://program.most.gov.cn/), the National Natural Science Foundation of China (No. 30872223, http://www.nsfc.gov.cn/), and Funds of the State Key Laboratory of Pathogen and Biosecurity (http://www.skl-pbs.com). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Baker ML, Jiang W, Rixon FJ, Chiu W: Common ancestry of herpesviruses and tailed DNA bacteriophages. J Virol 2005, 79:14967–14970.
- Mitchell MS, Matsuzaki S, Imai S, Rao VB: Sequence analysis of bacteriophage T4 DNA packaging/terminase genes 16 and 17 reveals a common ATPase center in the large subunit of viral terminases. Nucleic Acids Res 2002, 30:4009–4021.
- Davison AJ: Channel catfish virus: a new type of herpesvirus. Virology 1992, 186:9–14.
- Black LW: DNA packaging in dsDNA bacteriophages. Annu Rev Microbiol 1989, 43:267–292.
- Black L, Showe M, Steven A: Morphogenesis of the T4 head. In Molecular biology of bacteriophage T4. Edited by: Karam JD. ASM Press (Washington, DC); 1994, 218–258.
- Rao VB, Black LW: DNA Packaging in Bacteriophage T4. [http://www.ncbi.nlm.nih.gov/books/NBK6418/]
- Rao VB, Mitchell MS: The N-terminal ATPase site in the large terminase protein gp17 is critically required for DNA packaging in bacteriophage T4. J Mol Biol 2001, 314:401–411.
- Rao VB, Black LW: Cloning, overexpression and purification of the terminase proteins gp16 and gp17 of bacteriophage T4. Construction of a defined in-vitro DNA packaging system using purified terminase proteins. J Mol Biol 1988, 200:475–488.
- Powell D, Franklin J, Arisaka F, Mosig G: Bacteriophage T4 DNA packaging genes 16 and 17. Nucleic Acids Res 1990, 18:4005.
- Casjens SR, Gilcrease EB, Winn-Stapley DA, Schicklmaier P, Schmieger H, Pedulla ML, Ford ME, Houtz JM, Hatfull GF, Hendrix RW: The generalized transducing Salmonella bacteriophage ES18: complete genome sequence and DNA packaging strategy. J Bacteriol 2005, 187:1091–1104.
- Catalano CE, Cue D, Feiss M: Virus DNA packaging: the strategy used by phage lambda. Mol Microbiol 1995, 16:1075–1086.
- Bhattacharyya SP, Rao VB: Structural analysis of DNA cleaved in vivo by bacteriophage T4 terminase. Gene 1994, 146:67–72.
- Bhattacharyya SP, Rao VB: A novel terminase activity associated with the DNA packaging protein gp17 of bacteriophage T4. Virology 1993, 196:34–44.
- Wang S, Jiang H, Chen J, Liu D, Li C, Pan B, An X, Zhang X, Zou Y, Tong Y: Isolation and rapid genetic characterization of a novel T4-like bacteriophage. Journal of Medical Colleges of PLA 2010, 25:331–340.
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456:53–59.
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18:821–829.
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res 2009, 19:1117–1123.
- Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics 2008, 24:713–714.
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25:955–964.
- Miller ES, Kutter E, Mosig G, Arisaka F, Kunisawa T, Ruger W: Bacteriophage T4 genome. Microbiol Mol Biol Rev 2003, 67:86–156.
- Chibani-Chennoufi S, Canchaya C, Bruttin A, Brussow H: Comparative genomics of the T4-Like Escherichia coli phage JS98: implications for the evolution of T4 phages. J Bacteriol 2004, 186:8276–8286.
- Zuber S, Ngom-Bru C, Barretto C, Bruttin A, Brussow H, Denou E: Genome analysis of phage JS98 defines a fourth major subgroup of T4-like phages in Escherichia coli. J Bacteriol 2007, 189:8206–8214.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188–1190.
- Kreuzer KN: Recombination-dependent DNA replication in phage T4. Trends Biochem Sci 2000, 25:165–173.
- Louie D, Serwer P: Blunt-ended ligation can be used to produce DNA ladders with rung spacing as large as 0.17 Mb. Nucleic Acids Res 1990, 18:3090.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.