Complete genome sequence of bacteriophage vB_YenP_AP5 which infects Yersinia enterocolitica of serotype O:3

Background: Bacteriophage vB_YenP_AP5 is a lytic bacteriophage capable of infecting Yersinia enterocolitica strains of serotype O:3, an epidemiologically significant serotype within this bacterial species that causes yersiniosis in humans. This work describes the complete genome sequence of this phage. Results: The genome consists of linear double-stranded DNA of 38,646 bp, with direct terminal repeats of 235 bp in length, and a GC content of 50.7%. There are 45 open reading frames which occupy 89.9% of the genome. Most of the proteins encoded by this virus exhibit sequence similarity to Yersinia phage φYeO3-12 and Salmonella phage φSG-JL2 proteins. Conclusions: Genomic and morphological analyses place the bacteriophage vB_YenP_AP5 in the T7likevirus genus of the subfamily Autographivirinae within the family Podoviridae.


Background
Yersinia enterocolitica, a facultatively anaerobic, Gram-negative, non-sporulating, short bacillus, is an important zoonotic pathogen that causes enteric infection in humans and animals [1]. Among the species of the genus Yersinia, Y. enterocolitica is considered highly heterogeneous and is grouped into a biochemical scheme composed of six biotypes divided into three lineages: avirulent strains belonging to biotype 1A, highly pathogenic strains of biotype 1B, and weakly pathogenic strains of biotypes 2-5 that do not kill mice [2,3]. Most strains associated with yersiniosis belong to bioserotypes 1B/O:8, 2/O:5,27, 2/O:9, 3/O:3, and 4/O:3, with the latter being the most common in Europe, Japan, Canada, and the USA [1,4]. Although several yersiniophages have been described for typing Y. enterocolitica [5][6][7][8], few have been studied in detail via whole genome sequencing. To date, phage φYeO3-12 displaying specificity for Y. enterocolitica O:3 [9], phage PY54 exhibiting a host range restricted to Y. enterocolitica O:5 and O:5,27 [10], phage φR1-37 with a broader host range within Y. enterocolitica [11], and PY-100 exhibiting a broad host range restricted to the genus Yersinia [12], have been described. Given the considerable interest in bacteriophages because of their potential use as typing, diagnostic, therapeutic, decontaminating, and bio-control agents, our research is aimed at isolating and characterizing novel yersiniophages in order to expand the repertoire of phages available for targeting clinically significant Y. enterocolitica bioserotypes. In this manuscript we report the morphology, genome sequence, and transcriptomic analysis of phage vB_YenP_AP5 (hereafter referred to as AP5).

Results and discussion
Isolation and host range
Analysis of preliminary treated sewage resulted in the initial isolation of 12 phages infecting Y. enterocolitica strains. From these, AP5 was chosen for detailed study because of its ability to infect Y. enterocolitica strains of serotype O:3. The host range of AP5 was determined using 60 strains belonging to ten Yersinia species at 25°C and at 37°C. The results (Table 1) show that AP5  [14], was sensitive to AP5. In contrast, its derivative YeO3-OCR, a rough mutant which is also missing the entire core operon yet is unable to produce O antigen [14], was not sensitive to phage AP5. These results indicate that the host receptor for phage AP5 lies within the O antigen of the lipopolysaccharide of Yersinia enterocolitica O:3 strains, and suggest that the O side chain of this serotype (6-deoxy-L-altropyranose) is involved.

Morphology
AP5 was negatively stained and examined by transmission electron microscopy (Figure 1). The head is icosahedral, exhibits T7 symmetry, and is approximately 55.0 nm in diameter. Each phage particle carries a short non-contractile tail approximately 12.0 nm in length and 8 nm in width. Collectively, these morphological features indicate that this virus belongs to the family Podoviridae.

General features of the AP5 genome
The genome of phage AP5 consists of linear double-stranded DNA of 38,646 bp in length. This size is consistent with that of other T7-like phages, which range from 37.4 kb (Pseudomonas phage gh-1) to 45.9 kb (Erwinia phage Era103) [17]. The genomes of T7-like phages typically contain direct terminal repeats (DTRs) that are used during genome replication and packaging [18]. The lengths of the DTRs of AP5 (235 bp) are in agreement with the reported lengths for members of the T7 group; for example, Salmonella phage φSG-JL2 and Yersinia phage φYeO3-12 have DTRs of 230 bp and 232 bp, respectively [9], whereas Enterobacteria phage T7 has DTRs of 160 bp [19]. Moreover, an alignment of the DTR sequences of phage AP5 and representative members of the T7likevirus genus shows a high degree of conservation (Figure 2). Phage AP5 also has an overall genomic guanine plus cytosine (GC) content of 50.7%, compared to 48.5 ± 1.5 mol% for its host [20]. The GC content of phage AP5 is in agreement with that of other T7-like phages, which ranges from 46.2 to 62.3% [21].

Table 1 note: Degree of lysis: 4+, complete lysis; 3+, clearing throughout but with faint hazy background; 2+, substantial turbidity throughout cleared zone; 1+, a few individual plaques; N, no effect of phage on bacterial growth, as described by Kutter [16].
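As an illustration of how a GC figure such as the 50.7% reported for AP5 is obtained, the following minimal Python sketch computes the GC percentage of a DNA string and checks it against the 46.2 to 62.3% range quoted for T7-like phages. The function names and the toy sequence are hypothetical and are not taken from the authors' pipeline; the real calculation would be run on the full 38,646 bp genome.

```python
# Minimal sketch: GC content of a DNA sequence (illustrative only).
# The T7-like range below is the one quoted in the text [21].
T7_LIKE_GC_RANGE = (46.2, 62.3)


def gc_content(seq: str) -> float:
    """Return the GC percentage of seq, counting only A, C, G and T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def within_t7_like_range(gc_percent: float) -> bool:
    """True if gc_percent falls inside the range quoted for T7-like phages."""
    low, high = T7_LIKE_GC_RANGE
    return low <= gc_percent <= high


# Toy sequence (hypothetical, not AP5 data): 9 G/C out of 18 bases.
toy = "ATGCGCGATTACGGCTAA"
print(round(gc_content(toy), 1), within_t7_like_range(gc_content(toy)))

# The value reported for AP5 itself:
print(within_t7_like_range(50.7))
```

Ambiguity codes such as N are excluded from the denominator here, which is one common convention; other tools may count them differently.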

Open reading frames and comparative genomics
The genome of AP5 was scanned for open reading frames (ORFs) using computational software. A total of 34,743 nucleotides were involved in the coding of 45 ORFs with sizes ranging from 113 to 3,962 nucleotides (Table 2). The genes are tightly packed and organized by temporal class and function, so that they occupy 89.9% of the genome (Figure 3). The initiation codon ATG is present in 93.3% of the protein-coding genes. Only two other initiation codons occur, TTG and GTG at a frequency of 0.5%, and 0.2%, respectively. All predicted protein-coding genes were screened using BLASTP and Psi-BLAST algorithms against the nonredundant protein database at NCBI. Of the 45 coding sequences (CDSs) of AP5, 30 (66.6%) have an assigned function, and 15 (33.3%) are similar to proteins of unknown function. While the great majority of the homologs are to  [25]. Thus, the T7 gene nomenclature was adopted for naming the genes of AP5. Since at the protein level phage AP5 showed the greatest sequence identity with Yersinia phage φYeO3-12 proteins, the genomes of the two phages were compared using progressive Mauve [26] (Figure 4). The arrangement of essential genes is collinear and highly conserved, and only some genes coding for hypothetical proteins present in φYeO3-12 are dissimilar or absent in AP5. The pairwise identity of the phage AP5 genome to the Yersinia phage φYeO3-12 genome was estimated at 89.6%.

Figure 2 Multiple sequence alignment of direct terminal repeat sequences, performed using Clustal Omega [22]. Positions which have a single, fully conserved base pair are indicated by an asterisk (*).
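The coding density quoted above follows directly from the two totals given in the text. The short Python sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the mean ORF length is a derived figure that the text does not report.

```python
# Minimal sketch of the coding-density arithmetic, using figures from the text.
GENOME_LENGTH_BP = 38_646   # complete AP5 genome
CODING_BP = 34_743          # nucleotides involved in coding
N_ORFS = 45                 # predicted open reading frames


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


coding_density = percent(CODING_BP, GENOME_LENGTH_BP)
mean_orf_length = CODING_BP / N_ORFS  # derived here, not stated in the text

print(f"coding density: {coding_density:.1f}%")
print(f"mean ORF length: {mean_orf_length:.0f} bp")
```

The same helper can be applied to any count-over-total figure in the section, for example the share of coding sequences with an assigned function.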

Nucleotide metabolism, DNA replication and recombination
In the AP5 genome, at least eleven genes were identified that play a role in nucleotide metabolism, DNA replication, and recombination. The transcribed genes function to overcome host restriction and to convert the metabolism of the host cell to the production of phage proteins. The product of gene 0.3 is a small protein that mimics B-form DNA and binds to and inhibits type I restriction endonucleases [28][29][30]; it also possesses S-adenosyl-L-methionine hydrolase (SAMase) activity, which degrades the methyl group donor and thereby the methylation activities present in the host [31]. Dam (DNA adenine methyltransferase) methylase modifies GATC, and Dcm (DNA cytosine methyltransferase) methylase modifies CC(A/T)GG sequences [9]. As in Yersinia phage φYeO3-12, the sequences corresponding to restriction enzyme

DNA packaging and morphogenesis
Several genes were identified that play a role in morphogenesis and DNA packaging. We identified two CDSs which display sequence similarity to the capsid proteins of phages belonging to T7-like viruses. The upstream gene 10A displays homology to Yersinia phage φYeO3-12 major capsid protein 10A [NP_052108], while the downstream gene 10B is similar to the minor capsid protein 10B in Salmonella phage φSG-JL2 [YP_001949782]. Some T7-like phages display two "versions" of the major capsid protein, which are designated as 10A and 10B [34]. The sequences of the amino termini of these proteins are identical, but during translation a −1 ribosomal frameshift allows for alternative reading frames within one mRNA, permitting the elongation of the protein product. The features of this system are a slippery site in the DNA/RNA and a downstream stem-loop structure capable of forming a pseudoknot [35,36]. Analysis of AP5 using pKiss [37] did not yield evidence for a potential pseudoknot. Gene 9 was identified as the capsid assembly protein required for the formation of procapsids. The structure of this phage is therefore made up of gene 10A and gene 10B (capsid), the head to tail joining protein (gene 9), and an internal core formed by the products of gene 13 (internal virion protein A), gene 14 (internal virion protein B), gene 15 (internal virion protein C), and gene 16 (internal virion protein D). These proteins are homologous to those that form the internal core of the T7 virion. In T7, the internal virion protein D, along with internal virion proteins B and C, is ejected from the phage head and forms part of a putative channel that spans the entire host cell envelope and allows entry of DNA. The N-terminus of this protein has similarity to a lytic transglycosylase and may help form a channel for phage DNA translocation through the peptidoglycan layer of the host envelope [18]. BLASTN analysis of gene 16 (internal virion protein D) confirms the presence of a peptidoglycan hydrolase motif at the N-terminus. Gene 7.3 was identified as the tail assembly protein required for assembly of tail fibers on capsids. Genes 11 and 12 correspond to tail tubular proteins A and B, respectively, which are required for assembly of tails. Gene 17 codes for the tail fiber protein, or host recognition binding protein, and shares 89.3% identity with gp17, the tail fiber protein of Yersinia phage φYeO3-12 [NP_052117], and only 67% identity with gp17 of Salmonella phage φSG-JL2 [YP_001949790]. As with other gp17 homologs, sequence similarity is only found at the N-terminus, the part of the protein that is associated with the tail structure. The C-terminus is involved in ligand interactions and exhibits considerable differences, even though phage AP5 shares a similar host range with Yersinia phage φYeO3-12 [9]. The large and small terminase subunit homologs were determined to be the products of gene 18 (DNA Packaging Protein A) and gene 19 (DNA Packaging Protein B).

Host cell lysis
The final stage of the phage lytic cycle is degradation of the bacterial cell wall and release of progeny phages. The lysis of the cell wall is typically induced by two phage-encoded proteins, a holin and an endolysin [38]. Endolysins are muralytic enzymes produced by dsDNA phages, which hydrolyze the peptidoglycan layer of bacterial cell walls. As in other T7-like phages, gene 3.5 of phage AP5 is proposed to be the endolysin protein since it possesses N-acetylmuramoyl-L-alanine amidase activity. Access of endolysins to the cell wall occurs through the presence of a secondary lysis factor, known as a holin.
Holins are usually small proteins characterized by the presence of transmembrane domains (TMDs) [39]. The predicted proteins of AP5 were scanned for TMDs using TMHMM [40]. TMDs were identified in gene 0.6, gene 6.3, gene 17.5, and gene 19.5, which code for small proteins of 67, 37, 67, and 49 amino acids, respectively. The protein derived from gene 17.5 of AP5 is proposed as a holin since it is a small protein containing an N-terminal TMD and shares sequence similarity with the Yersinia phage φYeO3-12 lysis protein [NP_052118]. Phage AP5 also has one more lysis gene (gene 18.5) coding for a phage λ Rz-like lysis protein (PHA00276), an i-spanin of 150 amino acids which shows 98.7% sequence identity to the λ Rz-like protein [YP_00194793] in Salmonella phage φSG-JL2. Further inspection of the gene 18.5 sequence confirms the presence of a nested ORF of 255 bp (in the +1 reading frame), embedded entirely within the gene, that codes for an o-spanin with homology to Rz1 (18.7) of bacteriophage T7. Based on these observations, gene 18.5 is proposed as an Rz/Rz1 equivalent lysis gene coding for transmembrane spanins involved in the disruption of the outer membrane of the host [41].

Transcriptional and regulatory sequences
Phage AP5 was not found to contain tRNA genes, which is not unexpected since no T7-like phages have been found to harbour them. A promoter was identified at position 550-580 bp of the genome with sequence similarity to the host promoter consensus TTGACA(N15-18)TATAAT, with a 2 bp mismatch, suggesting that the early genes of this type of virus are transcribed by the host RNA polymerase. This is a major difference between phage AP5 and phages T3 and T7, which possess multiple strong promoters recognized by the host RNA polymerase. As with all T7 group phages, the AP5 phage-encoded RNA polymerase (RNAP) is responsible for the recognition of phage-specific promoters. In phage AP5, we identified 14 phage-specific promoters using PHIRE [42], which are named according to the downstream gene (Table 3). The promoter sequences lie within intergenic regions and show the greatest similarity to those of Yersinia phage φYeO3-12 and bacteriophage T3.
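To make the host promoter consensus concrete, the following Python sketch scans a sequence for a TTGACA box and a TATAAT box separated by 15 to 18 nucleotides, allowing a small total number of mismatches across the two boxes. It is a simplified illustration under stated assumptions, not the method used in this study (the text does not say which tool located the host-type promoter, and PHIRE was used only for the phage-specific promoters). The function name, the forward-strand-only scan, and the way mismatches are pooled are all hypothetical choices.

```python
# Minimal sketch: find sigma70-like promoters of the form
# TTGACA(N15-18)TATAAT on the forward strand, tolerating mismatches.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"
SPACERS = range(15, 19)  # spacer lengths 15, 16, 17, 18


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_host_like_promoters(seq: str, max_mismatches: int = 2):
    """Return (start, spacer_length, mismatches) for every candidate site.

    start is the 0-based index of the first base of the -35 box;
    mismatches is the total over both hexamers.
    """
    seq = seq.upper()
    box = len(MINUS_35)
    hits = []
    for i in range(len(seq) - box + 1):
        mm35 = hamming(seq[i:i + box], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in SPACERS:
            j = i + box + spacer
            minus10 = seq[j:j + box]
            if len(minus10) < box:
                continue  # -10 box would run off the end of the sequence
            total = mm35 + hamming(minus10, MINUS_10)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits


# Toy example (hypothetical sequence): one mismatch in each box, 17 nt spacer.
toy = "GG" + "TTGTCA" + "C" * 17 + "TACAAT" + "GG"
print(find_host_like_promoters(toy))
```

Real promoter prediction usually weights positions by conservation rather than counting mismatches equally, so hits from a scan like this are only candidates for closer inspection.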

Conclusions
In this manuscript we have reported on the morphology and genome of the phage vB_YenP_AP5. Due to its lytic nature and marked specificity for Y. enterocolitica strains of serotype O:3, this phage is a potential biotechnological tool for diagnostic, therapeutic, and/or biocontrol uses, given that O:3 is the predominant serotype involved in human food-borne infections [4]. Additionally, the genome of this phage does not contain any undesirable laterally transferable genes that are related to bacterial toxins, pathogenicity, antibiotic resistance and/or lysogeny on the basis of homologies with known virulence and resistance genes available in GenBank.

After incubation, the enrichments were centrifuged at 10,000 × g for 20 minutes at 4°C, the supernatant was filtered through a sterile disposable filter of 0.45 μm pore size, and the filtrates were stored at 4°C. Phages were detected by spot tests [16] on indicator strains incubated for 16-20 h at 25°C. Complete or partial lysis zones were then removed by cutting the soft layer from the plates using a sterile pipette tip, placed separately in 1 mL of SM buffer (5.8 g of NaCl per liter, 2.0 g of MgSO4·7H2O per liter, 50 mM Tris-HCl [pH 7.5]), and used in standard double agar overlay plaque assays [43] to identify plaques of different size and morphology. Three rounds of repeated single plaque isolation were then performed to ensure unique phages were obtained. Purified phages were named following the naming convention of Kropinski et al. [44]. The small drop plaque assay was used to determine the titer of phage preparations [45].

Host range determination
The lytic activity of vB_YenP_AP5 was tested against 60 Yersinia strains using standard spot tests [16]. Briefly, 10 μl of a purified phage suspension containing approximately 10^8 pfu/mL was spotted in the middle of a lawn of bacteria and left to dry before incubation for 18-24 h. Each strain was tested three times at 25°C and at 37°C. The degree of lysis was recorded using a four-point scale: (+4) complete clearing, (+3) clearing throughout but with a faint hazy background, (+2) substantial turbidity throughout the cleared zone, and (+1) a few individual plaques.

Transmission electron microscopy
The phage was pelleted at 25,000 × g for 1 hour at 4°C, using a Beckman high-speed centrifuge and a JA-18.1 fixed-angle rotor (Beckman, Palo Alto, CA, USA). The phage pellet was washed twice under the same conditions in neutral 0.1 M ammonium acetate [46]. The final phage sediment was re-suspended in 150 μL of SM buffer supplemented with 5 mM CaCl2. Samples were then deposited onto carbon-coated Formvar films on copper grids, stained with 2% uranyl acetate (pH 4) or 2% potassium phosphotungstate (PT, pH 7.2), air dried, and examined under a Tecnai G2 F20 transmission electron microscope (FEI, Hillsboro, OR, USA) operating at 120 keV. Images were collected and analyzed using Digital Micrograph™ Software (Gatan, Pleasanton, CA, USA).

Isolation of phage DNA
To separate phage from bacterial debris, a crude phage lysate was centrifuged at 10,000 × g for 15 min at 4°C and the supernatant was filtered through a 0.22 μm low protein binding filter (Millipore, USA). Contaminating nucleic acids in the supernatant were digested with pancreatic DNase I and RNase A (Sigma-Aldrich Canada Ltd., Oakville, ON), each added to a final concentration of 10 μg/mL, for 15 min at room temperature. DNA isolation was then performed with a commercial Phage DNA Isolation Kit (Norgen BioTek Corp., Thorold, ON, Canada), as per the manufacturer's instructions. The DNA was characterized spectrophotometrically.

Genome sequencing and assembly
Phage genomic DNA was fragmented using the Ion Xpress™ Plus gDNA Fragment Library kit following the manufacturer's protocol (Life Technologies, Foster City, CA). The fragmented DNA was collected using the Pippin Prep DNA Size Selection System (Sage Science, Beverly, MA) and assessed for concentration and size distribution using a Bioanalyzer 2100 (Agilent Technologies, Mississauga, ON). The DNA fragments were then attached to the surface of Ion Sphere particles (ISPs) using an Ion Xpress Template kit (Life Technologies) according to the manufacturer's instructions. Template-ISPs were sequenced on 316 micro-chips in an Ion Torrent Personal Genome Machine (PGM) with an Ion PGM Sequencing 400 kit (Life Technologies). The sequence reads were filtered using PGM software to remove low quality sequences and trimmed to remove adaptor sequences, and the filtered sequences were assembled. The assembled genome had a coverage of 33.4×. Gaps were identified using the Lasergene® Genomics Suite of DNAStar software (DNAStar Inc., Madison, WI). The gaps were closed by PCR using primers flanking the gaps, followed by sequencing on a 3730 Genetic Analyzer (Life Technologies). The final assembled genome was manually curated for errors.

Bioinformatics analysis
The phage genome was analyzed for coding sequences using Kodon version 2.0 (Applied Maths Inc., Austin, TX, USA). Genes were identified from among the predicted coding sequences based on the presence of an ATG, GTG, CTG or TTG start codon, followed by at least 30 additional codons, and an upstream sequence resembling the ribosome-binding site GGAGGT [47,48]. A search for phage-encoded tRNA genes was performed with tRNAScan-SE and Aragorn, using default parameters [49,50]. Preliminary annotation of genes was performed using myRAST [51]. Additional manual functional annotation was performed using the Geneious software version 7.1.5 (Biomatters) [52,53]. Phage-specific promoters were discovered using PHIRE [42] with a length (L) of 22 bp and a degeneracy (D) of 4 bp. Theoretical molecular weights and isoelectric points were determined using ExPASy via http://web.expasy.org/compute_pi/ [54][55][56]. BLASTP and Psi-BLAST algorithms were used to determine the similarity to described proteins in the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov). Whole genome comparisons were carried out using Mauve [26] and CoreGenes [24].
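The gene-calling criteria stated above (one of four start codons, at least 30 additional codons, and an upstream sequence resembling GGAGGT) can be sketched in a few lines of Python. This is a hedged illustration only and does not reproduce Kodon: the function names, the forward-strand-only scan, the 20 nt upstream search window, and the tolerance of one mismatch to the ribosome-binding site are assumptions introduced here, since the text does not specify how "resembling" was scored.

```python
# Minimal sketch of the stated ORF criteria (forward strand only).
START_CODONS = {"ATG", "GTG", "CTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
RBS = "GGAGGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_rbs(seq: str, start: int, window: int = 20, max_mismatches: int = 1) -> bool:
    """True if an RBS-like hexamer lies within `window` nt upstream of start.

    window and max_mismatches are assumed values, not taken from the paper.
    """
    upstream = seq[max(0, start - window):start]
    return any(
        hamming(upstream[k:k + len(RBS)], RBS) <= max_mismatches
        for k in range(len(upstream) - len(RBS) + 1)
    )


def candidate_orfs(seq: str, min_codons: int = 30):
    """Return (start, end) pairs, 0-based and end-exclusive, stop codon included."""
    seq = seq.upper()
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] not in START_CODONS:
            continue
        pos = start + 3
        codons_after_start = 0
        # Walk in frame until the first stop codon or the end of the sequence.
        while pos + 3 <= len(seq) and seq[pos:pos + 3] not in STOP_CODONS:
            codons_after_start += 1
            pos += 3
        if pos + 3 > len(seq):
            continue  # ran off the end without meeting a stop codon
        if codons_after_start >= min_codons and has_rbs(seq, start):
            found.append((start, pos + 3))
    return found


# Toy example (hypothetical sequence): RBS, short spacer, ATG, 30 codons, stop.
toy = "CCCC" + "GGAGGT" + "CCCCCC" + "ATG" + "GCC" * 30 + "TAA" + "CC"
print(candidate_orfs(toy))
```

Because every in-frame start codon is reported, a long ORF can yield several overlapping candidates that share one stop codon; choosing among them is the kind of decision the manual curation step described above would resolve.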