Genome and proteome analysis of 7-7-1, a flagellotropic phage infecting Agrobacterium sp H13-3
© Kropinski et al.; licensee BioMed Central Ltd. 2012
Received: 3 January 2012
Accepted: 4 May 2012
Published: 31 May 2012
The flagellotropic phage 7-7-1 infects motile cells of Agrobacterium sp H13-3 by attaching to and traveling along the rotating flagellar filament to the secondary receptor at the base, where it injects its DNA into the host cell. Here we describe the complete genomic sequence of 69,391 base pairs of this unusual bacteriophage.
The sequence of the 7-7-1 genome was determined by pyro(454)sequencing to a coverage of 378-fold. It was annotated using MyRAST and a variety of internet resources. The structural proteome was analyzed by SDS-PAGE coupled electrospray ionization-tandem mass spectrometry (MS/MS).
Sequence annotation and a structural proteome analysis revealed 127 open reading frames, 84 of which are unique. In six cases 7-7-1 proteins showed sequence similarity to proteins from the virulent Burkholderia myovirus BcepB1A. Unique features of the 7-7-1 genome are the physical separation of the genes encoding the small (orf100) and large (orf112) subunits of the DNA packaging complex and the apparent lack of a holin-lysin cassette. Proteomic analysis revealed the presence of 24 structural proteins, five of which were identified as baseplate (orf7), putative tail fibre (orf102), portal (orf113), major capsid (orf115) and tail sheath (orf126) proteins. In the case of the major capsid protein, the N-terminus was removed during capsid maturation, probably by a putative prohead protease (orf114).
Bacteriophage 7-7-1 is known to infect motile cells of Agrobacterium sp H13-3 (formerly Rhizobium lupini), and as such is termed flagellotropic. Using electron microscopy, Lotz et al. demonstrated translocation of phage 7-7-1 along flagellar filaments. Filament-associated phage particles initially possess DNA-filled heads, which are subsequently found emptied when attached to the phage receptor at the flagellar base. This bimodal mechanism of adsorption dramatically increases the chance of finding the receptor at the cell surface, because (i) swimming bacteria with their flagella spread out act as a five- to ten-fold expanded target for the phage and (ii) once attached, phage particles are directed to the receptor by a one-dimensional walk along the flagellum (instead of a random ‘search’ by three-dimensional diffusion). In no case, however, has the process of phage translocation along the flagellum been visualized directly. Based on circumstantial evidence, Samuel et al. have estimated that the flagellotropic phage χ of Salmonella needs < 1 s to reach the flagellar base. These authors have also provided evidence for a ‘nut and bolt’ mechanism by which phage χ moves along the filament. They argue that the long tail fiber fits the right-handed grooves between helical rows of flagellin subunits and that the counter-clockwise (CCW) rotation of the flagellum forces the phage to follow the grooves as a nut follows the threads of a bolt.
7-7-1 is the first flagellotropic phage shown to infect a soil bacterium driven by the uni-directional CW rotation of its complex flagella, a pattern clearly different from the CCW-CW bias of the plain flagella driving Salmonella. This departure from the well-studied enterobacterial paradigm and the rare phage morphology prompted us to analyze the genome and the structural proteome of 7-7-1.
DNA replication of this phage involves a helicase (orf23) and a polymerase (orf17). The latter shows greatest sequence similarity to the DNA polymerases of Pseudomonas phage 73 (YP_001293433) and Burkholderia phage BcepGomr (YP_001210246), which are members of the Siphoviridae, and Burkholderia phage BcepB1A (YP_024903), which, like 7-7-1, is a myovirus. An InterProScan shows it to be a member of the DNA/RNA polymerases superfamily (SUPERFAMILY SSF56672) with the motif located between residues 318 and 480. Two other proteins potentially involved in replication are the products of genes 28 and 33. Gp28 is a 255 amino acid protein possessing ParB-like nuclease motifs (Pfam PF02195 ParBc; SMART SM00470 ParB-like nuclease domain; and SUPERFAMILY SSF110849 ParB/Sulfiredoxin) as well as the ParB-like partition TIGRFAMs protein motif TIGR00180 (parB_part: ParB-like partition proteins). This type of protein has also been found in myoviruses such as Burkholderia ambifaria phage BcepF1 (YP_001039693), Mycobacterium phage Pio (AER49600) and enterobacterial phage P1 (AAQ14139). Gp33 contains an N-(deoxy)ribosyltransferase-like superfamily (SUPERFAMILY SSF52309) motif.
Based upon the assumption that the genome circularizes via cohesive termini (not identified), there are two large transcriptional units encompassing orf 22–13 and orf 23–127, 1–12. Since another member of the class α-proteobacteria, Rhizobium etli, possesses rpoD-dependent promoters which closely resemble the Escherichia coli consensus sequence (TTGACA[N15-17]TATAAT), we assumed that this phage might contain recognizable promoters. We identified five potential promoter sequences, including divergent promoters between the two transcription units (Additional file 2, Table S2). In addition, four rho-independent terminators and two high ΔG stem-loop structures were identified. Interestingly, no bidirectional terminators were discovered between orf12 and orf13 (Additional file 2, Table S2). No evidence was found as to how transcription is temporally regulated in this virus.
The genome of phage 7-7-1 encodes two proteins involved in DNA synthesis: a helicase (gp23) and a polymerase (gp17). The polymerase displayed no conserved motifs, and is distantly related to gp43 homologs from cyanomyoviruses. The helicase contained a high-scoring (E-value: 1.01e-41) COG1061 motif (DNA or RNA helicases of superfamily II; SSL2) and showed homology to helicases from Burkholderia phage BcepB1A and Vibrio phages VP16C and VT16T.
PSI-BLAST analysis of Gp3 against the NCBI virus database resulted in hits described as tail/DNA circulation protein (Salmonella phage ST64B, Enterobacteria phage SfV, Pseudomonas phage DVM-2008 and Burkholderia phage KS10). This protein possesses two protein motifs: COG4228, Mu-like prophage DNA circulation protein, and pfam07157, DNA circulation protein N-terminus (DNA_circ_N), which are conserved protein domains of indeterminate function. Gp4 contains two inconsistent overlapping motifs: COG4379, Mu-like prophage tail protein gpP (E-value: 2.99e-22), and pfam05954, phage late control gene D protein (Phage_GPD; E-value: 1.76e-17). The homologs include tail proteins from Mu, D108, SfV and ST64B. These results, coupled with the genome location and the observation that Gp3 is a structural protein (see next section), suggest that both of these proteins are involved in the structure/assembly of the phage tail.
BLAST analysis identified several proteins as being involved in phage morphogenesis, including baseplate protein (gp7), tail fibre (gp102), portal (gp113), prohead protease (gp114), major capsid (gp115) and tail sheath (gp126). HHpred [18, 19] analysis of other proteins in the morphogenesis cassette was used to identify three further proteins: gp5, gp6 and gp10. Gp10, which we had termed a conserved hypothetical membrane protein, shows structural similarity (Probability = 91.01; E-value = 0.11) to RCSB Protein Data Bank entry 3BKH, the lytic transglycosylase (gp144) of Pseudomonas phage φKZ, which is probably the endolysin for this virus. Gp6 is related (Probability = 83.90; E-value = 0.63) to 2IA7, a putative tail lysozyme (T4 gp5 analog), while gp5 is a structural analog of 3AJQ, phage P2 protein V, which is the tailspike protein (Probability = 96.23; E-value = 0.021).
Table: Overview of the structural proteins identified by ESI-MS/MS
Columns: Protein MW (Da); Max. no. of unique spectra; Max. sequence coverage (%); Slice in which most abundant
Entries, in table order:
Conserved hypothetical protein
Putative DNA circulation protein
Conserved hypothetical protein
Baseplate protein; phage P2 GpJ homolog
Putative tail fibre
Major capsid protein; slices 11-13-14; only 'C-terminal' sequence coverage
Slices 20-21-22; protein identification probability of 87.70%
Tail sheath protein
Although the major capsid protein gp115 is clearly the most abundant protein, only peptides from its C-terminus were found. This suggests that the N-terminal part is cleaved off during maturation of the capsid. Indeed, similarity searches indicate that the C-terminal part of gp115 has high similarity with the major capsid protein of the HK97 family and that gp114 is similar to various prohead proteases. As the N-terminal part of the HK97 capsid protein is cleaved off by a prohead protease encoded by the upstream gene [24, 25], the protein band with a molecular weight of approximately 33 kDa most likely corresponds to the mature major capsid protein.
A final, remarkable finding is the identification of a small, 28 amino acid protein which originally fell below the threshold of gene prediction (i.e. 100 bp). Though the function of this polypeptide is unknown, the high ‘protein identification probability’ of 100% and the sequence coverage of 85.7% confirmed its presence in the phage particle. This illustrates that proteogenomics, namely the use of proteome analysis to annotate the genome, is a powerful tool for identifying missed protein-coding genes and thereby complements genome annotation.
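The numbers quoted for this small protein can be checked with simple arithmetic: 28 codons span 84 nt (87 nt with a stop codon), which is below the 100 bp prediction threshold, and a coverage of 85.7% is consistent with 24 of 28 residues being covered by peptides. The following minimal Python sketch computes sequence coverage from a set of peptides; the 28-residue sequence and the two peptides are hypothetical placeholders, not the actual 7-7-1 protein or the observed spectra.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Return the percentage of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 28-residue protein (illustration only, not the 7-7-1 sequence).
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLG"
# Two hypothetical peptides that together cover residues 5-28 (24 residues).
peptides = [protein[4:16], protein[16:28]]

orf_length_nt = 3 * len(protein)            # coding length without the stop codon
coverage = sequence_coverage(protein, peptides)
print(orf_length_nt, round(coverage, 1))
```

Counting covered positions rather than summing peptide lengths keeps the result correct when peptides overlap.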
While a number of flagella-specific phages have been isolated (Salmonella phage χ; Caulobacter phages φCp34, ϕCb13 and ϕCbK, and φ6; Bacillus phages AR9, 3NT, PBS1, SP3 and PBP1; Proteus phage PV22; Pseudomonas phage φCTX; Agrobacterium tumefaciens phages GS2 and GS6; Aeromonas hydrophila phage PM3; and Asticcacaulis biprosthecum phages φAcS2 and φAcM4), to the best of our knowledge only χ (Denyes, personal communication) and φCTX have been sequenced. A search with the BLASTP feature in BioEdit showed that the products of five 7-7-1 genes (13, 21, 26, 72 and 102) possess homologs in Salmonella phage χ. Interestingly, gp102, which we defined as a putative tail fibre protein, shows weak sequence similarity from residues 203–300 to a similarly defined protein from phage χ. In view of the quite different tail fibre morphologies observed in phage χ and phage 7-7-1, the region of similarity may define a general motif involved in phage-flagellum interaction.
Bacteriophage 7-7-1 shows relatively little overall DNA sequence similarity to other phages. At the protein level, CoreGenes revealed eight homologs of BcepB1A proteins, restricted to TerS and a variety of hypothetical proteins. These results indicate that phage 7-7-1 is unique and deserving of recommendation to ICTV as the type phage in a new genus: the 7-7-1-like bacteriophages.
Agrobacterium sp H13-3 (formerly Rhizobium lupini H13-3) was isolated from the rhizosphere of Lupinus luteus. Phage 7-7-1, which is an isolate from garden compost, exclusively infects Agrobacterium sp H13-3.
Bacteria were grown in NY medium (8 g nutrient broth, 3 g yeast extract per liter) at 40 rpm in a gyratory shaker at 30 °C. Phage lysates of up to 2 × 10¹¹ PFU per ml were obtained by infection of an exponentially growing culture at OD650nm = 0.1 (8 × 10⁷ CFU per ml) with phage at an MOI of 5 × 10⁻³, followed by threefold dilution with pre-warmed NY and further incubation pending lysis.
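As a quick check of the infection conditions, the phage input implied by the stated cell density and MOI can be worked out directly, since the MOI is the ratio of phage particles to host cells. The short Python sketch below uses only the two values given in the protocol; the function and variable names are illustrative.

```python
def phage_input_per_ml(cfu_per_ml: float, moi: float) -> float:
    """Return the phage titre (PFU per ml) that gives the requested MOI."""
    return cfu_per_ml * moi


host_cfu_per_ml = 8e7   # cell density at OD650nm = 0.1, from the protocol
moi = 5e-3              # multiplicity of infection, from the protocol

pfu_per_ml = phage_input_per_ml(host_cfu_per_ml, moi)
print(f"{pfu_per_ml:.1e} PFU per ml")
```

Under these values the culture receives on the order of 4 × 10⁵ PFU per ml, that is, roughly one phage per 200 cells.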
Purified phage particles were spread on carbon-coated copper grids, washed once with distilled water and then negatively stained with 4% uranyl acetate, pH 4.8. Microscope magnifications were calibrated with a replica of an optical grating and micrographs were taken with a JEOL 7A (Japan Electron Optics Laboratory Co., Ltd.).
Phage DNA was isolated by phenol-chloroform extraction and purified by using the Lambda DNA kit of Qiagen (Hilden, Germany). The DNA was subjected to pyrosequencing (454 technology) at the McGill University and Genome Québec Innovation Centre (Montreal, QC, Canada) to 378X coverage.
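Mean coverage is the total number of sequenced bases divided by the genome length, so the reported 378X coverage of a 69,391 bp genome implies roughly 26 Mb of sequence data. The following Python sketch makes that arithmetic explicit; the total is derived from the two reported figures and is not a number stated by the sequencing centre.

```python
GENOME_LENGTH_BP = 69_391   # complete 7-7-1 genome, as reported
MEAN_COVERAGE = 378         # reported fold coverage


def total_bases(genome_length_bp: int, coverage: float) -> float:
    """Return the total sequenced bases implied by a mean fold coverage."""
    return genome_length_bp * coverage


def mean_coverage(sequenced_bases: float, genome_length_bp: int) -> float:
    """Return the mean fold coverage for a given amount of sequence data."""
    return sequenced_bases / genome_length_bp


implied_bases = total_bases(GENOME_LENGTH_BP, MEAN_COVERAGE)
print(f"{implied_bases / 1e6:.1f} Mb of sequence")
```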
The 7-7-1 sequence was initially subjected to automated annotation using MyRAST (http://blog.theseed.org/servers/presentations/t1/running-a-job-with-the-desktop-rast.html), tRNAScan-SE and ARAGORN, following which all open reading frames (ORFs) were confirmed using Kodon (Applied Maths Inc., Austin, TX, USA). The individual proteins were screened against the nonredundant protein databases in NCBI using Batch BLAST (http://greengene.uml.edu/programs/NCBI_Blast.html). In addition, they were screened for conserved motifs using InterProScan, Pfam, TMHMM v2.0 and Phobius.
Putative promoters were identified based upon sequence similarity to the consensus RpoD-specific E. coli promoter sequence TTGACA[N15-17]TATAAT, while rho-independent terminators were identified using ARNold [48, 49] complemented with MFOLD.
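The text does not state how similarity to the consensus was scored. As an illustration only, the following Python sketch scans one strand for a -35 box and a -10 box separated by 15 to 17 nt and accepts candidates up to a chosen number of mismatches; the mismatch threshold and the function names are assumptions, and this is not the procedure used by the authors.

```python
MINUS_35 = "TTGACA"       # consensus -35 hexamer from the text
MINUS_10 = "TATAAT"       # consensus -10 hexamer from the text
SPACERS = (15, 16, 17)    # allowed spacer lengths, N15-17


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mismatches: int = 2):
    """Return (start, spacer, mismatches) for candidate promoters on the given strand.

    `max_mismatches` is a hypothetical threshold chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(MINUS_35) + 1):
        box35 = seq[start:start + 6]
        for spacer in SPACERS:
            pos10 = start + 6 + spacer
            box10 = seq[pos10:pos10 + 6]
            if len(box10) < 6:      # -10 box would run off the end
                continue
            mm = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
            if mm <= max_mismatches:
                hits.append((start, spacer, mm))
    return hits


# Synthetic test sequence: perfect consensus with a 16 nt spacer.
example = "GG" + MINUS_35 + "A" * 16 + MINUS_10 + "CC"
print(find_promoters(example, max_mismatches=0))
```

To search both strands, the same function can be applied to the reverse complement of the sequence.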
The genome was submitted to NCBI and accorded accession number JQ312117.
Structural phage proteins were purified as described by Moak and Molineux. Briefly, a solution of CsCl-purified phage particles (10¹¹ PFU) was mixed with methanol and chloroform (1:1:0.75 by volume). After agitation and centrifugation, the upper layer was discarded and an equal volume of methanol was added. The protein pellet obtained by centrifugation at 14,000 rpm for 6 min was dried and resuspended in 12.5 mM NH₄HCO₃. Subsequently, the heat-denatured sample (95 °C, 5 min) was loaded on a 12% SDS-PAGE gel. The Coomassie-stained gel (Simply Blue Safestain; Invitrogen) was cut into slices, which were subjected to trypsin digestion. Peptides were analyzed using electrospray ionization-tandem mass spectrometry (MS/MS) as described previously by Lavigne et al. The obtained spectra were screened against a database containing all ‘stop-to-stop’ protein sequences in all six frames. Generally, the identification parameters were a ‘protein identification probability’ of at least 99.8% and a ‘best peptide identification probability’ of 95%.
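A ‘stop-to-stop’ database of this kind can be built by translating the genome in all six reading frames and splitting each translation at stop codons, which is what allows peptides from unannotated small genes to be matched. The Python sketch below shows one way to do this with the standard genetic code; the minimum segment length and the function names are assumptions made for illustration, and the authors' actual tooling is not described in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def stop_to_stop(seq: str, min_len: int = 10):
    """Return (strand, frame, peptide) for every stop-to-stop segment.

    `min_len` (in residues) is a hypothetical cut-off chosen for illustration.
    """
    seq = seq.upper()
    entries = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) >= min_len:
                    entries.append((strand, frame, segment))
    return entries


# Tiny synthetic example: stop, four codons, stop.
for entry in stop_to_stop("TAAATGGCTAAAGGTTGA", min_len=4):
    print(entry)
```

Because segments are delimited only by stop codons, they do not depend on a start codon or on any prior gene prediction.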
BLAST: Basic Local Alignment Search Tool
ESI-MS/MS: electrospray ionization tandem mass spectrometry
gp: Gene product
HHpred: Homology detection & structure prediction by HMM-HMM comparison
MOI: Multiplicity of Infection, ratio of infective phage particles to vulnerable hosts
NY: Difco nutrient broth plus yeast extract
PFU: Plaque Forming Unit, a measure of the number of viable viral particles
SDS-PAGE: denaturing (sodium dodecyl sulfate) polyacrylamide gel electrophoresis
TMHMM: TransMembrane prediction using Hidden Markov Models.
We wish to thank Paul Thienel for artwork and Hermine Reisner for technical assistance. A.M.K. was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. AV holds a PhD scholarship of the FWO Vlaanderen.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.