The persistence and stabilization of auxiliary genes in the human skin virome
Virology Journal volume 20, Article number: 49 (2023)
The human skin contains a diverse microbiome that provides protective functions against environmental pathogens. Studies have demonstrated that bacteriophages modulate bacterial community composition and facilitate the transfer of host-specific genes, potentially influencing host cellular functions. However, little is known about the human skin virome and its role in human health. Especially, how viral-host relationships influence skin microbiome structure and function is poorly understood.
Population dynamics and genetic diversity of bacteriophage communities in viral metagenomic data collected from three anatomical skin locations from 60 subjects at five different time points revealed that cutaneous bacteriophage populations are mainly composed of tailed Caudovirales phages that carry auxiliary genes to help improve metabolic remodeling to increase bacterial host fitness through antimicrobial resistance. Sequence variation in the MRSA associated antimicrobial resistance gene, erm(C) was evaluated using targeted sequencing to further confirm the presence of antimicrobial resistance genes in the human virome and to demonstrate how functionality of such genes may influence persistence and in turn stabilization of bacterial host and their functions.
This large temporal study of human skin associated viruses indicates that the human skin virome is associated with auxiliary metabolic genes and antimicrobial resistance genes to help increase bacterial host fitness.
Viruses are the most abundant biological entity on this planet. It is estimated that the ocean alone contains 100-billion times as many viruses than there are grains of sand on Earth . In addition to being the most abundant biological entity, viruses are arguably the most diverse biological entities on Earth due to their mutation rate and speed of replication. Despite their abundance and diversity, our understanding of viral ecology and function is in its infancy.
Viruses infect all forms of life and may influence host population dynamics, yet many viruses' functional influence has yet to be elucidated. Viruses may influence the host organism by increasing fitness through genetic transfer and expression of host-specific genes that confer fitness [2,3,4,5,6,7]. Most microbiome studies have focused on the bacteriome [8, 9]. However, the viral component of the microbiome is an essential factor contributing to community assembly and function. It has been demonstrated that bacteriophages can acquire genes that may provide the host with advantageous evolutionary adaptations, such as biofilm inducing genes, proviral genes that promote cell repair, toxin and virulence genes that promote host fitness [4, 6, 10,11,12]. Additionally, specific bacteriophage genomes retain rate-limiting metabolic genes that increase host fitness [4, 6, 13, 14]. Acquisition and expression of beneficial host metabolic genes were first described in marine bacteriophage populations, where marine bacteriophages were shown to carry auxiliary metabolic genes that contribute to photosynthetic processes .
For animals and humans, the skin is the first barrier of defense against external pathogens. Shifts in the population dynamics of the skin microbiome have been shown to contribute to human diseases such as infections caused by Staphylococcus aureus or Streptococcus pyogenes [8, 16]. Although studies have speculated about complex interactions of the human skin virome and its effects on bacterial pathogen colonization, the magnitude of host-viral interactions and the role of viruses in shaping the skin microbiome is poorly understood [17,18,19,20]. Especially in regards to the viral contribution to human disease. Alternatively, bacteriophages have long been of interest in treating bacterial diseases and conditions by controlling bacterial population dysbiosis and antimicrobial resistance [21,22,23,24], yet little is known about skin phage diversity and their role in shaping the skin microbiome.
Previous studies have demonstrated that bacteriophages on the skin carry auxiliary metabolic genes (AMGs), which have the potential to bolster host fitness and increase microbiome function [17, 25]. We previously reported the persistence and abundance of viruses, including bacteriophages on the skin virome over time . In this present study, we investigated the diversity of the human skin virome and identified auxiliary metabolic genes present within the skin virome and their effect on microbiome function and host fitness. To this end, we evaluated the skin virome longitudinally across 60 human individuals across three different anatomical locations at five different time points by expanding the dataset developed previously by this team . Additionally, we identified and validated antimicrobial-resistant genes (AMR) using an amplicon sequencing approach in this study. We further evaluated the bacteriophage content of the human skin virome to identify factors that confer stability of the acquired gene content.
Materials and methods
Dataset acquisition and virome analysis for auxiliary metabolic genes
In addition to 42 previously sequenced subjects presented in Graham et al.  (SRA project PRJNA754140) (n = 642), skin viral metagenome samples from 18 study participants (n = 80) were collected and processed as described previously . We sampled study participants at three different anatomical locations (right hand, left hand, and scalp) across five time points spanning a six-month period post the initial collection time point using a tandem wet-dry swab technique . Viral particle enrichment was performed by filtration through a 0.22 µm filter to remove cellular and bacterial contaminants but no DNAse treatment was used due to low viral yield. The resulting viral particles were extracted using the QiAmp Ultra-Sensitive Virus Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. Viral DNA was amplified using the TruePrime whole genome amplification (WGA) Kit (Syngen Biotechnology, Inc, Taipei City, Taiwan). Library preparation for sequencing was performed using the NEBNext Ultra II Library preparation kit (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's protocol. The resulting libraries were sequenced using the Illumina HiSeq platform using the 250 bp paired-end sequencing strategy.
All sequencing reads were quality filtered and processed to remove host contamination, as described in Graham et al. . Bacterial contamination was assessed by mapping quality filtered reads to the Silva 16S ribosomal database v.138.1 using BBMap parameters described for high precision mapping of contamination as suggested in the BBMap Guide . Ninety-six-point five percent (96.5%) of the samples contained less than 0.2, 16S rRNA reads per thousand reads per sample and were thus considered to be enriched for viral sequences with negligible bacterial contamination based on the criteria described by Roux et al.  (Additional file 1: Table S1).
Using MEGAHIT v.1.2.8 , a dual assembly approach was employed to assemble the quality filtered viral reads. Assembly datasets consisted of a master meta-assembly using all reads, and an assembly dataset of assemblies performed within each sample. The same bioinformatic assessment of samples was performed on the negative control samples, as described in Graham et al. . A meta-assembly was constructed using all negative control reads and sample contigs from both assembly datasets were mapped to the negative control contigs greater than 1 kb using BWA-mem  to remove reads resulting from contaminants. Unmapped viral contigs greater than 1 kb were retained for further analysis.
Bacteriophage annotation, diversity, and taxonomic classification
Contigs larger than 1 kb were retained and analyzed using the VIBRANT (Virus Identification By Iterative Annotation) package  (see Additional file 2: Table S2 for V-score information). VIBRANT phage outputs were used to identify auxiliary metabolic genes present within the skin virome and to identify persistent auxiliary metabolic genes across samples. The contigs identified through VIBRANT as having bacteriophage origin were taxonomically classified using Kaiju v.1.7  and used for subsequent analyses.
To identify abundance of auxiliary metabolic genes in phages, quality filtered reads were mapped to VIBRANT identified bacteriophage contigs to determine abundance. Read mapping was performed using Bowtie 2 v.2.3.5  and SAMtools v.1.9  to identify the abundance of each viral contig within each sample. The resulting read counts were further analyzed using the Phyloseq package in R v.3.6.3 . Diversity analysis of viruses and AMG containing bacteriophage populations was performed using the Phyloseq object table created using the read count abundance and metadata.
AMR gene identification in bacteriophage contigs
Bacteriophage contigs containing all phage types generated via VIBRANT  were validated using the Blast-N algorithm  against a custom-built Blast database generated using the MEGARes v.2.0.0. AMR gene database  was used to identify AMR genes within the viral contigs identified. Filtration criteria of e-values < 1 × 10–3 with greater than 40% AMR gene coverage, > 80% nucleotide synteny, and bit scores greater than 70 were used to identify true AMR gene hits, as suggested by Enault et al. . All contigs were mapped to AMR genes present in the MEGARes v.2.0.0. AMR gene database  using alignment program Bowtie 2 v.2.3.5  and SAMtools v.1.9  to obtain AMR gene abundance count data for each sample.
Statistical analysis of bacteriophage diversity and stability
Alpha diversity of bacteriophage viral contig abundance identified by VIBRANT was assessed using the observed, Shannon, and inverted Simpsons alpha diversity metrics [39, 40]. A one-way repeated measures model with Shannon diversity metrics being used as the dependent variable was used to evaluate the overall stability of the bacteriophage population over time. To evaluate changes within-subject (intrasubject) variation over time in the phage community composition, the variable “subject” was used as a fixed effect with repeating measures of “time”. A one-way repeated measures ANOVA was performed to establish if the predicted phage diversity of a study participant significantly changes over time.
A Bray–Curtis dissimilarity matrix was generated and visualized using a principal coordinate analysis (PCoA) plot for Beta diversity analysis. Permutations of ANOVA (PERMANOVA) were performed using the Adonis test in the R package Vegan v.2.5–7  to assess the effect of bacteriophage changes in beta diversity across the study participants. This was done to determine if the skin bacteriophage community of a study participant was significantly different from another subject’s bacteriophage community (i.e., intersubject variation). Alpha and Beta diversity was assessed across the bacteriophage count data.
The total abundance for each AMR gene containing contigs was evaluated by summing across time points for each anatomical location within a subject. Additionally, contigs containing specific AMR genes were statistically analyzed for temporal changes over time within-subject (intrasubject) and between-subjects (intersubject) using Freidman’s test utilizing an unweighted diversity matrix using a repeated measures generalized linear model. This test was a generalized linear mixed model with subject as a fixed effect to emulate repeated measures and a zero-inflated negative binomial distribution (using the glmmADMB package) for AMR genes with inflated zero counts .
Amplicon sequencing of the erm(C) gene diversity
Before whole genome amplification from all 60 individuals, viral DNA samples were subjected to amplicon-based sequencing of the erm(C) gene (n = 704). Using primers specific to erm(C) (Forward 5′ AATCGTCAATTCCTGCATGT; Reverse 5′ TAATCGTGGAATACGGGTTTG) , a 299 bp region of the erm(C) gene was amplified and sequenced using the sequencing strategy described by Koztch et al. . PCR reactions contained 1X Terra™ PCR Direct Polymerase Mix, 0.5U of Terra Taq polymerase (Takara Bio, Kusatsu, Shiga, Japan), and 1–2 ng of DNA. The PCR cycle included: initial denaturation of 98 °C for three minutes, followed by 30 cycles of 98 °C for 30 s, 55 °C for 30 s, and 68 °C for 45 s. Following the 30 cycles, a final extension of 68 °C for 4 min was performed. The resulting erm(C) amplicon products were visualized using 1.5% agarose gel electrophoresis to ensure amplification of the correct product size. Only samples that contained the correct fragment size were used for subsequent processing and sequencing. Samples were normalized using Norgen’s (Norgen Biotek Corporation, Canada) NGS Normalization 96-well Kit, following the manufacturer’s procedure recommendations using 10 µl of PCR product per sample and eluted in 20 µl. After normalization, 10 µl of each sample was pooled and concentrated using the NucleoSpin Gel and PCR clean-up kit (Macherey–Nagel, Düren, Germany) and associated PCR clean-up protocols. The concentrated libraries were further purified using a 2% agarose gel, and bands consistent with the correct amplicon size were cut from the gel to reduce primer-dimer contamination in the library samples before sequencing. Library DNA was purified using the NucleoSpin Gel and PCR clean-up kit (Macherey–Nagel, Düren, Germany) using described protocols for agarose gel PCR clean-up. Each sequencing library's nucleotide length (bp) distribution was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc, Santa Clara, CA, USA) with high sensitivity DNA chips to identify sample base pair distribution and sample concentration. Additionally, libraries were quantified using the DeNovix dsDNA High Sensitivity Kit (DeNovix, Inc, Wilmington, DE, USA). The libraries were then sequenced using the 250 bp paired-end sequencing strategy on the Illumina Miseq platform (Illumina, Inc, San Diego, CA, USA). Three blank negative controls containing PCR-grade water were assessed using the same protocol for every plate amplified and sequenced. Additionally, negative control samples from sample collection and extraction procedures were also assessed. All negative and blank controls did not produce any amplicons during the post-amplification and gel electrophoresis steps, nor did they produce any erm(C) positive sequencing reads.
Erm(C) sequence variation and abundance
The quality filtered reads (quality threshold of Q30 or greater) were analyzed using the DADA2 pipeline . To identify erm(C) reads, a custom Blast database was produced using all erm(C) gene sequences provided in the MEGARes v.2.0.0. database . All sequence variants identified using DADA2 were Blasted to the custom erm(C) gene MEGARes database. Sequences that did not have a positive Blast hit with > 70% nucleotide sequence identity to an erm(C) gene were removed from the dataset, thus retaining only positive erm(C) sequence variants.
The quality filtered Amplicon Sequence Variants (ASVs) were aligned using MUSCLE v.3.8.1551  to obtain a multiple sequence alignment to compare nucleotide differences between ASVs. Alignments were trimmed to retain similar sequence lengths, and a phylogenetic tree of the variants was generated using IQ-Tree v.1.6.12  under the best fit HKY + F + G4 substitution model as determined by IQ-Tree. The resulting phylogenetic tree was visualized with iTOL v.6.5.2 .
Conformational and tertiary protein mutagenesis due to nucleotide polymorphisms were identified using the protein variant structural effect prediction tools PROVEAN (Protein Affect Variation Analyzer) , SIFT (Sorting Intolerant from Tolerant) , and Missense-3D . Heatmaps associated with functional viability of the erm(C) gene as determined using the protein variant effect tools were produced using iTOL . 3D visualization of polymorphic regions on the erm(C) monomer were generated by mapping variants to pre-established 3D models of the erm(C) gene using the SWISS-MODEL Workspace  and associated erm(C) reference models present in the SWISS-MODEL Repository .
Human skin virome bacteriophage diversity and temporal stability
Data from our previous study (n = 624)  was combined with 80 additional samples from 18 new study participants to evaluate bacteriophage diversity and abundance in the human skin virome. The resulting contigs greater than 1 kb were analyzed to identify contigs that may have originated from bacteriophage. We identified 3230 contigs, > 1 kb in length, to be of bacteriophage origin using VIBRANT. This included 3162 contigs from lytic phage and 68 contigs from lysogenic phage. Of the identified phages, 485 had a circular conformational genome, and the remaining 2745 phage contigs were presumed to have a linear, conformational genome (Fig. 1A). When the raw sample reads were mapped back to the assembled contigs, we determined that the phages identified using VIBRANT from each individual represented between 5 and 57% of the total skin virome reads per sample. This lower annotation of viral reads combined with very low contamination of bacterial sequences in our meta viromes and bioinformatic filtering of human genome contaminants suggests that current databases poorly represent viral phage diversity, and the human skin virome carries a more extensive diversity of viruses than previously described [17,18,19].
Initial diversity assessments were performed in a reference-independent manner treating each contig as its own viral classification to reduce reference bias. Alpha diversity of the assembled contigs represented by the Shannon diversity index displayed gender and anatomical location within-subject (intrasubject) to be not-significantly different (P = 0.6481 and P = 0.0923, respectively). However, alpha diversity across subjects (intersubject) was highly significant (P = 1.68 × 10–8). Additionally, a significant difference in alpha diversity within-subject over time was observed (P = 0.0048).
Beta diversity was evaluated using a Bray–Curtis dissimilarity matrix and visualized using a PCoA plot (Fig. 1B). Clustering by sample location (i.e., scalp, right hand, left hand) was evident in the plot. Using the distances generated in the Bray–Curtis dissimilarity matrix as the dependent variable, a PERMANOVA was performed. All variables tested had a significant impact on viral diversity changes with 15% (R2 = 0.15102) of the variation explained by subject-to-subject variation (P = 0.001) and 14% (R2 = 0.13965) explained by location within a subject (P = 0.001). Beta diversity was evaluated using Jaccard dissimilarity distances for binary presence and absence of each phage contig within the study population. Similar results were observed with Jaccard dissimilarity distances where 10% (R2 = 0.10668) of the variation in phage diversity was explained by subject-to-subject variation (P = 0.0001) and 13% (R2 = 0.13975) was explained by anatomical location within-subject (P = 0.016).
The largest proportion of the phages identified that could be taxonomically classified were classified into the Class Caudoviricetes. Of those that could be classified, viruses with morphological features of Myoviridae or Siphoviridae were predominant (Fig. 2). Consistent with common commensal human skin bacteria, most of the phages classified were phages known to infect Staphylococcus species. However, only approximately 16% of the total phage contigs were able to be taxonomically classified using current viral reference databases. The remaining unclassified contigs consisted of sequences representing putatively novel phages. Because a high percentage of phages were unclassified, this justified utilizing database-independent approaches to evaluate the human skin phage diversity in subsequent ecological and diversity analysis.
Auxiliary metabolic genes found in the human skin virome
Identified bacteriophage contigs were assessed for the presence of bacterial host-specific auxiliary metabolic genes (AMGs). A total of 648 AMG genes were identified using VIBRANT within contigs identified as bacteriophages. The annotation and distribution of AMGs, as defined by their KEGG  annotation, are described in Fig. 3B–C. The greatest diversity in the overall categories of AMGs was observed in genes involved in carbohydrate metabolism, amino acid synthesis, and cofactors and vitamins. Each metabolism category was broken down further by the metabolism pathway and is shown in Fig. 3A. The pathways most associated with AMGs included cysteine and methionine metabolism and folate biosynthesis. The overall abundance and persistence of each metabolism category across all five time points by subject and location are shown in Additional file 4: Figs. S1 and S2. Abundance and persistence were only assessed for the meta-assembly dataset containing only subjects with complete collection sets (i.e., for 42 out of the 60 subjects) due to the possibility of missing time points affecting abundance. As shown in Additional file 4: Figs. S1 and S2, the abundance of each metabolic pathway varied by subject and location within-subject. AMGs involved in lipid metabolism, protein folding and sorting, amino acid metabolism, and metabolism of cofactors and vitamins were the most abundant within the population for certain subjects (Additional file 4: Fig. S1).
Upon further inspection of the AMGs involved in the biosynthesis of secondary metabolites and biosynthesis of terpenoids and polyketides, it was noted that certain identified AMGs were involved in biosynthesis pathways of production of antibiotics such as penicillin, cephalosporin, vancomycin, and streptomycin (Fig. 3A). However, these AMGs were not persistent over time in the sampled population (Additional file 4: Fig. S2). The presence of AMGs within the skin bacteriophage population is notable as the production of these antibiotics could potentially affect bacterial community structure and function, leading to increased fitness of the bacterial host.
Antimicrobial resistant genes in the human skin bacteriophage population
To identify AMR gene presence and abundance in the bacteriophage contigs, all viral contigs were compared using Blast-N to a custom database generated using the MEGARes v.2.0.0. AMR gene database . Blast results were filtered to only contain hits with recommended cutoffs for accurate AMR gene identification . Sixteen contigs were identified that contained at least one AMR gene that passed the cutoff recommendations (Table 1). These included AMR genes, rlm(H), par(E), par(EF), par(C), msr(D), msr(E), mef(A), fus(B), mph(E), and erm(C), that encode antibiotic drug resistance; mer(B) and mer(G) which encodes mercury resistance; and multi-compound drug and biocide resistance gene qac(F). Alignment information, e-value, coverage, and percent identity information for all AMR genes identified are reported in Table 1. It was noted that contigs contained more than one AMR gene, such as those containing both mef(A) and msr(D). In addition to contigs containing dual mef(A) and msr(D) genes, one additional phage contig contained an msr-related gene of msr(E) and two copies of the phosphotransferase mph(E) gene.
Temporal stability of AMR erm(C) gene in the human skin virome
The erythromycin resistance gene erm(C) was the most abundant AMR gene and was highly abundant in certain subjects across multiple anatomical locations and time points. To assess erm(C) gene temporal stability within the population, a repeated measures generalized linear mixed model was used over time across subjects (Additional file 4: Fig. S3) to evaluate the abundance and persistence of erm(C) genes. This analysis identified erm(C) abundance to be stable over time as suggested by there being no significant change in abundance of erm(C) over time (P < 0.05 = 0.734). Using the Friedman test, it was determined that there was no significant difference in abundance over time within-subject (intrasubject) (P < 0.05 = 0.216808).
Though the presence of erm(C) in phage populations is stable, the abundance and the variation of the erm(C) within populations are poorly understood. Additionally, current studies have failed to identify (1) if variations in the gene exist, (2) if dominant variations are present within the microbiome, and (3) if different variations of the gene are present among different viral strains. To assess erm(C) functionality and genetic variation within the skin virome, we performed targeted paired-end amplicon-based sequencing of a monomeric region of the erm(C) gene in all 60 subjects (n = 894). The resulting reads were binned to identify amplicon sequence variants (ASVs) for the erm(C) gene. All sequence variants were then compared against the MEAGARes database . Sequence variants that did not have high similarity to the erm(C) gene were removed, which resulted in 79 variants with ASV_1 having 100% nucleotide and amino acid sequence similarity to the reference erm(C) protein (WP_001003263.1). All erm(C) ASVs identified were aligned and phylogenetically compared with trees representing their phylogeny in Fig. 4A and Additional file 4: Fig. S4. Some erm(C) variants were present across all three skin anatomical locations and at all time points. In contrast, others were only present in certain locations, as shown in Fig. 4B–C.
To investigate if functionality is associated with the abundance and persistence of erm(C) sequence variants, all variants were translated, and their amino acid sequences were assessed using protein mutagenesis prediction tools [49,50,51] to establish potential point mutations that putatively affect the function of the erm(C) protein. Predicted deleterious mutations and missense mutations that putatively result in structural damage to the protein were identified in silico. Structural damages identified mainly consisted of substitutions resulting in the expansion of the protein cavity (Additional file 4: Figs. S5 and S6). An additional substitution at the W19R region of the protein monomer resulted in a replacement of a buried hydrophilic charged residue with a hydrophilic uncharged residue in one ASV (Additional file 4: Fig. S7). Though not necessarily functionally detrimental, as determined using in silico predictions, missense mutations and deletions resulted in structural variants of the erm(C) that presented differing protein folding conformations, such as the conversion from alpha-helices to beta-sheets (Fig. 5). All highly abundant sequence variants displayed the ability to synthesize a functional protein, whereas the low abundance ASVs contained mutations that resulted in putatively non-functional proteins (Fig. 6).
Viruses play vital roles in shaping microbial communities by improving host fitness, controlling bacterial populations through predation, and helping increase metabolism by overcoming metabolic bottlenecks [4, 6, 17, 55, 56]. For instance, over time, filamentous bacteriophages have acquired numerous host-specific genes that have allowed them to have symbiotic relationships with their bacterial hosts, thus promoting host survival and the production and spread of phage progeny . The retention of host genes and the metabolomic influence of phages on bacterial hosts have been described in aquatic communities [13, 15], and human and animal gut microbiomes [25, 31, 57].
Studies are limited that describe phage diversity and ecology in humans other than the gut [17,18,19, 26, 58,59,60,61,62,63,64,65,66]. To date, only a few studies have addressed the human skin virome and its taxonomic composition [17,18,19, 26, 59, 62]. Additionally, studies describing AMR genes in the skin virome and other auxiliary genes are limited . Here we describe bacteriophage diversity, AMGs, and AMR genes present in the human skin virome and possible roles of the human skin virome using temporal information from five time points spanning six-months across three anatomical skin locations (left hand, right hand, and scalp) from 60 human participants.
We identified 3230 bacteriophage contigs from human skin viral metagenomes. Bacteriophage contigs identified in this study predominantly originated from the viral Families Myoviridae and Siphoviridae, which both belong to Order Caudovirales. Previous studies have reported that the skin virome consists primarily of Caudovirales bacteriophages [17, 18, 26, 62]. While our study is consistent with previous reports, current viral reference databases consist of a disproportionate amount of Caudovirales viral genomes, and this fact could contribute to an annotation bias in virome studies, ours included. This is further reflected by the fact that only 16% of the putative phage contigs we assembled could be taxonomically classified using current reference databases. This fact underscores the lack of knowledge regarding phage diversity in the human skin virome. Therefore, we speculate that most of the phages that we have identified in this study are novel phages. Additional work assessing bacteriophage identification, isolation, annotation, culture, and comparative genomics is needed to fully understand bacteriophage taxonomic composition and diversity in the human skin virome. However, our filtration using 0.22 μm filters may have resulted in removal of large viruses from the skin virome and may be lacking large viral particles. Additionally, the whole genome amplification used in this study may have led to amplification bias.
Though many studies have shown the presence of host-specific genes such as AMR genes in S. aureus and that transduction via S. aureus phage is a common form of horizontal gene transfer of these AMR genes, very few culture-independent studies have been done that look at the human skin virome and its composition and abundance of host-related genes in phages . Studies have investigated the skin virome composition and the temporal stability and diversity of the virome; however, many of these works did not address the presence of AMG or AMR genes or were smaller-scale studies with lower sample numbers with short sampling periods of only a couple of weeks without repeated sampling periods [17,18,19, 26, 62].
The acquisition of bacteria-specific genes, especially genes associated with host immune defense evasion and host genes associated with increased replication or cell proliferation to aid in viral replication and spread, is not a new concept in virology [4, 17, 31, 67, 68]. However, it is crucial to understand what genes are enriched in viromes and how expression of such genes could affect host function and persistence. To this end, our phage contigs contained 648 AMGs within the human skin virome. This study identified the human skin virome to carry genes associated with carbohydrate metabolism and amino acid synthesis. These genes are critical when the virus overtakes the cell for replication as carbohydrate metabolism could lead to increased energy for replication, and amino acid synthesis genes could help synthesize viral capsid during replication. Recently, it was shown that Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) could use host folate and one-carbon metabolism to bolster replication . Similarly, it may be possible that bacteriophages carry folate metabolism genes to increase de novo purine synthesis to increase viral replication. The presence of auxiliary metabolic genes associated with folate biosynthesis has been reported in rumen viruses and has been suggested in one-carbon metabolism . As such, the increased folate metabolism genes may help utilize one-carbon substrates to provide energy during replication.
We also identified AMR genes associated with three main antimicrobial drug resistance mechanisms among the bacteriophages identified. This included efflux transport pumps found in gram-positive and gram-negative bacteria that help transport toxic compounds and antibiotic drugs out of the cell . Of the five main families of efflux pumps associated with antibiotic resistance, genes encoding efflux pumps belonging to ATP-binding cassette (ABC), the major facilitator superfamily (MFS), and the small multidrug resistance family (SMR) were all identified within the phage population. We identified the MFS efflux pump protein mef(A) and the ABC efflux pump protein msr(D), present on the same contigs. In Streptococcus pneumoniae and Staphylococcus epidermidis, when dually expressed, these two proteins act as a dual efflux pump that gives moderate resistance against 14- and 15- membered macrolides such as erythromycin, clarithromycin, and azithromycin . Having the mef(A) gene alone results in high resistance levels to varying macrolide drugs and is considered a common gene associated with multidrug-resistant pathogens such as MRSA. The presence of the dual efflux pump of mef(A)-msr(D) (also commonly referred to as mef(A)-mel) confers a higher level of resistance to macrolides than that of just mef(A) alone . Both mef(A) alone and dual mef(A)-msr(D) genes have been identified in prophage regions of bacteria and in phage genomes such as that of the Streptococcus infecting phages Tn1207.3 and Tn1207.1 [73,74,75]. The detection of dual efflux pumps within the phage population suggests that AMR genes can be moved across bacteria via phage-mediated transfection.
The efflux pump transcribing gene, qacF, was also observed in multiple viral contigs. These efflux pumps confer quaternary ammonium compound resistance and have been hypothesized to increase bacterial tolerance to antibiotics, especially antibiotics that inhibit cell wall synthesis [76, 77]. Bacteria containing qacF and qac-related genes have been reported in soil and agriculture related environments, including livestock related industries. The presence of this AMR gene in bacteriophages is not surprising since multiple study participants reported being in recent contact and working with livestock. This demonstrates that the phage diversity in the skin virome constantly changes and is, at least in part, acquired from the environment. Additionally, this suggests that mobile genetic elements may be a route in which AMR genes are horizontally transferred.
One of the most common AMR genes associated with antibiotic resistance is genes encoding proteins that directly interact with or modify bacterial ribosomes. This mode of action confers resistance to antibiotic agents that target transcription and translation. The 23S rRNA interacting methyltransferases RlmH, erm(C), and phosphotransferase MphE were all identified in the phage population. These genes confer strong antimicrobial resistance to many drugs, such as aminoglycosides, macrolides, oxazolidinones, and streptogramins . The presence of erm(C) in multiple contigs is of clinical importance since erm(C) is the best-studied resistance mechanism to MLSB (macrolide, lincosamide, and streptogramin B) in bacteria and is one of the leading AMRs associated with MRSA [71, 78]. Due to its high abundance within the population and its clinical relevance to human health, we investigated the abundance and distribution of the erm(C) gene within the study population using targeted amplicon-based sequencing to identify the SNP variation within the erm(C) gene to evaluate functional capabilities and phylogeny of this gene within our study population.
The erm(C) gene was highly abundant and temporally stable in some individuals. The stability of the erm(C) gene across the skin virome of study participants over a six-month period suggests that the gene plays an important role in phage stability and, in turn, bacterial host fitness. One can hypothesize that temporal stability and evolutionary retention of functional versions of AMR genes give associated phages an evolutionary advantage over viral strains that do not carry AMR genes or do not have functional versions of AMR genes. Thus, to establish the importance of AMR genes for persistence and to identify the evolutionary advantage of phages having AMR genes, we investigated the abundance and the presence of functional sequence variants of the erm(C) gene using the amplicon variants obtained through targeted sequencing. This analysis revealed that only functional variants of the erm(C) gene persist within the viral population at high abundance (Fig. 5). This suggests that erm(C) provides an advantage for the phages to persist in the human virome and demonstrates that phages can acquire multiple host genes that can impact microbiome community diversity and evolutionary selection, including genes that transcribe antimicrobial activity resistance.
Viruses play an important role in modulating bacterial population and diversity. Here we investigated the human skin virome and skin associated bacteriophage population diversity, dynamics, and the auxiliary metabolic genes associated with these phages. Human skin viral metagenome samples revealed that the bacteriophage population on the skin is mainly composed of tailed bacteriophages in the viral order Caudovirales. Nevertheless, many phage contigs could not be classified due to the poor representation of human skin viruses in viral reference databases. We identified 648 different bacterial host-derived AMGs related to varying types of bacterial cell processes and functions. Additionally, we identified the antimicrobial-resistant genes erm(C), par(E), par(EF), par(C), fus(B), msr(D), msr(E), mef(A), mph(E), and rlm(H). These genes were, in some cases, subject-specific, whereas genes such as erm(C) were abundant across multiple individuals and were stable over time. This study demonstrates that the phages in the human skin virome carry auxiliary metabolic genes that increase host fitness and help with the persistence of the bacterial host and contribute greatly to bacterial-viral (phage) interactions. Findings from this study suggest that viral-host relationships are more complex than previously thought and highlight the importance of utilizing system-based approaches to study ecosystem interactions in order to fully understand microbiome diversity and function.
Availability of data and materials
Raw sequencing data is available through the NCBI Short Read Archive (SRA) under the project accession codes PRJNA754140 and PRJNA936861. Project metadata and bioinformatic pipeline scripts described in this manuscript are publicly available at: https://github.com/egrah3/Graham_2022_Virome_AMG_AMR.
Breitbart M, Bonnain C, Malki K, Sawaya NA. Phage puppet masters of the marine microbial realm. Nat Microbiol. 2018;3:754–66.
Suttle CA. Marine viruses—major players in the global ecosystem. Nat Rev Microbiol. 2007;5:801–12.
Breitbart M. Marine viruses: truth or dare. Ann Rev Mar Sci. 2012;4:425–48.
Chaturongakul S, Ounjai P. Phage-host interplay: examples from tailed phages and gram-negative bacterial pathogens. Front Microbiol. 2014. https://doi.org/10.3389/fmicb.2014.00442.
Von Wintersdorff CJH, Penders J, Van Niekerk JM, Mills ND, Majumder S, Van Alphen LB, et al. Dissemination of antimicrobial resistance in microbial ecosystems through horizontal gene transfer. Front Microbiol. 2016. https://doi.org/10.3389/fmicb.2016.00173.
Hay ID, Lithgow T. Filamentous phages: masters of a microbial sharing economy. EMBO Rep. 2019. https://doi.org/10.15252/embr.201847427.
Sutton TDS, Hill C. Gut bacteriophage: current understanding and challenges. Front Endocrinol (Lausanne). 2019. https://doi.org/10.3389/fendo.2019.00784.
Byrd AL, Belkaid Y, Segre JA. The human skin microbiome. Nat Rev Microbiol. 2018;16:143–55.
Gilbert JA, Blaser MJ, Caporaso JG, Jansson JK, Lynch SV, Knight R. Current understanding of the human microbiome. Nat Med. 2018;24:392–400.
Derbise A, Chenal-Francisque V, Pouillot F, Fayolle C, Prévost MC, Médigue C, et al. A horizontally acquired filamentous phage contributes to the pathogenicity of the plague bacillus. Mol Microbiol. 2007;63:1145–57.
Rice SA, Tan CH, Mikkelsen PJ, Kung V, Woo J, Tay M, et al. The biofilm life cycle and virulence of Pseudomonas aeruginosa are dependent on a filamentous prophage. ISME J. 2009;3:271–82.
Faruque SM, Mekalanos JJ. Phage-bacterial interactions in the evolution of toxigenic Vibrio cholerae. Virulence. 2012;3:556–65.
Huang X, Jiao N, Zhang R. The genomic content and context of auxiliary metabolic genes in roseophages. Environ Microbiol. 2021;23:3743–57.
Kieft K, Zhou Z, Anderson RE, Buchan A, Campbell BJ, Hallam SJ, et al. Ecology of inorganic sulfur auxiliary metabolism in widespread bacteriophages. Nat Commun. 2021;12:3503.
Lindell D, Jaffe JD, Johnson ZI, Church GM, Chisholm SW. Photosynthesis genes in marine viruses yield proteins during host infection. Nature. 2005;438:86–9.
Byrd AL, Deming C, Cassidy SKB, Harrison OJ, Ng WI, Conlan S, et al. Staphylococcus aureus and Staphylococcus epidermidis strain diversity underlying pediatric atopic dermatitis. Sci Transl Med. 2017;9(397):eaal4651.
Hannigan GD, Meisel JS, Tyldsley AS, Zheng Q, Hodkinson BP, Sanmiguel AJ, et al. The human skin double-stranded DNA virome: topographical and temporal diversity, genetic enrichment, and dynamic associations with the host microbiome. MBio. 2015;6:e01578.
Oh J, Byrd AL, Park M, Kong HH, Segre JA. Temporal stability of the human skin microbiome. Cell. 2016;165:854–66.
van Zyl LJ, Abrahams Y, Stander EA, Kirby-McCollough B, Jourdain R, Clavaud C, et al. Novel phages of healthy skin metaviromes from South Africa. Sci Rep. 2018;8:12265.
Tisza MJ, Buck CB. A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases. Proc Natl Acad Sci. 2021;118:e2023202118.
Furfaro LL, Payne MS, Chang BJ. Bacteriophage therapy: clinical trials and regulatory hurdles. Front Cell Infect Microbiol. 2018. https://doi.org/10.3389/fcimb.2018.00376.
Castillo DE, Nanda S, Keri JE. Propionibacterium (Cutibacterium) acnes bacteriophage therapy in acne: current evidence and future perspectives. Dermatol Ther (Heidelb). 2019;9:19–31.
Brives C, Pourraz J. Phage therapy as a potential solution in the fight against AMR: obstacles and possible futures. Palgrave Commun. 2020;6:100.
Pirnay JP. Phage therapy in the year 2035. Front Microbiol. 2020. https://doi.org/10.3389/fmicb.2020.01171.
Kieft K, Breister AM, Huss P, Linz AM, Zanetakos E, Zhou Z, et al. Virus-associated organosulfur metabolism in human and environmental systems. Cell Rep. 2021;36:109471.
Graham EH, Clarke JL, Fernando SC, Herr JR, Adamowicz MS. The application of the skin virome for human identification. Forensic Sci Int Genet. 2022;57:102662.
Bushnell B. BBMap: a fast, accurate, splice-aware aligner. 2014. [https://www.osti.gov/servlets/purl/1241166].
Roux S, Krupovic M, Debroas D, Forterre P, Enault F. Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol. 2013;3:130160.
Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674–6.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. [http://arxiv.org/abs/1303.3997].
Kieft K, Zhou Z, Anantharaman K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome. 2020;8:90.
Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun. 2016;7:11257.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
The CRAN R Team: R: a language and environment for statistical computing. R Found Stat Comput. 2013.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
Doster E, Lakin SM, Dean CJ, Wolfe C, Young JG, Boucher C, et al. MEGARes 2.0: a database for classification of antimicrobial drug, biocide and metal resistance determinants in metagenomic sequence data. Nucleic Acids Res. 2020;48:D561–9.
Enault F, Briet A, Bouteille L, Roux S, Sullivan MB, Petit MA. Phages rarely encode antibiotic resistance genes: a cautionary tale for virome analyses. ISME J. 2017;11:237–47.
Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423.
Simpson EH. Measurement of diversity. Nature. 1949;163:688.
Oksanen J. Vegan: community ecology package. 2020. [https://cran.r-project.org/package=vegan].
Odintsova V, Tyakht A, Alexeev D. Guidelines to statistical analysis of microbial composition data inferred from metagenomic sequencing. Curr Issues Mol Biol. 2017;24:17–36.
Strommenger B, Kettlitz C, Werner G, Witte W. Multiplex PCR assay for simultaneous detection of nine clinically relevant antibiotic resistance genes in Staphylococcus aureus. J Clin Microbiol. 2003;41:4089–94.
Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the miseq illumina sequencing platform. Appl Environ Microbiol. 2013;79:5112–20.
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13:581–3.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.
Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–6.
Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31:2745–7.
Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–4.
Ittisoponpisan S, Islam SA, Khanna T, Alhuzimi E, David A, Sternberg MJE. Can predicted protein 3D structures provide reliable insights into whether missense variants are disease associated? J Mol Biol. 2019;431:2197–212.
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modeling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296–303.
Bienert S, Waterhouse A, De Beer TAP, Tauriello G, Studer G, Bordoli L, et al. The SWISS-MODEL repository-new features and functionality. Nucleic Acids Res. 2017;45:D313–9.
Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–61.
Hall AR, Scanlan PD, Morgan AD, Buckling A. Host-parasite coevolutionary arms races give way to fluctuating selection. Ecol Lett. 2011;14:635–42.
Stone E, Campbell K, Grant I, McAuliffe O. Understanding and exploiting phage–host interactions. Viruses. 2019;11:567.
Anderson CL, Sullivan MB, Fernando SC. Dietary energy drives the dynamic response of bovine rumen viral communities. Microbiome. 2017;5:155.
Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, Silva J, et al. Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS ONE. 2009;4: e7370.
Foulongne V, Sauvage V, Hebert C, Dereure O, Cheval J, Gouilh MA, et al. Human skin microbiota: high diversity of DNA viruses identified on the human skin by high throughput sequencing. PLoS ONE. 2012;7: e38499.
Abbas AA, Diamond JM, Chehoud C, Chang B, Kotzin JJ, Young JC, et al. The perioperative lung transplant virome: torque teno viruses are elevated in donor lungs and show divergent dynamics in primary graft dysfunction. Am J Transplant. 2017;17:1313–24.
Moustafa A, Xie C, Kirkness E, Biggs W, Wong E, Turpaz Y, et al. The blood DNA virome in 8,000 humans. PLOS Pathog. 2017;13:e1006292.
Tirosh O, Conlan S, Deming C, Lee-Lin SQ, Huang X, Barnabas BB, et al. Expanded skin virome in DOCK8-deficient patients. Nat Med. 2018;24:1815–21.
Garretto A, Miller-Ensminger T, Wolfe AJ, Putonti C. Bacteriophages of the lower urinary tract. Nat Rev Urol. 2019;16:422–32.
Ghose C, Ly M, Schwanemann LK, Shin JH, Atab K, Barr JJ, et al. The virome of cerebrospinal fluid: viruses where we once thought there were none. Front Microbiol. 2019. https://doi.org/10.3389/fmicb.2019.02061.
Jakobsen RR, Haahr T, Humaidan P, Jensen JS, Kot WP, Castro-Mejia JL, et al. Characterization of the vaginal DNA virome in health and dysbiosis. Viruses. 2020;12:1143.
Liang G, Bushman FD. The human virome: assembly, composition and host interactions. Nat Rev Microbiol. 2021;19:514–27.
Malik SS, Azem-e-Zahra S, Kim KM, Caetano-Anollés G, Nasir A. Do viruses exchange genes across superkingdoms of life? Front Microbiol. 2017;8:2110.
Weiss RA. Exchange of genetic sequences between viruses and hosts. Curr Top Microbiol Immunol. 2017;407:1–29.
Zhang Y, Guo R, Kim SH, Shah H, Zhang S, Liang JH, et al. SARS-CoV-2 hijacks folate and one-carbon metabolism for viral replication. Nat Commun. 2021;12:1676.
Reygaert CW. An overview of the antimicrobial resistance mechanisms of bacteria. AIMS Microbiol. 2018;4:482–501.
Chancey ST, Zhou X, Zähner D, Stephens DS. Induction of efflux-mediated macrolide resistance in Streptococcus pneumoniae. Antimicrob Agents Chemother. 2011;55:3413–22.
Daly MM, Doktor S, Flamm R, Shortridge D. Characterization and prevalence of MefA, MefE, and the associated msr(D) gene in Streptococcus pneumoniae clinical isolates. J Clin Microbiol. 2004;42:3570–4.
Giovanetti E, Brenciani A, Vecchi M, Manzin A, Varaldo PE. Prophage association of mef(A) elements encoding efflux-mediated erythromycin resistance in Streptococcus pyogenes. J Antimicrob Chemother. 2005;55:445–51.
Iannelli F, Santagati M, Santoro F, Oggioni MR, Stefani S, Pozzi G. Nucleotide sequence of conjugative prophage Φ1207.3 (formerly Tn1207.3) carrying the mef(A)/msr(D) genes for efflux resistance to macrolides in Streptococcus pyogenes. Front Microbiol. 2014. https://doi.org/10.3389/fmicb.2014.00687.
Chancey ST, Bai X, Kumar N, Drabek EF, Daugherty SC, Colon T, et al. Transcriptional attenuation controls macrolide inducible efflux and resistance in Streptococcus pneumoniae and in other gram-positive bacteria containing mef/mel (msr(D)) elements. PLoS ONE. 2015;10:e0116254.
Tandukar M, Oh S, Tezel U, Konstantinidis KT, Pavlostathis SG. Long-term exposure to benzalkonium chloride disinfectants results in change of microbial community structure and increased antimicrobial resistance. Environ Sci Technol. 2013;47:9730–8.
Li X, Rensing C, Vestergaard G, Arumugam M, Nesme J, Gupta S, et al. Metagenomic evidence for co-occurrence of antibiotic, biocide and metal resistance genes in pigs. Environ Int. 2022;158:106899.
Spiliopoulou I, Petinaki E, Papandreou P, Dimitracopoulos G. Erm(C) is the predominant genetic determinant for the expression of resistance to macrolides among methicillin-resistant Staphylococcus aureus clinical isolates in Greece. J Antimicrob Chemother. 2004;53:814–7.
We thank all participants for their contribution and participation in this study. We also thank those in the Fernando Lab who assisted with training of experimental methods and bioinformatic support and Carlos Riera Ruiz for all his help with revising this manuscript. This work was completed using the Holland Computing Center of the University of Nebraska, which receives support from the Nebraska Research Initiative.
This work was supported by the Department of Justice [2017-IJ-CX-0025, 2019-75-CX-0075, and 2019-R2-CX-0048] and USDA National Institute of Food and Agriculture Animal Nutrition, Growth and Lactation grant no. 2018-67015-27496, Effective Mitigation Srategies for Antimicrobial Resistance grant no. 2018-68003-27545. The funding agencies had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics approval and consent to participate
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the University of Nebraska‐Lincoln (protocol code #53835, project ID #16994, original date of approval 12/19/2017 with biennial review). Sample collection from human participants and experimentation commenced only after the IRB review process was completed.
Consent for publication
All authors have read this publication and have approved the manuscript.
All authors declare that they have no competing interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Graham, E.H., Tom, W.A., Neujahr, A.C. et al. The persistence and stabilization of auxiliary genes in the human skin virome. Virol J 20, 49 (2023). https://doi.org/10.1186/s12985-023-02012-3
- Human skin
- Antibiotic resistance