An antibody response to human polyomavirus 15-mer peptides is highly abundant in healthy human subjects

Background: Human polyomavirus (HPyV) infections are mostly unapparent or mild primary infections, followed by lifelong nonpathogenic persistence. HPyV, and specifically JCPyV, are known to co-diverge with their host, implying a slow rate of viral evolution and a long timescale of virus/host co-existence. Recent bioinformatic reports showed a high level of peptide homology between JCPyV and the human proteome. In this study, the antibody response to PyV peptides is evaluated. Methods: An in-silico analysis of the HPyV proteome was followed by peptide microarray serology. An HPyV peptide microarray containing 4,284 peptides was designed, covering 10 polyomavirus proteomes. Plasma samples from 49 healthy subjects were tested against these peptides. Results: In the in-silico analysis, all possible HPyV 5-mer amino acid sequences were compared to the human proteome, and 1,609 unique motifs are presented. Assuming that a linear epitope can be as small as a pentapeptide, on average 9.3% of the polyomavirus proteome is unique and could be recognized by the host as non-self. Small t Ag (stAg) contains a significantly higher percentage of unique pentapeptides. Testing for antibodies against HPyV 15-mer peptides in healthy subjects resulted in the following observations: i) antibody responses against stAg were significantly elevated, and those against viral protein 2 (VP2) significantly reduced; and ii) there was a significant correlation between the number of embedded unique HPyV pentapeptides and the microarray fluorescent signal. Conclusion: The anti-peptide HPyV antibodies in healthy subjects are preferentially directed against the unique fraction of the viral proteome, as defined at the pentapeptide level.


Background
The Polyomaviridae are a family of non-enveloped circular double-stranded DNA viruses. The Polyomaviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV) has proposed that the Polyomaviridae family will comprise three genera: two genera containing mammalian viruses (Orthopolyomavirus and Wukipolyomavirus) and one genus containing avian viruses (Avipolyomavirus) [1]. Besides the HPyVs that were discovered more than 40 years ago (JCPyV and BKPyV), several new polyomaviruses have been discovered over the last 7 years in human clinical samples, namely WUPyV [2], KIPyV [3], MCPyV [4], TSPyV [5], HPyV6 and HPyV7 [6], HPyV9 [7], HPyV10 [8] and MWPyV [9], STLPyV [10], and HPyV12 [11]. Based on pairwise percentage identity of the viral protein-1 (VP1) open reading frame, members of the same species have more than 90% identity, identity between species ranges from 61 to 85%, and viruses belonging to different genera have less than 61% identity [6]. The primate virus SV40 has been detected in human samples [12], but there is inadequate evidence for a relationship with human carcinogenesis [13]. The recently discovered human virus HPyV9 is closely related to the African Green Monkey Lymphotropic PyV (LPyV) [7,14], and this discovery might explain the previously observed serological evidence that LPyV-like virus infections may occur in humans [15,16].
Multiple methods have been used to measure antibodies to polyomavirus virions. The most common method is based on the use of baculovirus-expressed VP1 virus-like particles (VLP) in an enzyme immunoassay (EIA) [17-20]. Additionally, there are E. coli-expressed VP1 proteins that do not form VLP but rather pentameric VP1 capsomers, which are used either in an EIA or in a Luminex multiplex platform [15,21]. Currently, the STRATIFY JCPyV ELISA is the only Food and Drug Administration (FDA) approved assay for JCPyV [22], while all the others are lab-developed tests for 'research use only'. To a large extent, the immune response measured in these VLP-based or capsomer-based assays is directed against conformational epitopes [23]. Only a few peptide EIAs, which presumably detect linear epitopes/mimotopes, have been described [12].
Since there is considerable homology in the VP1 region among the human PyV belonging to the same genus, it does not come as a surprise that there is considerable cross-reactivity in serological assays [23]. For example, serological cross-reactivity in the alpha-PyV is explained by 77% amino acid identity between JCPyV and SV40, 83% between BKPyV and SV40, and 80% between JCPyV and BKPyV. The availability of VLP of the different PyV makes it possible to conduct inhibition studies and to identify virus-specific antibodies [16,23].
By using phylogenetic methods, the worldwide distribution of JCPyV genotypes was found to mirror the migrations and genetics of the human family [24,25]. JCPyV, and most likely many other polyomaviruses, have co-evolved with their hosts over a long evolutionary timescale, which allowed mechanisms of immune evasion to evolve. Indeed, analysis of the JCPyV polyprotein for peptide sharing with the human proteome revealed that the virus has hundreds of pentapeptide sequences in common with human proteins [26]. This type of immune evasion may contribute to the asymptomatic character of the primary infection and to subsequent latency. However, several sequence domains that are JCPyV-unique were also detected [26]. The role of these unique domains in the mechanisms and molecular basis of polyomavirus reactivation and pathogenesis remains unclear.
Since there is a great overlap in pentapeptide sequences between the human proteome and the PyV proteome, it is of particular interest to distinguish between domains that are recognized by auto-antibodies and other domains that are characteristic of a polyomavirus infection. Therefore, in this study, we explored the following questions: i) is there an immune response to HPyV epitopes presented as peptides; and ii) how do these peptide epitopes relate to unique viral domains with no overlap with the human proteome? The answers to these questions could help in understanding the immune response to HPyV infections, the discrimination between 'self' and 'non-self', and the status of an uninfected individual, and will hopefully contribute to unraveling the mechanisms underlying virus reactivation.

Polyomavirus peptide similarity with the human proteome
The HPyV reference sequence database was retrieved from NCBI. The viral proteins LTAg, stAg, VP1, and VP2 for 11 HPyV were cut in silico into either 5-mer (with 4 amino acid (aa) overlap), 6-mer (with 5 aa overlap), or 7-mer (with 6 aa overlap) peptides. This resulted in 17,396 penta-peptides, 17,347 hexa-peptides, and 17,304 hepta-peptides. These small peptides were compared pairwise with the complete human proteome (20,227 proteins in http://www.uniprot.org/faq/48) in order to identify correct matches.
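The in-silico cutting and matching step can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `kmers` and `count_matches`, the toy viral sequence, and the two-protein "proteome" are hypothetical, and counting overlapping occurrences is an assumption, since the text does not state how repeated hits within one protein were counted.

```python
def kmers(sequence, k):
    """Cut a protein into all overlapping k-mers (step of 1, so k-1 residues overlap)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_matches(peptide, proteome):
    """Count exact occurrences of a peptide over all proteins (overlapping hits included)."""
    total = 0
    for protein in proteome.values():
        start = protein.find(peptide)
        while start != -1:
            total += 1
            start = protein.find(peptide, start + 1)
    return total


# Hypothetical toy data, for illustration only (not real viral or human sequences).
viral_protein = "MKRAAAAAAKW"
human_proteome = {"toyA": "QQAAAAAAAQ", "toyB": "MKRLLL"}

pentas = kmers(viral_protein, 5)
matches = {p: count_matches(p, human_proteome) for p in set(pentas)}
unique = sorted(p for p, n in matches.items() if n == 0)  # no human match

print(len(pentas), "penta-peptides;", len(unique), "distinct motifs without a match")
```

In this toy case only the poly-alanine motif is shared with the "proteome"; the other motifs would be classed as unique.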
A total of 1,609 (9.25%) penta-peptides had no match in the human proteome, while for the hexa- and hepta-peptides the numbers rose to 12,064 (69.5%) and 16,679 (96.39%), respectively. The distributions of the number of matches for hexa- and hepta-peptides follow a similar pattern, which is very different from that of the penta-peptides (Figure 1a). The degree of uniqueness and the sharp drop with increasing number of matches in the human proteome suggest that hexa- and hepta-peptides are likely to be HPyV-specific. Consequently, if an epitope encompasses 6 or more amino acids in one continuous stretch, this epitope is also likely to be HPyV-specific. For the penta-peptides, however, the distribution is rather different, as only 9.25% of the peptides were found to be unique to polyomaviruses. The remaining 90.75% of peptides have one or more matches in the human proteome. There were 939 penta-peptides, 11 hexa-peptides, and 2 hepta-peptides with more than 30 matches in the human proteome (not shown); these motifs were often stretches containing 3 to 6 identical amino acids, for example the hexa-peptide AAAAAA in the HPyV7 VP1 carboxy-terminal region (… 356 SSNAAAAAAKISVA 370 P…), which was found 2,364 times in the human proteome.
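As a quick arithmetic check, the reported percentages follow directly from the peptide counts given above. The short Python sketch below recomputes them; the dictionary and variable names are illustrative only.

```python
# (unique peptides, total peptides) as reported in the text.
counts = {
    "penta": (1609, 17396),
    "hexa": (12064, 17347),
    "hepta": (16679, 17304),
}

# Percentage of peptides without any match in the human proteome.
pct_unique = {name: 100 * u / total for name, (u, total) in counts.items()}
# The complement: percentage with at least one human match ("self").
pct_shared = {name: 100 - p for name, p in pct_unique.items()}

for name in counts:
    print(f"{name}: {pct_unique[name]:.2f}% unique, {pct_shared[name]:.2f}% shared")
```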

Array results
A total of 4,284 peptides were incubated in a peptide microarray format with plasma samples from 49 healthy volunteers (HVs), resulting in 209,916 data points. This population has a median log2 (signal/control) value of 1.683 (min: -1.222; 25th: 1.135; 75th: 2.50; 90th: 3.319; and max: 6.909). We used the median values of 49 HV data points for each peptide to generate the figures used in this article. On  . But when analyzing the 5 different proteins (LTAg, stAg, VP1, VP2, and agnoprotein) that were present as peptides on the microarray, it was surprising to see that antibody responses to stAg peptides were significantly elevated (p = 3.45E-23), but also that the anti-peptide antibody responses for VP2 were significantly less abundant (p = 4.44E-16) (Figure 2).
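The data reduction described here (one median per peptide over all subjects) can be sketched as follows. The signal values and peptide labels are hypothetical placeholders, not study data; only the peptide and subject counts come from the text.

```python
from statistics import median

n_peptides, n_subjects = 4284, 49
n_data_points = n_peptides * n_subjects  # one value per peptide per subject

# Hypothetical log2(signal/control) values; the real data have 49 values per peptide.
signals = {
    "pep_A": [1.1, 2.5, 1.7],
    "pep_B": [0.4, 3.3, 0.9, 2.1],
}

# Median over subjects, the per-peptide summary used for the figures.
peptide_median = {pep: median(values) for pep, values in signals.items()}

print(n_data_points, peptide_median)
```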
Correlation between 'unique polyomavirus peptides (not present in the human proteome)' and 'peptide microarray results'
Linear peptide epitopes are most frequently 7-9 amino acids long (range 4-12) [27]. We focused on 5- to 7-mer peptides. Figure 1 already illustrated that most of the hexa- and hepta-peptides are virus-specific, and in these cases linear epitopes would likely be virus-specific. The 5-mer peptides are less virus-restricted.
Since the microarray peptides were 15-mers, up to 11 5-mer epitopes could be present on one single peptide. Some of these 11 epitopes might be virus-specific, but others might have identical motifs in the human proteome. Therefore, results from the 4,284 peptides on the microarray were interpreted as a summary signal of 11 5-mer peptides, under the assumption that microarray peptides with virus-specific 5-mer epitopes would result in higher signals. As can be deduced from Figure 3, there was indeed a correlation between the 'number of embedded penta-peptides with no human homologue' in the microarray peptides and the strength of the signal obtained with human HS plasma. Based on the linear regression analysis using all data points, there was a stepwise increase in expression value as given by the following formula: y = 0.17x + 1.55 (Table 1). The difference between each subset is significant (p < 0.05). The 95% CI of the slope was between 0.14 and 0.18. In addition, Table 1 provides the slopes and intercepts for each protein and virus separately. When ranking the groups according to the steepest slope, KIPyV and JCPyV were seen as the 2 most important contributors to the overall slope. Despite the fact that agno had only a small number of unique peptides, its slope turned out to be very steep. In contrast, the slope was rather shallow for BKPyV, SV40, and stAg. The Y-intercept was highest for stAg (2.09, in agreement with the observation in Figure 2). In conclusion, microarray peptides with one or more embedded polyomavirus penta-peptides with no human homologue showed a higher signal on the microarray, and therefore are likely to represent virus-specific epitopes.
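The counting of embedded penta-peptides and the reported regression line can be made concrete with a small Python sketch. The function names, the example 15-mer, and the set of "unique" motifs are hypothetical; only the window count (15 - 5 + 1 = 11) and the coefficients 0.17 and 1.55 come from the text.

```python
def embedded_pentas(peptide):
    """Return every 5-mer window of a peptide (11 windows for a 15-mer)."""
    return [peptide[i:i + 5] for i in range(len(peptide) - 4)]


def count_unique(peptide, unique_pentas):
    """Number of embedded 5-mers that have no human homologue."""
    return sum(window in unique_pentas for window in embedded_pentas(peptide))


def predicted_signal(n_unique, slope=0.17, intercept=1.55):
    """Overall regression line from the text: y = 0.17x + 1.55 (log2 scale)."""
    return slope * n_unique + intercept


# Hypothetical 15-mer and hypothetical set of virus-unique motifs.
peptide = "MKRAAAAAAKWDEFG"
unique_pentas = {"MKRAA", "AAKWD", "KWDEF"}

n = count_unique(peptide, unique_pentas)
print(len(embedded_pentas(peptide)), n, round(predicted_signal(n), 2))
```

Under this line, a 15-mer with no unique penta-peptide is expected at about 1.55, and one in which all 11 windows are unique at about 3.42.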

Discussion
The results of the human proteome scan can be summarized as follows: i) if 5-mer peptides are considered, up to 90.75% of the viral proteome is similar to the human proteome, and therefore seen as 'self', but the percentage of 'self' drops to 30.5% with 6-mer peptides, and to 3.61% with 7-mer peptides [26]; and ii) with an average of 16.6% of unique pentapeptides, stAg is significantly less recognized as 'self' than the other viral proteins, while VP2 proteins, with only 6.2% of unique pentapeptides, showed the highest degree of homology with the host.
The functionalities of stAg have been reviewed previously [30]. The evidence collected for stAg in this paper showed some specific features for this viral protein, suggesting that the protein has not evolved towards a higher percentage of 'host self' (Figure 1b, 16.6% of unique pentapeptides), thereby maintaining an elevated level of immune presentation and antibody generation against linear epitopes (Figure 2). This might be advantageous for diagnostic purposes, but it provides no information on the pathological consequences. The opposite is observed for VP2, which seems to have evolved towards an 'as high as possible' 'self' (host) content, thereby reducing the immune response. This is unexpected, because VP2, as a minor part of the viral structure, which is composed mainly of VP1, must be one of the first proteins recognized by the immune system upon infection or exposure. A potential explanation might be that VP2 is crucial in structure and function, and therefore has to evolve towards a protein that is not, or only poorly, immunodominant (a pressure that is absent or less evident for stAg). Note that these stAg and VP2 considerations were based on median values obtained on peptide microarrays.
In a previous study [26], pentamer domains were suggested to be desirable motifs for eventual vaccine development. Our results, however, suggest that a significant amount of antibodies has already been built against these motifs in healthy volunteers, and thus targeting unique pentamer motifs seems a redundant approach. Figure 3 also shows that a large fraction of peptides without unique pentapeptides nevertheless showed high median signal intensity. This can be explained either by the fact that a motif does not need to be unique to be an epitope, or by reactivity against embedded linear epitopes that are 6-mers, 7-mers, or longer. An antibody response against a non-unique domain would be seen as an auto-immune response. The recent development of antigen microarray chip technology for detecting global patterns of antibody reactivities makes it possible to study the natural autoimmune repertoires within healthy humans, the so-called 'immunological homunculus (immunculus)' [31]. The immunculus is considered as the general network of constitutively expressed natural auto-antibodies against extracellular, membrane, cytoplasmic, and nuclear self-antigens (ubiquitous and organ-specific). The repertoires of natural auto-antibodies are surprisingly constant in healthy persons, independent of gender and age, and characterized by only minimal individual peculiarities [31,32]. Our approach, however, does not allow us to conclude whether the signals were directed against larger epitopes or against 'self' domains that are part of the 'immunculus'. Therefore, we cannot exclude that auto-antibodies against peptide motifs encoded in the human proteome are responsible for the cross-reactivity (immunological homunculus), or that some of the microarray signals could be explained by non-specific binding (see below).
Previously, it was shown that the immune reactivity of human sera directed against native VP1 is far stronger than that against the denatured form of VP1 [33,34]. The fact that peptide microarrays sometimes gave high signals (Figures 2 and 3) is therefore at variance with the observations made in the literature. The biological meaning of the presence of antibodies against linear HPyV epitopes is unclear. One hypothesis might be that, besides the viral particle that presents conformational epitopes to the immune system, there is considerable presentation of degraded viral protein in the form of small peptides, and that, when such a peptide is a unique motif (i.e. a unique pentapeptide), the immune system builds a detectable immune response. It is of particular importance to note here that we could illustrate the presence of antibodies against linear epitopes (mainly against the virus-unique pentapeptide fraction) not only of VP1, but also of LTAg, stAg, and VP2, proteins that are only present as a consequence of a replication cycle (and not merely exposure).
Obviously, the large number of peptides on the microarray makes it impractical, and technologically almost impossible, to evaluate and/or confirm all of them in ELISA. Some confirmatory examples will be published elsewhere. In our opinion, the only way to validate all these possible epitope regions in the future is by careful selection of significantly contributing peptides, and by testing them on validated peptide microarray platforms. Despite the research progress that has been made by using peptide microarrays [27,35-37], there is still hesitation to use these arrays beyond the initial screening, because of possible non-specific reactivities, the lack of a reliable relation between signal intensities and antibody affinities, the lack of reproducibility in array production, and intra- and inter-assay variability. In order to evaluate larger panels of donors, patients, and certain risk groups against a large panel of HPyV peptides, array optimization will be required. Despite this, several other groups have used peptide microarrays to miniaturize the antigen-antibody interaction while simultaneously studying several peptide sequences, e.g. in the field of GB virus C, Herpes simplex, and human coronaviruses. They concluded that antigenic peptides could be considered useful tools for designing new diagnostic systems, often with sensitivities in the range of low-picomolar concentrations of mAbs and with a high specificity [38,39]. While evaluating our results, we were well aware of the shortcomings of the initial experiments. However, because the presentation of our results was population-based, and mainly derived from the median values, the observed tendencies were considered reliable, and will be used for future work and confirmations.

Conclusion
In this study, in essence 2 different topics were evaluated, namely the similarity between the polyomavirus proteome and the human proteome, and the reactivity of a HPyV peptide microarray incubated with human plasma samples obtained from healthy subjects. A correlation between the presence of unique pentapeptide motifs embedded in 15-mer peptides and the signal obtained on the microarray was presented. Under the assumption that a linear epitope could be as small as a pentapeptide, on average 9.3% of the polyomavirus proteome is unique and could be recognized by the host as non-self, and it is specifically against these 9.3% of unique motifs that the immune response has been seen.

Methods and materials
Healthy subject (HS) samples
A total of 49 healthy subjects were included in this study. For this study, the protocol and the informed consent
This resulted in an array of 4,284 15-mer peptides, overlapping by 11 residues. Each peptide was displayed in triplicate on one single array chip (3 sub-arrays). The peptide microarray was incubated with a primary antibody or subject serum, followed by incubation with a fluorescently labeled secondary antibody. Read-out was done by scanning the array by means of a fluorescent microscope. Several control incubations (no primary antibody) and control spots (human IgG) were included. The full procedure of the assay was as described by the microarray provider (JPT, Berlin, Germany). The triplicate quantitative values for each peptide were averaged, and one single value used for further analysis. All imaging and data manipulation was performed as described by JPT Innovative Peptide Solutions (Berlin, Germany). The data presented in this manuscript are log2 (test peptide/control) values, derived from the original fluorescent values. Mapping (annotations) of the peptides was done against reference NCBI database sequences: JCPyV MAD1 (AAA82102, AAA82101, AAA82099, and AAA82103 for LTAg, VP1, VP2, and stAg, respectively), BKPyV Dunlop (CAA24300, CAA24299, CAA24297, and CAA24301), SV40 (YP_003708382, YP_003708381, YP_003708379, and YP_003708383), KIPyV CU-258 (ACB12028, ACB12026, ACB12024, and ACB12027), WUPyV CU-302 (ACB12038, ACB12036, ACB12034, and ACB12037), and MCPyV HF (AEM01097, AEM01098, AEM01099, AEM01096).
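The peptide tiling and the per-peptide read-out can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the provider's procedure: the function names are hypothetical, the handling of residues left over at the C-terminus is not specified in the text (here they are simply not covered), and taking the log2 ratio after averaging the triplicates is an assumed order of operations.

```python
import math


def tile(sequence, length=15, overlap=11):
    """Cut a protein into 15-mers overlapping by 11 residues (start shifts by 4)."""
    step = length - overlap
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


def log2_ratio(triplicate, control):
    """Average the triplicate spots, then express as log2(test peptide/control)."""
    mean_signal = sum(triplicate) / len(triplicate)
    return math.log2(mean_signal / control)


# Hypothetical 23-residue sequence and hypothetical fluorescence values.
seq = "ABCDEFGHIJKLMNOPQRSTUVW"
tiles = tile(seq)
value = log2_ratio([400.0, 380.0, 420.0], control=100.0)

print(tiles, value)
```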

Bio-informatic analysis
Each of the 4,284 15-mer sequences on the polyoma peptide JPT array was scanned for hits against the UniProt human complete proteome [http://www.uniprot.org/uniprot/?query=organism%3a9606+AND+keyword%3a%22Complete+proteome+%5bKW-0181%5d%22+reviewed%3ayes&force=yes&format=fasta and motivated by http://www.uniprot.org/faq/48] using R [40] and BioConductor [41,42]. Every 15-mer was scanned against every human protein and only exact matches were taken into account to compute the number of hits. The peptide array data were annotated with these hits and the joint information was used for all subsequent analyses. For all analyses involving microarray intensities, the values were expressed as log2 (sample/control). Despite the transformation, the data still displayed a slight skew (to the right). In order to take this into account, methods that are robust against such skew as well as against outlying values were used throughout the analysis. Descriptive statistics were computed using base R [40]. Comparisons of medians made use of linear rank methods as made available in [43]. The assessment of the relationship between presence in the human proteome and signal intensity on the microarrays made use of robust linear models with MM-type estimators as implemented in Rousseeuw et al., 2011 [44].
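To illustrate why a robust fit is preferred for skewed data with outlying values, the Python sketch below fits a line with the Theil-Sen estimator (median of pairwise slopes). This is a deliberately simpler stand-in and not the MM-type estimator used in the study; the function name and the toy data, which include one injected outlier, are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit: median of all pairwise slopes, then median residual intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Toy data on the line y = 0.17x + 1.55, with the last point replaced by an outlier.
x = [0, 1, 2, 3, 4]
y = [1.55, 1.72, 1.89, 2.06, 9.0]

slope, intercept = theil_sen(x, y)
print(round(slope, 2), round(intercept, 2))
```

An ordinary least-squares fit on the same toy data would be pulled strongly towards the outlier, whereas the median-based estimate stays on the underlying line.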