Evidence of recombination within human alpha-papillomavirus

Background Human papillomavirus (HPV) has a causal role in cervical cancer with almost half a million new cases occurring each year. Presence of the carcinogenic HPV is necessary for the development of the invasive carcinoma of the genital tract. Therefore, persistent infection with carcinogenic HPV causes virtually all cervical cancers. Some aspects of the molecular evolution of this virus, as the putative importance of recombination in its evolutionary history, are an opened current question. In addition, recombination could also be a significant issue nowadays since the frequency of co-infection with more than one HPV type is not a rare event and, thus, new recombinant types could be currently being generated. Results We have used human alpha-PV sequences from the public database at Los Alamos National Laboratory to report evidence that recombination may exist in this virus. A model-based population genetic approach was used to infer the recombination signal from the HPV DNA sequences grouped attending to phylogenetic and epidemiological information, as well as to clinical manifestations. Our results agree with recently published ones that use a different methodology to detect recombination associated to the gene L2. In addition, we have detected significant recombination signal in the genes E6, E7, L2 and L1 at different groups, and importantly within the high-risk type HPV16. The method used has recently been shown to be one of the most powerful and reliable procedures to detect the recombination signal. Conclusion We provide new support to the recent evidence of recombination in HPV. Additionally, we performed the recombination estimation assuming the best-fit model of nucleotide substitution and rate variation among sites, of the HPV DNA sequence sets. We found that the gene with recombination in most of the groups is L2 but the highest values were detected in L1 and E6. Gene E7 was recombinant only within the HPV16 type. The topic deserves further study because recombination is an important evolutionary mechanism that could have high impact both in pharmacogenomics (i.e. on the influence of genetic variation on the response to drugs) and for vaccine development.

second most common in women after breast cancer [2]. Almost half a million new cases of cervical cancer occur each year among women world-wide causing 274.000 deaths, 85% of them happening in underdevelopment countries [3].
The HPV genome has three different regions: two coding (E -early and L -late expression) and a regulatory noncoding region. The early region codifies regulatory, transforming and replication proteins, among which E6 and E7 are known to act like oncoproteins in high risk virus types [4,5]. The late region (L) contains two coding genes, L1 and L2, which encode viral capsid proteins.
Among more than 100 types of HPV known today, approximately 30 infect the genital tract. Within these, HPV16 and HPV18 are the two types with the highest oncogenic power. A prospective way of fighting cervical cancer is with an anti-HPV vaccine. Phase III vaccine trials are being developed by Merck, GlaxoSmithKline and the National Cancer Institute [5,6]. Besides, research is ongoing on different aspects of HPV biology, such as the mechanisms of down-regulation through which HPV causes cell transformation, the evaluation of biomarkers for risk progression, the role of environmental co-factors and the determinants of immune response to the viral infection [1]. However, to have a better understanding of the HPV, it is key to gain an improved insight from an evolutionary perspective [7,8]. The possibility for HPV recombination was first suggested by several facts: The biological viability of artificial HPV strains with chimerical proteins [9]. The appearance of some HPV16 variants that seemed mosaics between different established types [10]. The isolation of a novel HPV type (HPV77) with an unusual pattern of sequence similarity over the E6, E7 and L1 regions [11,12]. The plurality of HPV types [13] and also by the frequent observed co-infection (M. Angulo, personal observation). The latter will be specially important in AIDS patients which are often co-infected with very diverse mixtures of human papillomavirus (HPV) [14][15][16][17].
Because of HPV extreme diversity, the occurrence of recombination was initially discarded, in part because of the technical difficulties for aligning extremely diverse sequences, and in part because of the less accurate methods available for the researchers until the past decade. Nevertheless, recently reported phylogenetic incongruence at the putative high-risk ancestor node, showing that one or more presumed old recombination events should explain a non monophyletic evolution of oncogenic HPVs [18,19], has provided new convincing support of recombination in α-PVs. In addition, a recent rigorous analysis, using several recombination estimation methods, has provided fresh evidence of ancient recombination in papillomavirus, especially for the L2 gene [20].
However, the methods used to assess the presence of recombination signals were either phylogenetic based or substitution based. No model-based method was used. Hence, the difficulty of aligning all currently sequenced PVs imposes an additional challenge. Here we addressed the existence of recombination in HPV using a very efficient composite likelihood method [21,22]. The advantages of this kind of method, which is a model-based method, over the model-free ones, are well-known [23] including the fact that with model-free ones the true level of recombination that has occurred is greatly underestimated.
We have centered on human alpha PV sequences, which alignment is much more reliable. Our goal was to get recombination estimates of different genes (E6, E7, L1 and L2) in different groups. We defined three groups (GI, GII and GIII) attending to their phylogenetic relationships but also to their epidemiology and clinical outcome. The identification of the specific recombinant sequences or the recombination break points is a more complex problem and requires different algorithms and software, being out of the scope of the present paper. Table 1 shows the best-fit models of nucleotide substitution selected for each data set. Different models were selected for the different genes and for the different groups. The simplest models were found within the HPV16 group and the most complex ones (GTR) for the gene L2 in GI and GII groups. L2 had also the most complex model within the HPV16 and the GIII groups (Table  1).

Evolutionary model and rate variation
In addition, rate variation among sites has been detected in several of the data sets, though only in few ones the shape value of the gamma distribution was below one indicating an important rate heterogeneity [24]. The significant rate variation (below one) was mainly detected for GII and GIII groups (Table 1).

Population recombination
Using a gene conversion model with a Jukes-Cantor nucleotide evolution model [25] and assuming no rate variation we found significant recombination in all genes and in all groups ( Figure 1). The highest value of recombination was found associated with E6 and the highest number of groups with recombination was linked to L2. Recombination was detected in the genes L1, L2 and E6 only for the GI group (high risk). For the gene E7 recombination was detected only within HPV16 type within  which, however, no other recombinant gene were detected.
When using the best-fit models of nucleotide substitution and considering the estimated rate variation the obtained results were qualitatively similar but with a bit higher estimates and a new signal for L1 in the GII group (cf. Figures  1 and 2).
Given the estimated recombination values and the number of sequences at each group, the expected number of recombination events associated with each data set can be computed [26]. For example, for HPV-16 with a recombination value of 13 for the gene E7 ( Figure 2) and n = 8 sequences, the expected total number of recombination events in the history of this sample is 34. The expected numbers of recombination events for the different data sets with detected significant recombination are shown in Table 2.

Simulation experiment
A set of simulations was performed using the parameter values corresponding to the evolutionary model, base composition and rate variation among sites for the E6 gene in the group GI (Table 1) to obtain some control samples of DNA sequences. This gene and group were selected because this combination had the highest estimated significant value (Figures 1 and 2 and Table 2). We set the recombination value to zero so that the simulated set of sequences was obtained without recombination. We then estimated from these sequences the population recombination value using the composite likelihood. The average recombination value for the simulated set of sequences was 0.08 ± 0.03 and the percentage of false positive recombination tests was 1%. The expected number of recombination events for this average number is 0.26.

Discussion
The estimation of the best-fit model of nucleotide substitution is relevant in phylogenetics [27]. However, modelbased approaches for estimating recombination do not rely on a specific phylogeny and in consequence, they are expected to be robust to model misspecification. This seems to be the case when estimating recombination using the composite likelihood [21,22]. Nevertheless, we have provided the best-fit models of nucleotide substitution under the Akaike information criteria (AIC) and have used them to estimate the population recombination rate.
As expected, the model complexity had not a major effect on the estimation.  The existence of the recombination signal in the DNA sequences of HPVs is important because the genes tested are commonly used to build HPV phylogenies [28] and it is known that recombination can mislead phylogenetic inferences [29]. Furthermore, the detected recombination is in agreement with recently reported phylogenetic incongruence at the putative high-risk ancestor node showing that one or more presumed old recombination events should explain a non monophyletic evolution of oncogenic HPVs [18,19]. The confirmation of ancient papillomavirus recombination has also been recently thoroughly argued by a statistical and phylogenetic recombination detection study [20].
In contrast to that previous study, we have used a modelbased method (the coalescent composite likelihood estimator) to infer recombination from HPV DNA sequences.
Model-based methods are known to be preferred over substitution and phylogenetic ones [23,[30][31][32]. Thus, the composite-likelihood estimator maximizes the chances of detecting recombination avoiding, however, the inference of recombination when it is absent (false positive detection).
Regarding the existence of model complexity and rate variation among sites in the HPV samples, it is known that the amount of divergence and rate variation in the data could affect recombination estimation in some cases [32]. However, rate variation i.e. variation in the rate of nucleotide substitution along the sites in the DNA sequence should have no effect on the recombination estimates obtained using the composite-likelihood estimator as has been previously shown [22]. However, the Pairwise program assumes a simple Jukes-Cantor model with two alle-  II  8  4  10  L2  I  14  9  29  L2  II  8  4  10  L2  III  12  3  9  E6  I  14  38  121  E6  II  8  5  13  E7  HPV16  8  13  34 HPV population recombination estimates under a gene con-version model assuming a Jukes-Cantor evolution model with 2 alleles les [21]. Therefore, to account for possible effects of model complexity and rate variation we also estimated the recombination rate using the extended model possibilities of Kpairwise [22] and confirmed that model complexity and rate variation had no qualitative effect onto recombination estimates.
Moreover, we have designed an experiment to check the possibility of false positive detection due to recombination artifacts because of the model complexity and the high diversity underlying the data. As shown above, there was not significant recombination estimation under the parameters considered.
In addition, we were able to estimate the expected number of recombination events in those cases with significant recombination detection. As expected, the higher numbers were found for group GI, which incorporates a major number of branches from PV phylogeny [19]. Importantly, not all recombination events are detectable [32], thus those expectations just provide an upper-bound below which the real number of detectable events should relay [30].
Although we have detected significant recombination signal at all genes and groups in one combination or another, perhaps the most important result is that recombination was detected at intra-type level in HPV16. This may indicate that recombination is occurring at a relative high frequency in current carcinogenic HPV types and variants. HPV recombination should not be exceptional nowadays since the frequency of co-infection with more than one HPV type is not a rare event, and new recombinant types could be currently being generated. Provided that the oncogenicity of specific HPV intra-type variants appear to vary geographically and also with the ethnic origin of the population studied [33], more research is necessary to assess whether such a variation could relate to different recombinant forms. Moreover, the majority of the vaccines under investigation, both the therapeutic and the prophylactic ones, are based on the use of these genes or their products, to obtain the prevention and the treatment of the infections of the more prevalent high risk (types 16 and 18) and low risk types (6 and 11) [6/and references there in, 34/and references there in]. From the present work it is clear that a better knowledge of HPV evolutionary relationships will be important concerning the optimal number of types to include in vaccines as well as the possibility of cross-reactive immunity among HPV types [2,5]. Also, to obtain consensus and ancestor HPV sequences, this could be used in vaccine design to minimize genetic differences between vaccine strains [35]. The existence of recombination should also be of interest to pharmacogenomic studies, i.e. to learn how genetic variation influence response to drugs.

Conclusion
We provide new support to the recent evidence of recombination in HPV. In addition, we perform an evolutionary characterization, estimating best-fit models of nucleotide substitution and rate variation among sites, of some important HPV DNA sequence sets. Using simulations, we have shown that the detected recombination signals should not be artifacts. Thus, we found that the gene with recombination in most of the groups is L2 but the highest values were detected in L1 and E6. Gene E7 was recombinant only within the HPV16 type. The topic deserves further study because recombination is an important evolutionary mechanism that could have high impact in both pharmacogenomics (i.e. the effect of genetic variation on response to drugs) and vaccine development.

Recombination estimation
To study HPV recombination we used the composite likelihood estimator [42] and its permutation test [21], which is one of the most powerful techniques to detect the recombination signal from DNA sequences [22]. This method is a model-based population genetic approach, which allows for both a linear recombination and a gene conversion model. The composite likelihood estimator is implemented by the program Pairwise from the package Ldhat, freely available at [43], and also by the extension Kpairwise [22] which allows for complex nucleotide mod-els and rate variation among sites and is freely available at [44]. We considered using a gene conversion model as more adequate since the PV genome is circular. Recombination was estimated as the population recombination rate, which is 4Nr where N is the effective population size and r the recombination rate per gene.
Given the recombination values estimated and the number of sequences at each group the expected number of recombination events E(R) associated to each data set can be computed by using formulae 5 in Hudson and Kaplan [26], , where n is the number of sequences in the sample.

Simulations
To check that HPV sequences do not generate recombination false positives under the composite likelihood estimator due to their particular combination of model complexity and rate variation, we simulated 100 DNA samples of 15 sequences each and longitude 500 bp using a coalescent model [45]. We used the software recoal1.7 from David Posada and available upon request from him [46]. Specifically, we set the simulations with the parame-ter values corresponding to the evolutionary model, base composition and rate variation for the gene and group which had the highest estimated significant value. We also set the recombination value to zero. We then used the obtained sequences to estimate the average recombination value and the percentage of false positive recombination tests.