Genetic variation of E6 and E7 genes of human papillomavirus type 16 from central China

Background Persistent high-risk human papillomavirus (HR-HPV) infection is an important factor in the development of cervical cancer, and human papillomavirus type 16 (HPV-16) is the most common HR-HPV type worldwide. The oncogenic potential of HPV-16 is closely related to viral sequence variation. Methods In order to clarify the variant characteristics of HPV-16 E6 and E7 genes in central China, E6 and E7 sequences of 205 HPV‐16 positive samples were amplified by polymerase chain reaction. PCR products of E6 and E7 genes were further sequenced and subjected to variation analysis, phylogenetic analysis, selective pressure analysis and B-cell epitope prediction. Results Twenty-six single nucleotide variants were observed in E6 sequence, including 21 non-synonymous and 5 synonymous variants. Twelve single nucleotide variants were identified in E7 sequence, including 6 non-synonymous and 6 synonymous variants. Four new variants were found. Furthermore, nucleotide variation A647G (N29S) in E7 was significantly related to the higher risk of HSIL and cervical cancer. Phylogenetic analysis showed that the E6 and E7 sequences were all distributed in A lineage. No positively selected site was found in HPV-16 E6 and E7 sequences. Non-conservative substitutions in E6, H31Y, D32N, D32E, I34M, L35V, E36Q, L45P, N65S and K75T, affected multiple B-cell epitopes. However, the variation of E7 gene had little impact on the corresponding B-cell epitopes (score < 0.85). Conclusion HPV-16 E6 and E7 sequences variation data may contribute to HR-HPV prevention and vaccine development in Jingzhou, central China.

initiation and malignant growth of HPV-positive cancer cells through the encoded E6 and E7 protein, which bind to p53 tumor suppressor protein and retinoblastoma gene product (pRb) respectively and result in degradation of the suppressor protein and the following cell proliferation and tumor formation [5].
Sequence diversity exists in each HPV type, with intra-type variation spectra varying from 1 to 10% and sub-spectra varying from 0.5 to 1% [6].There are 4 main lineages (A, B, C, D) and 16 sub-lineages of HPV-16, including A1-3 (European), A4 (Asian), B1-4 (African 1), C1-4 (African 2), D1 (North American), D2-3 (Asian-American) and D4 [7].HPV-16 intra-type variants are preferentially associated with specific histological types of cancer, and non-European HPV-16 variants have a higher oncogenicity due to their association with high-grade lesions of the cervix and invasive tumors [8,9].In one study, the association of HPV variant lineages with precancer and cancer is analyzed, showing that the presence of HPV-16 A4 variants is significatively associated with an increased risk of cervical cancer, compared to A1/A2; while women with non-A sub-lineages (B, C, D) have a higher risk of CIN 3 and cervical cancer [10].Furthermore, molecular and epidemiological data indicate that variants of the same HPV type are biologically distinct and may confer differential pathogenic risks.HPV-16 E6 variant E-G350 is closely related to HPV-16 persistent infection and cervical lesions transformation from LSIL to HSIL [11].High mutation rates of HPV-16 E6 L83V are associated with viral persistence and cervical intraepithelial neoplasia progression, conferring a higher risk for highgrade cervical intraepithelial neoplasia development [11].The HPV-16 E7 variant A647G (N29S) has been observed previously in some cervical cancer cases, and shows an increased oncogenicity in vitro, though at present cervical cancer risk association is conflicting across the various studies [12][13][14].
HPV prevalence and type distribution vary across the countries and the different regions of a country [15].HPV-16 is the most oncogenic HPV type, with relevance to above 50% of cervical cancer worldwide [16].The aims of this study were to identify gene variations and corresponding amino acid variations in HPV-16 E6 and E7 gene in Jingzhou, whose longitude and latitude are 111° 15′-114° 05′ E and 29° 26′-31° 37′ N respectively, west of Wuhan city in central China, construct phylogenetic trees of E6 and E7 sequences, predict the secondary structure and B-cell epitopes of the proteins, and perform selective pressure analysis.The results of this study may provide useful data for the prevention and treatment of HPV in Jingzhou area, central China.

Sample collection
In this study, cervical exfoliated cell samples were collected from the female patients underwent cervical screening of possible HR-HPV infection at gynecology department of Jingzhou Hospital Affiliated to Yangtze University from September 2019 to October 2021.Informed consents were obtained from all patients and the study was approved by the Ethics Committee of Jingzhou Hospital Affiliated to Yangtze University.

DNA extraction and typing
Based on the method of magnetic beads, DNA was extracted according to the instruction of the nucleic acid extracting kit (Guangzhou Magen Biotechnology Co., Ltd.).The DNA extraction products were measured by real-time quantitative PCR according to the instructions of HR-HPV typing kit (Shanghai ZJ Bio-Tech Co., Ltd.), covering 15 HR-HPV types (16,18,52,58,68,66,56,31,33,35,45,82,39,51,59).4μL extraction products, as PCR template, were used for HPV typing.Amplification parameters were initially 94 °C for 2 min, 93 °C for 10 s, and 62 °C for 31 s for 40 cycles.In each DNA amplification reaction, the total amount of product after each PCR cycle was measured with a fluorescent chemical at 62 °C.Both positive and negative controls were used for PCR amplification.Finally, the remained extracted DNA products were stored in − 80 °C for subsequent experiments.

Variation analysis
The sequencing results were examined for peak plots using Chromas software.Variation analysis of E6 and E7 variants was performed in comparison with the reference sequence of HPV-16 on NCBI (GenBank: K02718) through MEGA11 software.The variant sequences were further transformed into amino acid sequences to analyze the variation of the encoded proteins and predict their secondary structures using GOR4.

Selective pressure analysis
The codeML program in pamlX software based on Maximum likelihood method (http:// abacus.gene.ucl.ac.uk/ softw are/ paml.html) was used to infer dn/ds and positively selected site.Bayesian empirical Bayesian method was also used to calculate the posterior probability of positively selected site.Seven codon-substitution models (M0, M1, M2, M3, M5, M7, M8) were used to determine whether positive selection impacts the evolution of E6-E7.These models treat codons as basic units of evolutionary change and take into account genealogical history when calculating parameters.The log-likelihood score evaluates the quality of fit of the input data to the model conditions.These seven codon models use different assumptions to estimate different sets of parameters.In these models, dn/ds, estimated as separate cryptographic subclasses, are assumed to evolve independently of each other [18].To assess the effect of the positive selection on a specific coding region, a likelihood ratio test (LRT) was performed to compare the nested models [19].

B-cell epitope prediction
B-cell epitopes of HPV-16 E6 and E7 reference and variant sequences were predicted using ABCpred Server (http:// crdd.osdd.net/ ragha va/ abcpr ed/), an artificial neural network-based B-cell epitope prediction server, based on default parameters.The higher the prediction score, the better the affinity of the epitope.

HPV-16 E6 and E7 variation
Two hundred and five HPV-16 single positive samples were amplified and sequenced, of which E6 and E7 sequences were successfully obtained for 176 samples (median age 43.5 years; range 18-68 years), and the other 29 samples were excluded due to PCR or sequencing failure.In 176 samples,110 women were histological normal (62.5%), 9 were LSIL (5.1%), 42 were HSIL (23.9%) and 15 were cervical cancer (8.5%).Among the 176 E6 and E7 sequences, 168 sequences had nucleotide variants, and the remaining 8 sequences were completely homologous to the reference sequence.In this study, the E6 and E7 variant sequences were further divided into 45 different variant groups, noted as 16HB01-16HB45, and the same sequences represented a specific variant group.These sequence data were submitted to GenBank with accession numbers OQ659416-OQ659460.16HB01 (58/168, 34.5%) and 16HB02 (24/168, 14.3%) were the sequences with the highest variation frequency.Thirty-eight single nucleotide variants were identified in the E6 and E7 sequence, including 27 nonsynonymous variants and 11 synonymous variants.The sequence variability of E6 gene was higher than that of E7 gene.The most common nucleotide variation in E6 was T178G (D32E) (102/168, 60.7%), while the most common nucleotide variations in E7 were A647G (N29S) (101/168, 60.1%) and T846C (102/168, 60.7%).Distribution of major nucleotide variation in HPV16 E6 and E7 based on cervical disease status was shown in Table 1.We combined the groups of histological normal and LSIL into one category, meanwhile, combined HSIL and cervical cancer into another category.The statistical results suggested that A647G (N29S) was significantly related to HSIL and cervical cancer (OR = 1.99, 95% CI = 1.03 to 3.87, P = 0.04).T292C of E6 sequence and G597A, G753C and G781C of E7 sequence were novel variants.Five amino acid variants, including two non-conservative substitutions, were found in the sequence affecting the alpha helix.Ten amino acid variants, all with nonconservative substitutions, were found in the sequence affecting the fold.The results were shown in Table 2.  Table 2 The variations of HPV16 E6 and E7 genes The nucleotides matching the reference (GenBank: K02718) are marked with a dash (-), AA, amino acid; S, strand; H, helix

Selective pressure analysis
No positively selected sites were found in E6 and E7 sequences, and the results were shown in Tables 3 and 4.

B-cell epitope prediction
B-cell epitope prediction was based on the corresponding amino acid reference sequences and variant sequences of HPV-16 E6 and E7.Only epitope prediction scores greater than 0.85 were listed, and the predicted results were shown in Table 5.For the E6 protein,

Discussion
Cervical cancer is the leading cause of cancer deaths among women in developing countries [20].Globally, the most common HR-HPV types in cervical cancer are HPV-16 (15.56-83.78%)and HPV-18 (3.4-41.1%)[21].HPV-16 is the major cause of more than half of cervical cancer worldwide and the main HPV type causing invasive cervical cancer.There are significant geographic and ethnic differences in HR-HPV infection types [22,23].In China, HPV-52 and HPV-58 are the most common types, followed by HPV-16, which differed from the HPV type spectrum in other countries [24][25][26].In Jingzhou, central  China, the three major HR-HPV types in women are HPV-52, HPV-58, and HPV-16, respectively [27].
The E6 and E7 proteins encoded by the E6 and E7 genes in HPV-16 are major oncogenic proteins and play important roles in viral replication, termination of cell differentiation and oncogenesis [28].Previous studies have shown that HPV variants may affect viral infectivity, pathogenicity and host immune response [29].Therefore, it is meaningful to analyze E6 and E7 gene variation and corresponding amino acid sequences.The sequencing results of E6 and E7 sequences indicated that the variation rate of E6 sequences (5.5%) was greater than that of E7 sequences (4.0%), implying E7 sequences were more conserved than E6 and more suitable as a target for therapeutic vaccine development.In this study, T178G (D32E) (102/168, 60.7%) of E6 and A647G (N29S) (101/168, 60.1%) and T846C (102/168, 60.7%) of E7 were the dominate variants, which were also found in other regions of the world.In Korea, the most prevalent variants are E6 T178G (68%) and E7 A647G (73%) in HPV-16 [30].In India, the most frequent variants are T350G (100%) in E6 and T789C (87.5%) in E7 [31].In Xinjiang, China, the most frequent variants in E6 are T350G (36/75, 48%) and T178G (19/75, 25.3%); the most frequent variant in E7 is A647G (18/ 75, 24%) [32].In Northeast China, the most common variants are T178G (32.69%) in E6 and A647G (34.62%),G666A (38.46%) and T846C (32.69%) in E7 [33].16HB01, which simultaneously had the variants T178G (D32E), A647G (N29S) and T846C, was the most prevalent (58/168, 34.5%) in the 45 variant groups.In Taizhou, China, T178G (D32E) in E6 and A647G (N29S) and T846C in E7 occur in 96.4% of the A4 (Asian) variants [34].In this study, twelve nonsynonymous variants in the sequences encoding E6 or E7 proteins were found, which may affect the folding of oncoproteins in the secondary structure and result in differences in their ability to interact with tumor suppressor proteins and the pathogenicity of HPV-16.T292C, G597A, G753C and G781 were new variants that had never been reported before.Specific mutations of HPV-16 are associated with an increased risk of high-grade squamous intraepithelial lesions and invasive cervical cancer development [35].In Swedish, prototype HPV-16 and its E6 variant L83V are both prevalent in preinvasive and invasive cervical lesions [36].In Shanghai, China, T7220G (D32E) variation in E6 and A7689G (N29S) in E7 increase the incidence of HSIL [37].In Kunming, China, the C749T (S63F) variation in E7 is associated with cervical cancer [9].In this study, A647G (N29S) variation in E7 was significantly related to the higher risk of HSIL and cervical cancer, though the sample size of cervical cancer is relatively limited due to the fact that the patients in this study were mainly outpatients underwent HR-HPV screening.
HPV-16 is divided into genetic sub-lineages A1-4, B1-4, C1-4 and D1-4.The distribution of the lineages varies geographically and ethnically [38].Globally, variants of HPV-16 sub-lineages A1-A3 are predominantly found in Europe, A4 sub-lineage in Asia, B and C lineages exclusively in Africa, and D lineage most common in South/Central America [39].Different lineages exhibit disparity in carcinogenic potential.In comparison to other lineages, A4 sub-lineage and D lineage often show more strong association with cervical cancer [39,40].The worldwide burden of cervical cancer in different HPV lineages is largely driven by the spread of the historical HPV-16 sub-lineages, and HPV-16 gene variants can significantly affect the risk of cervical cancer development [39,41].In this study, we obtained 176 complete E6 and E7 gene sequences from HPV-16 isolates.All HPV-16 variants belonged to the A lineage, with the A4 sub-lineage predominating (70.8% in E6 and 60.7% in E7).Our findings were consistent with the data from North China (64.70% of A4) and East China (62.50% of A4) [40,42].
The main feature of positive selection is that it incurs an abnormal increase in allele frequency, which allows the virus to adapt rapidly to environmental changes [43].In Greek population, the selection pressure analysis of amino acid residues of HPV-16 shows that codon 83 of E6 protein and codon 85 of E7 protein are positive selection sites [44].However, in this study, positively selected site of HPV-16 E6 and E7 sequences was never found in Jingzhou, central China, which was also in consistence with the results in southwest China [45].This may be one of the possible reasons why HPV-16 is not the most prevalent in China, though needed to further verify in the future.
The HPV vaccines currently available on the market are preventive vaccines, and the development of therapeutic vaccines for people who have been infected with HPV is of great application prospects.E6 and E7 proteins are persistently expressed in HPV infection cells, most cervical cancer and precancerous lesions, but not in normal tissues [46].Therefore, HPV E6 and E7 proteins are ideal targets for diagnostic detection and therapeutic vaccine design.In this study, the B-cell epitopes of HPV-16 E6 and E7 proteins were predicted using the ABCpred server, and the effects of amino acid changes caused by genetic variants on B-cell epitopes were analyzed based on changes in epitope fraction.The results predicted 10 B-cell epitopes in E6 protein, of which six were new epitopes never encoded by the reference sequence.In addition, non-conservative substitutions of some amino acids improved B-cell epitope prediction scores, such as H31Y, D32N, D32E, I34M, L35V, E36Q, L45P, N65S, and K75T.Therefore, non-conservative substitutions of amino acids should be fully considered when developing therapeutic vaccines.The reference sequence and variant sequence of E7 predicted the B-cell epitope at the same site (39-54 DGPAGQAEPDRAHYNI).The optimal epitope for therapeutic vaccines is often selected from the same region of the reference sequence and the variant sequence [47].Therefore, E7 protein may be more suitable as an ideal target for therapeutic vaccine design.These results are valuable for the development of HPV-16 therapeutic vaccines for the population in specific region, though these predicted epitopes still need to be further verified in vivo.

Conclusion
This study researched on the genetic variability of HPV-16 E6 and E7 genes in Jingzhou, central China.The sequencing results showed E7 sequences were more conservative than E6 sequences, meanwhile several novel variations (T292C, G597A, G753C and G781C) were detected.All HPV-16 variants belonged to the A lineage, with the A4 sub-lineage predominating.In addition, nonconservative substitutions, H31Y, D32N, D32E, I34M, L35V, E36Q, L45P, N65S and K75T in E6, affected multiple B-cell epitopes.This study provided useful data for the epidemiological characteristics, immunotherapy and vaccine development of HPV-16, in central China.
• fast, convenient online submission • thorough peer review by experienced researchers in your field • rapid publication on acceptance • support for research data, including large and complex data types • gold Open Access which fosters wider collaboration and increased citations maximum visibility for your research: over 100M website views per year

•
At BMC, research is always in progress.

Learn more biomedcentral.com/submissions
Ready to submit your research Ready to submit your research ?Choose BMC and benefit from: ? Choose BMC and benefit from:

Table 1
Distribution of major nucleotide variation in HPV16 E6 and E7 genes based on cervical disease status HN, histological normal; LSIL, low-grade squamous intraepithelial lesion; HSIL, high-grade squamous intraepithelial lesion; CC, cervical cancer

Table 4
Results of likelihood ratio tests of positive selection

Table 5
Predicted B-cell epitopes of the HPV-16 E6 gene