The genetic variability, phylogeny and functional significance of E6, E7 and LCR in human papillomavirus type 52 isolates in Sichuan, China

Background Variations in human papillomavirus (HPV) E6 and E7 have been shown to be closely related to the persistence of the virus and the occurrence and development of cervical cancer. Long control region (LCR) of HPV has been shown multiple functions on regulating viral transcription. In recent years, there have been reports on E6/E7/LCR of HPV-16 and HPV-58, but there are few studies on HPV-52, especially for LCR. In this study, we focused on gene polymorphism of the HPV-52 E6/E7/LCR sequences, assessed the effects of variations on the immune recognition of viral E6 and E7 antigens, predicted the effect of LCR variations on transcription factor binding sites and provided more basic date for further study of E6/E7/LCR in Chengdu, China. Methods LCR/E6/E7 of the HPV-52 were amplified and sequenced to do polymorphic and phylogenetic analysis. Sequences were aligned with the reference sequence by MEGA 7.0 to identify SNP. A neighbor-joining phylogenetic tree was constructed by MEGA 7.0, followed by the secondary structure prediction of the related proteins using PSIPRED 4.0. The selection pressure of E6 and E7 coding regions were estimated by Bayes empirical Bayes analysis of PAML 4.9. The HLA class-I and II binding peptides were predicted by the Immune Epitope Database server. The B cell epitopes were predicted by ABCpred server. Transcription factor binding sites in LCR were predicted by JASPAR database. Results 50 SNP sites (6 in E6, 10 in E7, 34 in LCR) were found. From the most variable to the least variable, the nucleotide variations were LCR > E7 > E6. Two deletions were found between the nucleotide sites 7387–7391 (TTATG) and 7698–7700 (CTT) in all samples. A deletion was found between the nucleotide sites 7287–7288 (TG) in 97.56% (40/41) of the samples. The combinations of all the SNP sites and deletions resulted in 12 unique sequences. As shown in the neighbor-joining phylogenetic tree, except for one belonging to sub-lineage C2, others sequences clustered into sub-lineage B2. No positive selection was observed in E6 and E7. 8 non-synonymous amino acid substitutions (including E3Q and K93R in the E6, and T37I, S52D, Y59D, H61Y, D64N and L99R in the E7) were potential affecting multiple putative epitopes for both CD4+ and CD8+ T-cells and B-cells. A7168G was the most variable site (100%) and the binding sites for transcription factor VAX1 in LCR. In addition, the prediction results showed that LCR had the high probability binding sites for transcription factors SOX9, FOS, RAX, HOXA5, VAX1 and SRY. Conclusion This study provides basic data for understanding the relation among E6/E7/LCR mutations, lineages and carcinogenesis. Furthermore, it provides an insight into the intrinsic geographical relatedness and biological differences of the HPV-52 variants, and contributes to further research on the HPV-52 therapeutic vaccine development. Supplementary Information The online version contains supplementary material available at 10.1186/s12985-021-01565-5.

southwest China, assessed the possible association of polymorphisms in the HPV-52 E6, E7 and LCR with the virus infection, propagation and replication, predicted the high affinity antigen epitope of HPV-52 E6 and E7, and analyzed the binding sites of the transcription factors in the LCR. Our results could provide basic data for further studies on the HPV-52 epidemiology, prevention, and therapeutic vaccine development.

Ethics statement
All participants were informed of the study aims, and a written informed consent was received from each patient before sample collection, and the patients' privacy have been fully protected. This study was approved by education and research committee and Ethics Committee of Sichuan University, China (approval number SCU20100196494), and was carried out in line with the Helsinki Declaration. Collection of clinical specimens 3432 cervical scrape cell samples were collected from outpatients who underwent routine cervical screenings at The Affiliate Reproductive Hospital of Sichuan Genitalia Hygiene Research Center, Chengdu SongZiNiao Sterility Hospital and Chengdu Medical College Affiliated Infertility Hospital from December 2018 to December 2019. The sample was stored at −20 °C in cell preservation solution and all methods were performed in accordance with the relevant guidelines and regulations. Genomic DNA extraction and HPV typing HPV DNA was extracted and genotyped using Advanced Fragment Analysis (AFA) based on capillary electrophoresis system with the commercial Human Papillomavirus Genotyping Kit for 25 types (HEALTH Gene technologies, Ningbo, China) according to the manufacturer's guidelines. This kit is able to classify 25 different HPV types (HPV 16,18,26,31,33,35,39,45,51,52,53,56,58,59,66,68,73,82,6,11,42,43,44 and 83). 46 samples that tested positive for single infection of HPV-52 were picked out for amplification, sequencing and following study.

DNA amplification and sequencing
The primer pairs were designed by Primer premier5.0 according to the HPV-52 reference sequence (GenBank: X74481.1) and the primer sequences were listed in Table 1. All the primers were synthesized by TSINGKE, China. The E6, E7 and LCR fragments were amplified in 50μl PCR reaction volumes containing 18μl of extracted DNA, 25μl Taq 2X PCR Master Mix with Dye (ABclonal), 1μl forward primer, 1μl reverse primer and 5μl Nuclease-free water.
The PCR amplification was performed under the following conditions: an initial 30s denaturation step at 95 °C, followed by 30 amplification cycles, with each cycle including a 30 s denaturation step at 95 °C, a 30 s annealing step at 52-55 °C (52 °C for E6, 55 °C for E7 and LCR), and a 35-65 s (35 s for E7, 45 s for E6 and 65 s for LCR) elongation step at 68 °C , and then 5 min final extension at 68 °C , ended up and held at 4 °C . The PCR products were examined under UV light after electrophoretic separation on a 2% agarose gel. The positive fragments were subjected to the bi-directional DNA sequencing (TSINGKE, China) for the further analysis.
To estimate the selection pressure acting on the HPV-52 E6 and E7 protein coding regions, the non-synonymous and synonymous nucleotide divergence were calculated by CODEML in PAML 4.9. Using the Bayes empirical Bayes (BEB) analysis, the sites with the posterior probability > 95% were identified as the positively selected sites [23].

T and B cell antigen epitope prediction
The Immune Epitope Database (IEDB) server (http://www.iedb.org/) was used to predict the major HLA-I and HLA-II binding peptides [28,29]. In our study, we used IEDB recommended methods to predict the epitopes against 34 HLA-I alleles by comprehensive analysis the frequency of Chinese in allele frequency net database(AFND) (http://www.allelefrequencies.net/hla6006a.asp) and 27 HLA-II allele [30].(The more details see additional file 1: Table S1 and S2). According to the IEDB recommended method, among HLA-I a low percentile rank (PR) showed a good binder, among HLA-II a low adjusted rank (AR) showed a good binder. In prediction of HLA-I and HLA-II restricted epitopes the PR≤1.0 and AR≤5.0 were selected for the further analysis, respectively [29,31].
The ABCpred server (https://webs.iiitd.edu.in/raghava/abcpred/index.html) was used to predict B cell antigen epitopes of HPV-52 E6/E7 reference and variant sequences according to the default parameters [32]. The higher predicted score represented a better affinity [29,31]. The server is able to predict epitopes with 65.93% accuracy using recurrent neural network.

Genomic polymorphisms of HPV-52 E6/E7/LCR
Among the 46 HPV-52 positive samples from the outpatients, only 41 entire E6/E7/LCR sequences were obtained and further analyzed. It is might be the possible explanation for not all sequences successfully amplified that the small copy number of HPV in some samples. Our data showed that the total length of the HPV-52 E6 Open Reading Frame (ORF) was 447 bp and that of E7 was 300 bp, which were consistent with the reference sequence. LCR was only 879 bp, with a 10 bp deletion compared with the reference sequence in 97.56% (40/41) samples. A total of 50 SNP sites and 3 deletions were identified across the E6, E7 and LCR. The combinations of all these SNP sites made for 12 unique E6/E7/LCR sequences (variant No: 1-12, Table 2). The 12 variant had the most variation compared with reference sequence.
For E6, 6 SNP sites were observed compared with the reference sequence and the combinations of all these SNP made for 5 unique E6 sequences. G350T and A378G were the most variable sites and were observed in 100% and 97.56% (40/41) of the samples, respectively. G108C and A379G, leading to the amino acids substitution of E3Q and K93R, respectively. A378C and A379G together led to the amino acid substitution of K93R. G350T, G356A and A530G were synonymous mutation. For E7, 10 SNP sites were observed compared with the reference sequence and the combinations of all these SNP made for 2 unique E7 sequences. A801G and C751T were the most variable sites and were observed in 100% and 97.56% (40/41) of the samples, respectively. C662T, AG706/707GA, T727G, C733T, G742A and T848G, leading to the amino acids substitution of T37I, S52D, Y59D, H61Y, D64N and L99R, respectively. T573A, C751T and A801G were synonymous mutation. No deletion or insertion mutation sites were found in either E6 or E7 sequences. The secondary structures also had not changed in either E6 or E7 protein.
For LCR, 34 SNP sites and 3 deletions were observed compared with the reference sequence and the combinations of all these mutations made for 9 unique LCR sequences. G7622A, T7624G, T9659C, G7712C and G7861A were observed in all samples. G7168C, C7207A, G7371, A7657C, A7865G and T13C were observed in 97.56% (40/41) of the samples. Two deletions were found between the nucleotide sites 7387 to 7391 (TTATG) and 7698 to 7700 (CTT) in all samples. A deletion was found between the nucleotide sites 7287 to 7288 (TG) in 97.56% (40/41) of the samples. Also, T7933C and A7938G were the common variable sites and were observed in 36.59% (15/41) and 60.98% (25/41) of the samples, respectively.

Phylogenetic analysis
The neighbor-joining phylogenetic trees were constructed by MEGA 7.0, using the 12 unique HPV-52 E6/E7/LCR variant sequences and 7 sub-lineages reference sequences. The phylogenetic tree (Fig.1) showed that all variants were clustered in sub-lineage B2, except variant NO.12 (sub-lineage C2). It was basically consistent with a study related to HPV-52 in Japan [34].

Selective pressure analysis
The selective pressure analysis results by the Bayes empirical Bayes (BEB) analysis of PAML 4.9 were indicated that: in the HPV-52 E6 and E7 protein coding sequences, no positive selection sites were observed.

B cell binding peptides prediction
In HPV-52 E6, a total of 13 B cell potential epitopes were predicted in both of the reference and variant sequences. The most potent epitopes were 129-144 MGRWTGRCSECWRPRP and 108-123 TPLCPEEKERHVNANK. Due to K93R, the score of epitopes 89-104 EERVKKPLSEITIRCI and 81-96 YSLYGKTLEERVKKPL changed from 0.87 to 0.90 and 0.81 to 0.84, respectively; due to E3Q, the score of epitope 3-18 EDPATRPRTLHELCEV changed from 0.68 to 0.74. The other prediction epitopes were consistent between reference and variant sequences, and no increased epitopes in the variant sequences (The more details see additional file 6: Table S11).

Prediction of the transcription factor binding sites
The online JASPAR database was used to investigate the potential binding sites for the transcription factors in HPV-52 LCR reference and variant sequences. The results showed that LCR region had high-affinity binding sites for transcription factors of SOX9, FOS, RAX, HOXA5, VAX1 and SRY. G7622A and T7624G, G7861A and C7917A/G76C lead to increase the binding sites for FOXC1, RAX/VAX1/HOXA5/PHOX2A and FOXL1, respectively. In addition, the nucleotide sites 21, 7168, 7414, 7580, 7865 and 7983 potentially affected the binding sites for FOXP3, CEBPB, SRY, CEBPB, VAX1 and VAX1/RAX/HOXA5, respectively.

Discussion
According to Marc Arbyn research, approximately 570 000 women developed cervical cancer and 311 000 women died in 2018. Worldwide, cervical cancer was the fourth most common cancer among women, after breast cancer, colorectal cancer, and lung cancer; and it was also the fourth leading cause of cancer death among women, after breast, lung and colorectal cancers. Approximately 84% of all cervical cancers and 88% of all deaths caused by cervical cancer occurred in lower-resource countries, of which 1.8% of women were diagnosed with and 1.3% died from the disease before age 75 years, in the absence of competing causes of death. By contrast, in the highest-resource countries, the cumulative rates of cervical cancer incidence and mortality were two to four times lower than those in lower-resource countries. And China was the country with the highest number of cases (106 000), whereas India was the country with the highest estimated number of cervical cancer deaths (60 000). China and India together contributed 35% to the global burden of cervical cancer cases and deaths [35]. Therefore, it is still particularly important to study cervical cancer in China.
The persistent infection with high-risk HPV types is the main cause in triggering the development of cervical cancer, such as the types 16, 18, 52 and 58 [36]. HPV 52 is one of the most relevant HPV types especially in Southeast Asia, where it causes up to 20% of all cervical cancer [37,38]. Through preliminary exploration in the early stage, we found that HPV-52 accounting for 22.05% of all the HPV-positive samples was the most common high-risk types, followed by HPV-58 and HPV-16 in Sichuan, China. In addition, other studies have shown that the distribution of HPV-52 has a certain regional distribution, mainly related to cervical cancer in Asian countries like China and South Korea [39,40]. Compared with the reference sequence, the HPV-52 variants were clustered in sub-lineage B2 and C2, no variants belonged to the lineage A and D in our study. In addition, the HPV-52 LCR variants from Sichuan, China have not been reported yet. Our study showed that the variation of the HPV-52 LCR was showed in a higher ratio than those of the E6 and E7, the nucleotide variations were LCR > E7 > E6 found in 3.82%, 3.33% and 1.34%, respectively. The most common non-synonymous substitution in the HPV-58 E6 was A278G (K93R), and E7 was only one sample with non-synonymous. The viral proteins E6 and E7 function as the main regulators of HPV-induced tumorigenesis, and changes in amino acids may influence the transforming activity of the E6 and E7 oncoproteins [41]. Identifying new variants in HPV-52 E6/E7 may inform the rational design of new vaccines specifically for women in southwest China.
The amino acid changes in E6 and E7 oncoproteins can influence the HLA binding peptides and have a significant immunological effect on the immune system's ability of recognition of these viral antigens. Unlike prophylactic HPV vaccines, which are used to generate neutralizing antibodies against viral particles, therapeutic HPV vaccines are used to stimulate cell-mediated immune responses to specifically target and kill infected cells. In the cell mediated immune responses, cytotoxic T lymphocytes (CTLs) were considered as the major eradicators of both HPV-infected cells and cervical cancer [42]. HPV oncoproteins E6 and E7 are responsible for the malignant progression of HPV-associated diseases and are consistently expressed in HPV-associated diseases and cancer lesions. E6 and E7 act as the promising specific tumor antigens and are available as the therapeutic targets [43]. Furthermore, therapeutic HPV vaccines targeting E6 and E7 can circumvent the problem of immune tolerance against selfantigens because these virus encoded oncogenic proteins are foreign proteins to human bodies [44]. In our study, we had identified the high affinity epitopes of E6/E7 oncoproteins for T cells and B cells, providing certain basic data for the development of therapeutic vaccines.
Mutations on LCR may influence the binding sites and the function of it. The HPV LCR which contains the binding sites for both viral and cellular factors, has shown regulatory functions on replication of HPV, transcriptional activity of the E6/E7 and the other interaction through the virus life cycle [25,38]. In our study, VAX1, CEBPB, FOXL1, PHOX2A and HOXA5 were the transcription factors that may be affected in HPV-52. One study had shown that VAX1 was closely related to bladder cancer recurrence [45]. CEBPB is a leucine-zipper transcription factor that regulates growth and differentiation of hematopoietic and epithelial cells. One study based on breast cancer found that CEBPB was a novel transcriptional regulator of CLDN4. The upregulation of CEBPB-CLDN4 signaling caused the migration and invasion of cancer cell [46]. FOXL1 is a member of the Forkhead box (FOX) superfamily and was reported to be dysregulated in various types of cancers [47]. PHOX2A was a transcription factor involving in cell proliferation and migration in lung cancer [48]. HOXA5 is a member of the homeobox (HOX) family and is upregulated in many types of tumors [48,49].
In addition, SOX10, a transcription factor of the sex determining region Y (SRY)-related high motility group (HMG)-box gene family, playing an important role in cancer progression, including tumorigenesis, changes in the tumor microenvironment, and metastasis [50,51]. Studies had shown that FOS was closely related to the pathogenesis of bone tumors in mice [52]. The study by Yang M et al. showed that in addition to stimulating PKR activity, RAX can positively regulate both SV40 large T antigen-dependent DNA replication and transcription in a mechanism that may alter the interaction of the cellular factor(s) with the SV40 enhancer via the dsRNA-binding domains of RAX [53]. This function of RAX may have implications for regulation of HPV replication and transcription because of the many similarities between the viral and cellular processes. SOX9, FOS, RAX and SRY were the high probability binding sites in HPV-52 LCR.