Prevalence and distribution of human papillomavirus (HPV) in Luoyang city of Henan province during 2015–2021 and the genetic variability of HPV16 and 52

Background Persistent infection with high-risk human papillomavirus (HPV) subtypes has been implicated as a cause of cervical cancer. Knowledge of the distribution and genotypes of HPV infection among females, and of their sequence variations, would assist in formulating preventive strategies for cervical cancer. The purpose of the present study was to investigate the prevalence of HPV among females in central China. Methods The distribution and genotypes of HPV among 9943 females attending gynecological examinations in central China during 2015–2021 were investigated. HPV genotypes were detected using a commercial kit. The nucleotide sequences of the L1, E6 and E7 genes in HPV16- or HPV52-positive samples collected in 2021 were amplified by polymerase chain reaction (PCR). Variations of L1, E6 and E7 in HPV16 and HPV52 were identified by sequencing and comparison with the reference sequences. Sublineages of HPV16 and HPV52 were determined by constructing phylogenetic trees based on the L1 gene. Results The overall prevalence of HPV infection was 22.81%; the infection rate was 19.02% for high-risk human papillomavirus (HR-HPV) and 6.40% for low-risk human papillomavirus (LR-HPV). The five most common genotypes were HPV16 (7.49%), HPV52 (3.04%), HPV58 (2.36%), HPV18 (1.65%) and HPV51 (1.61%). Plots of infection rate against age showed that single, multiple, HR-HPV and LR-HPV infections followed the same trend, with two peaks among females aged ≤ 20 years and 61–65 years. The predominant sublineage was A1 for HPV16 and B2 for HPV52. For HPV16, the most prevalent mutations were T266A (27/27) and N181T (7/27) in L1, D32E in E6 and S63F in E7. For HPV52, all nucleotide changes in the L1 (except L5S) and E7 genes were synonymous, and the K93R mutation was observed in most HPV52 E6 proteins.
Conclusions The present study provides basic information about the distribution, genotypes and variations of HPV among the female population in Henan province, which would assist in formulating preventive strategies and improving diagnostic probes and vaccines for HPV in this region.

women who died from cervical cancer live in the developing countries [2]. Human papillomaviruses (HPVs) are found in the cervical carcinoma tissues of most patients, and oncogenic HPVs are regarded as the major cause of cervical cancer. In China, an estimated 110,650 new cancer cases and 36,714 cancer deaths were attributable to HPV infection in 2015, of which cervical cancer accounted for 85.6% and 78.1%, respectively [3].
HPVs are small, non-enveloped, double-stranded DNA viruses that belong to the Papillomaviridae family [4]. The HPV genome is about 7.2-8.0 kb and contains eight open reading frames (ORFs), comprising the early (E1, E2, E4-E7) and late (L1 and L2) genes, together with a long control region (LCR) [5][6][7]. Continued expression of the E6 and E7 genes is associated with cellular immortalization, transformation and carcinogenesis [6]. The E6 and E7 proteins are therefore candidates for the development of therapeutic vaccines [8]. The L1 protein is the primary structural component of HPVs and can self-assemble into virus-like particles (VLPs) [9]. The first-generation commercial HPV vaccines are based on recombinantly expressed L1 protein [10,11]. Humans immunized with commercial HPV vaccines can acquire robust immunity against the homologous genotype [9]. Polymorphisms of the HPV L1 gene affect the generation of neutralizing antibodies with different binding affinities [12].
In the present study, the distribution and genotypes of HPV infection among females in Henan province during 2015-2021 were investigated using a commercial HPV test kit. Because the L1 gene plays an important role in the classification of HPV sublineages, the L1 genes of HPV16 and HPV52 were sequenced and used for phylogenetic analysis. E6 and E7 are the major oncogenes, and their variations are correlated with the progression of cervical lesions. The L1, E6 and E7 genes of HPV16 and HPV52 were therefore sequenced and compared with the reference HPV strains. Knowledge of the distribution and genotypes of HPVs would assist in formulating vaccination programs and preventive strategies against cervical cancer. Genetic variations of HPVs may be useful for analyzing cervical cancer risk and may provide crucial information for the development of diagnostic tools and vaccine design.

HPV sequencing
Samples collected in 2021 that were positive only for HPV16 or HPV52 were chosen and processed for variant analysis of the L1, E6 and E7 genes by sequencing. To amplify the full length of the L1, E6 and E7 genes, primers were designed based on the published HPV16 (GenBank NC_001526) and HPV52 (GenBank NC_001592) sequences. The primers used for amplification of the L1, E6 and E7 genes are shown in Table 1 and were synthesized by Sangon Biotech, Inc. (Shanghai, China). The PCR reaction volume was 50 μl, comprising 2 μl of template DNA, 25 μl of 2 × PrimeSTAR Max Premix (Takara Biotechnology Co., LTD, Dalian, China), 2 μl of each primer and 19 μl of ultrapure water. The PCR program was as follows: an initial denaturation step at 94°C for 10 min, followed by 30 cycles of 95°C for 30 s, 55°C for 30 s and 72°C for 30 s, and a final extension at 72°C for 10 min. The PCR products were visualized on 1% agarose gels stained with GoldView™ Nucleic Acid Stain. Verified plasmids containing the L1, E6 or E7 genes were used as positive controls, and a reaction mixture containing no template was used as the negative control. The target fragments were then purified using the TIANgel Midi Purification Kit (TIANGEN BIOTECH, China) and ligated into the p-EASY-Blunt cloning vector (TransGen Biotech, China) according to the manufacturer's instructions. The recombinant plasmids were then transformed into Trans1-T1 Phage Resistant Chemically Competent Cells (TransGen Biotech, China) according to the manufacturer's protocols. Positive clones containing the recombinant plasmids were sent to Sangon Biotech, Inc. (Shanghai, China) for sequencing.
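As a quick consistency check on the protocol above, the reaction mix and thermal program can be written down as data and summed. This is a minimal sketch, not part of the original workflow: the variable and function names are hypothetical, "each primer" is assumed to mean one forward and one reverse primer (which is what makes the volumes add up to 50 μl), and ramp times between temperatures are ignored.

```python
# Reaction components in microlitres, as listed in the protocol text.
# Assumption: "2 ul of each primer" means one forward and one reverse primer.
REACTION_UL = {
    "template DNA": 2.0,
    "2x PrimeSTAR Max Premix": 25.0,
    "forward primer": 2.0,
    "reverse primer": 2.0,
    "ultrapure water": 19.0,
}

# Thermal program as (step name, temperature in deg C, duration in seconds).
INITIAL_DENATURATION = ("initial denaturation", 94, 600)
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 55, 30),
    ("extension", 72, 30),
]
N_CYCLES = 30
FINAL_EXTENSION = ("final extension", 72, 600)


def total_volume_ul(components):
    """Sum the component volumes of one reaction."""
    return sum(components.values())


def program_minutes():
    """Total hold time of the program in minutes, excluding ramping."""
    cycling = N_CYCLES * sum(seconds for _, _, seconds in CYCLE_STEPS)
    holds = INITIAL_DENATURATION[2] + FINAL_EXTENSION[2]
    return (cycling + holds) / 60


if __name__ == "__main__":
    print(total_volume_ul(REACTION_UL))  # 50.0
    print(program_minutes())             # 65.0
```

Under these assumptions the listed components account for the full 50 μl, and the program holds for 65 min in total.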

Molecular characterization and phylogenetic analysis of HPV16 and HPV52
The variations of the L1, E6 and E7 genes and proteins were identified by comparison with, and numbered according to, the reference strains HPV16 (GenBank NC_001526) and HPV52 (GenBank NC_001592) using DNAStar (Madison, WI, USA). Variants between the studied and reference sequences were noted and their frequencies were calculated.
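The comparison step can be illustrated with a short script. The study used DNAStar; the sketch below is only a hypothetical stand-in that assumes the sample sequences are already aligned to the reference without gaps, and the toy sequences are invented for illustration rather than taken from the deposited data. The `offset` parameter (an assumed name) is the reference coordinate of the first base, so that variants are numbered against the reference genome.

```python
from collections import Counter


def call_variants(reference, sample, offset=1):
    """List substitutions (e.g. 'G3C') between two aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=offset)
        if ref != alt
    ]


def variant_frequencies(reference, samples, offset=1):
    """Map each variant to (count, number of samples, percentage)."""
    counts = Counter()
    for sample in samples:
        counts.update(call_variants(reference, sample, offset))
    n = len(samples)
    return {
        variant: (count, n, round(100 * count / n, 1))
        for variant, count in counts.items()
    }


if __name__ == "__main__":
    # Toy sequences for illustration only.
    reference = "ATGTCA"
    samples = ["ATGTCA", "ATCTCA", "ATCTCG"]
    for variant, (count, n, pct) in variant_frequencies(reference, samples).items():
        print(f"{variant}: {count}/{n} ({pct}%)")
```

Reporting each variant as count/total with a percentage mirrors how frequencies such as 14/15 (93.3%) are presented in the results.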

Prevalence of HPV infection in different age groups and years
To evaluate the relationship between HPV infection and age, the females were divided into fourteen age groups.
The HPV infection rates differed significantly among the age groups (χ² = 134.563, P < 0.01). Among the 2268 females with HPV infection, two peaks of infection were observed: the first in the ≤ 20 year-old group (31.48%, 17/54) and the second in the 61-65 year-old group (38.04%, 151/397). The HR-HPV, LR-HPV, single infection and multiple infection groups all showed the same trend across age groups as the "Any HPV type" infection (Fig. 2). In all groups, the HPV infection rates differed significantly among the years 2015 to 2021 (P < 0.01). The highest HPV infection rates were observed in 2015 (any HPV type, 34.98%; high-risk types, 30.26%; single infection, 27.11%), after which the rates declined gradually (Fig. 3).
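The rate and χ² calculations can be sketched as follows, assuming a Pearson chi-square test on a table of infected versus non-infected counts per age group. Only the two peak groups have their counts stated in the text, so the example uses those two alone; it illustrates the mechanics and does not reproduce the reported χ² = 134.563, which covers all fourteen groups. The function names are hypothetical.

```python
def infection_rate(positive, total):
    """Infection rate as a percentage."""
    return 100.0 * positive / total


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows, each row holding the observed counts
    (here: infected, not infected) for one group.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    # (infected, examined) for the two groups whose counts appear in the text.
    groups = {"<=20": (17, 54), "61-65": (151, 397)}
    for name, (positive, total) in groups.items():
        print(f"{name}: {infection_rate(positive, total):.2f}%")

    table = [(positive, total - positive) for positive, total in groups.values()]
    statistic, dof = chi_square(table)
    print(f"chi-square = {statistic:.3f}, dof = {dof}")
```

With all fourteen groups in the table, the same function would return thirteen degrees of freedom; the P value would then come from the chi-square distribution, for example via a statistics package.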

HPV16 and HPV52 L1 gene nucleotide variations and amino acid mutational analysis
Twenty-seven HPV16 L1 genes were sequenced successfully and twelve different sequences were submitted to GenBank (MZ546238-MZ546249). The twelve HPV16 L1 sequences shared 99.6-99.9% identity with the reference sequence (NC_001526). The variation sites and frequencies of the HPV16 L1 gene are shown in Table 3.
Fifteen HPV52 L1 genes were sequenced successfully, and five different nucleotide sequences were identified by comparison and submitted to GenBank (OL589507-OL589511). The five sequences shared 99.0-99.9% identity with the reference sequence (NC_001592). Compared with NC_001592, nineteen nucleotide changes were identified among the fifteen sequences (Table 4). All of the changes in the HPV52 L1 gene were synonymous except L5S (6.7%, 1/15). Seven synonymous mutations in the HPV52 L1 gene, including G6110A, T6701G, T6764C, A6794G and C6824T, were found in 93.3% (14/15) of samples. G6218A was detected in all fifteen sequences.
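Percent identity figures of this kind can be computed as the share of matching positions between an aligned sample and the reference. The following is a minimal sketch under the assumption of a gap-free alignment of equal length; how the original software handles gaps is not stated, and the sequences in the example are synthetic.

```python
def percent_identity(reference, sample):
    """Percentage of positions at which two aligned sequences agree."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(ref == alt for ref, alt in zip(reference, sample))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # Synthetic 1000-nt example with four substitutions.
    reference = "A" * 1000
    sample = "C" * 4 + "A" * 996
    print(f"{percent_identity(reference, sample):.1f}%")
```

On a hypothetical 1000-nt gene, four substitutions correspond to 99.6% identity, which gives a sense of how few changes separate the studied sequences from the reference.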

HPV16 and HPV52 E6-E7 gene nucleotide variations and amino acid mutational analysis
Eighteen HPV16 E6 and E7 genes were sequenced successfully. Thirteen different E6 sequences (MZ546266-MZ546278) and E7 sequences (MZ546295-MZ546307) were obtained; compared with the HPV16 reference sequence (NC_001526), the identity was 99.4-100% for the E6 gene and 98.7-100% for the E7 gene. Nine nucleotide mutations were observed in the HPV16 E6 genes, eight of which were non-synonymous (Table 5). The most frequent non-synonymous mutation in the HPV16 E6 gene was T7220G (A) (5/18), which resulted in the D32E substitution. Eight nucleotide changes occurred in the HPV16 E7 genes, four of which were non-synonymous. The most frequently observed non-synonymous mutation in the HPV16 E7 gene was C7791T (S63F) (Table 5).
Fifteen HPV52 E6 and E7 genes were sequenced and thirteen sequences were identical (Table 6). The most prevalent non-synonymous mutation in the E6 gene was A379G (14/15), which changed the amino acid from lysine to arginine (K93R). For the E7 sequences, the most frequent mutations were C751T and A801G, both of which were synonymous.
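Whether a nucleotide change is synonymous depends on whether the altered codon still encodes the same amino acid. The sketch below classifies a codon change using the standard genetic code. The codons in the example are illustrative choices that encode the residues named in the text (for instance, GAT for aspartate and AAA for lysine); they are assumptions, not codons read from the deposited sequences, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon, residue_number):
    """Return 'synonymous' or an amino acid change label such as 'K93R'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"{ref_aa}{residue_number}{alt_aa}"


if __name__ == "__main__":
    # Illustrative codons only (assumed, not taken from the sequences).
    print(classify_codon_change("AAA", "AGA", 93))  # K93R
    print(classify_codon_change("GAT", "GAG", 32))  # D32E
    print(classify_codon_change("GAT", "GAA", 32))  # D32E
    print(classify_codon_change("CTG", "CTA", 10))  # synonymous
```

The two aspartate examples show how a third-position T changed to either G or A can give the same D-to-E substitution, which is one possible reading of a notation such as T7220G (A).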

Discussion
Globally, cervical cancer is the fourth most common malignancy in females and accounts for 530 000 new cases per year [1]. Persistent infection with HR-HPV has been identified as a major risk factor for cervical cancer. The prevalence of high-risk HPV infections, such as HPV16, HPV18 and HPV52, increases with the severity of cervical lesions [28]. In the present study, a retrospective survey of HPV infection was conducted among 9943 females who attended the gynecological outpatient clinic of a local hospital from 2015 to 2021. Although it is a military hospital, it is open to non-military people.
The relationship between age and HPV infection rate was investigated. Two peaks of infection rate were observed for any HPV type, HR-HPV, LR-HPV, single and multiple HPV infection among females aged ≤ 20 and 61-65 years, a pattern that has also been observed in other reports [15,29,38]. The first peak occurred in the ≤ 20 years group (31.5%, 17/54), in which 9.3% (5/54) had LR-HPV and 27.8% (15/54) had HR-HPV infection. The high HPV infection rate in females aged ≤ 20 years was partly due to the limited sample size. In addition, high sexual activity and lack of immunity to HPV may contribute to the high infection rate [39]. The second peak was observed in the 61-65 year-old group (38.0%, 151/397), which consisted of 10.6% (42/397) LR-HPV infection and 33.5% (133/397) HR-HPV infection. Viral persistence or reactivation of latent HPV, due to the physiologic and immunologic deregulation caused by hormone fluctuations, may explain the high HPV infection rate among women around menopause [40]. The present study would assist in formulating preventive strategies for cervical cancer, and more examinations, including cytology and even colposcopy, should be carried out among women aged > 60 years for the prevention of cervical cancer.
The L1 protein is the major capsid protein and is able to induce an immune response [12]. Phylogenetic distance and amino acid variations of the L1 protein affect the immune efficiency of HPV vaccines [11,41]. Uncontrolled expression of the E6 and E7 proteins inactivates the p53 and pRb tumor suppressor proteins and is associated with persistent HPV infection [42]. HPV variants and nucleotide mutations have been suggested to affect the oncogenic potential of persistent HPV infection [22][23][24]43]. Thus, the L1, E6 and E7 sequences of the most predominant HPV types (HPV16 and HPV52) were selected to study lineage phylogeny and genetic polymorphisms.
Based on the L1 genes, the predominant HPV16 sublineage in Henan province in 2021 was A1. In other areas of China, such as Beijing city and Zhejiang and Yunnan provinces, sublineage A4 was the most common [44][45][46][47]. Sublineage A4 has been reported to be associated with more severe disease status and a higher risk of cancer than sublineages A1-3 in Chinese females [22,23,48,49]. Compared with the reference (NC_001526), four non-synonymous mutations were found in the HPV16 L1 protein: H76Q (1/27), N181T (7/27), E240D (1/27) and T266A (27/27). The amino acid mutations N181T (7/27) and T266A (27/27) have also been found in other regions, such as Shanghai and Sichuan province [47,50,51]. Synonymous mutations in the L1 gene, including G6196A (19/27, 70.4%), A5803C (18/27, 66.7%) and T5683C (8/27, 29.6%), have also been reported in Sichuan province [51]. In the present study, the most prevalent HPV52 sublineage in Henan province was B2. HPV52 sublineage B2 has been reported to predominate in Asia, whereas lineage A was the most common lineage in Africa, the Americas and Europe [52]. Compared with HPV52 lineage A, the B2 sublineage showed a higher risk [52].
For HPV16, the most frequent non-synonymous mutation found in the E6 gene was D32E (7/18). Although the D32E mutation in the E6 protein did not change the B-cell epitopes, the variation altered other gene profiles [53,54]. It has been suggested that the D32E amino acid mutation is significantly correlated with persistent HPV16 infection in females [47]. S63F was the most prevalent non-synonymous mutation in the E7 gene. The S63F mutation has been reported to be more frequent in women with carcinoma [44], presumably because the S63F variation influences the E7 epitopes and leads to viral persistence and cervical cancer [44]. Compared with the HPV52 reference sequence (NC_001592), K93R was the only non-synonymous mutation. The K93R mutation has also been observed in other HPV52 isolates in China [55,56]. Although the K93R mutation did not increase the cell immortalization ability of HPV52, higher colony formation and greater cell migration ability were observed compared with the HPV52 prototype [57]. The synonymous mutations C751T and A801G have been observed in another report, and their roles need to be studied further [56].

Conclusion
In summary, the present study provides basic information about the distribution, genotypes and variations of HPV among the female population in central China, which would assist in formulating preventive strategies and improving diagnostic probes and vaccines for HPV in this region.