Complete genome analysis of hepatitis B virus in Qinghai-Tibet plateau: the geographical distribution, genetic diversity, and co-existence of HBsAg and anti-HBs antibodies

Background The genetic variation and origin of Hepatitis B Virus (HBV) in Qinghai-Tibet Plateau were poorly studied. The coexistence of HBsAg and anti-HBs has been described as a puzzle and has never been reported in the indigenous population or in recombinant HBV sequences. This study aimed to report geographical distribution, genetic variability and seroepidemiology of HBV in southwest China. Methods During 2014–2017, 1263 HBsAg positive serum were identified and 183 complete genome sequences were obtained. Serum samples were collected from community-based populations by a multistage random sampling method. Polymerase chain reaction (PCR) was used to amplify the HBV complete genome sequences. Then recombination, genetic variability, and serological analysis were performed. Results (1) Of the 1263 HBsAg positive serum samples, there were significant differences between the distribution of seromarkers in Tibet and Qinghai. (2) Of 183 complete genome sequences, there were 130 HBV/CD1 (71.0%), 49 HBV/CD2 (26.8%) and four HBV/C2 isolates (2.2%). Serotype ayw2 (96.1%) was the main serological subtype. (3) Several nucleotide mutations were dramatically different in CD1 and CD2 sequences. Clinical prognosis-related genetic variations such as nucleotide mutation T1762/A1764 (27.93%), A2189C (12.85%), G1613A (8.94%), T1753C (8.38%), T53C (4.47%) T3098C (1.68%) and PreS deletion (2.23%) were detected in CD recombinants. (4) From the inner land of China to the northeast boundary of India, different geographical distributions between CD1 and CD2 were identified. (5) Twenty-seven (2.14%) HBsAg/HBsAb coexistence serum samples were identified. S protein amino acid mutation and PreS deletion were with significant differences between HBsAg/HBsAb coexistence group and control group. Conclusions HBV/CD may have a mixed China and South Asia origin. Based on genetic variations, the clinical prognosis of CD recombinant seems more temperate than genotype C strains in China. The HBsAg/HBsAb coexistence is a result of both PreS deletion and aa variation in S protein. Several unique mutations were frequently detected in HBV/CD isolates, which could potentially influence the clinical prognosis.


(Continued from previous page)
Conclusions: HBV/CD may have a mixed China and South Asia origin. Based on genetic variations, the clinical prognosis of CD recombinant seems more temperate than genotype C strains in China. The HBsAg/HBsAb coexistence is a result of both PreS deletion and aa variation in S protein. Several unique mutations were frequently detected in HBV/CD isolates, which could potentially influence the clinical prognosis.
Keywords: Hepatitis B virus, Mutation/mutation rate, Recombination, Hepatitis B surface antigen, Antibody to HBsAg Background Hepatitis B Virus (HBV) is considered to be a major global health problem with more than 250 million chronic HBV (CHB) carriers and more than one million HBVassociated human deaths per year [1]. In the nationwide investigation in 2006, Hepatitis B surface antigen (HBsAg) was identified in 7.2% of the whole population in China [2]. In Qinghai-Tibet Plateau, due to high altitude environment, religion issue, delayed vaccine inoculation or other unknown reasons, the HBV prevalence is over 10% according to our recent study (unpublished), which is much higher than in other areas of China. These carriers of HBV are at increased risk of developing liver cirrhosis (LC) and hepatocellular carcinoma (HCC) [3].
Qinghai-Tibet Plateau covers more than 2.5 million square kilometers, which is the second-largest plateau in the world and connecting south and northeast Asian [4]. In this area, a special recombinant of HBV genotype C and genotype D (HBV/CD) was reported [4]. However, the nature and origin of the recombinant are still poorly studied. At the same time, the coexistence of HBsAg and anti-HBs (HBsAb) in CHB patients has been occasionally reported [5,6]. Due to the low incidence of HBsAg/ HBsAb coexistence in HBV carriers, the sample sizes in many previous studies [5,7] were small (less than 20) and lack of adequate control subjects [8]. The mechanism underlying the emergence of anti-HBs in CHB patients remains unclear and related information has not been reported in the indigenous population or in recombinant HBV sequences.
To gain deeper insights into HBV genomic diversity of the special recombinants, 1263 HBsAg positive serum samples were obtained from eight areas of Qinghai-Tibet Plateau and 183 complete genome sequences of HBV isolates were under further analysis. Twenty-seven HBsAg/HBsAb coexistence serum samples were identified and studied with genome variation.

Methods
Procedures to detect point mutations and recombination of HBV by PCR and direct sequencing analysis Sera samples from Qinghai-Tibet plateau → HBV infection serological markers detection→ DNA purification → HBV complete genome amplification → PCR-DNA purification → Automatic sequencing → Sequences assembling → Comparing with reference sequences.

Subject sample collection
During 2014-2017, the subjects were collected from community-based populations from eight regions of Qinghai-Tibet Plateau based on the population density and covered most of the indigenous habitations, including Hainan, Lhasa, Shannan, Nyingchi, Ali, Nakqu, Chamdo and Rikaze. In this study, the multistage random sampling method was used to ensure the sample representativeness of the whole area. Firstly, two or three counties were selected at random from eight areas, respectively. Secondly, two villages were selected from every county. Thirdly, populations of 18-59 years old were selected from every village. Basic information was recorded on the questionnaire prepared beforehand, including name, gender, birth date, address, and medical information, and 5 mL of venous blood was taken from each participant. HBV infection markers including hepatitis B surface antigen (HbsAg), anti-hepatitis B surface antibody (HBsAb), anti-hepatitis B core antibody (HBcAb), hepatitis B e antigen (HBeAg) and anti-hepatitis B e antibody (HBeAb) were detected by chemiluminescent assays (AXSYM; Abbott Laboratories, North Chicago, IL, USA).
HBV DNA extraction, whole-genome amplification, and sequencing HBV DNA was extracted from 200 μL serum using QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). Full-length HBV DNA (about 3.2 kb) was amplified by Nested PCR finally performed in seven fragments. Based on the previously reported methods [9] used in our preliminary study [10], the primers and thermal profile were optimized in this study to adapt to HBV/CD isolates and identify more complete genomes. The basal core promoter (BCP) region was generated by two rounds. First round was conducted using the primer combination of BcpF1 and BcpR1 in a 25 μL reaction volume containing 5 μL extracted DNA and 12.5 μL premix Taq polymerase. The whole-genome (without BCP region) consisting of six fragments was also generated by two rounds of PCR. First round was conducted using the primer combination of HBV1799FLong and HBV1801RLong in a 50 μL reaction volume containing 15 μL extracted DNA and 25 μL premix Taq polymerase. All the primers and thermal profiles were listed in Table 1.
After purification of the PCR products with a QIA Gel Extraction Kit (Qiagen, Valencia, CA), the sequences were determined using the Sanger dideoxy terminator sequencing method with DNA sequencer ABI 3700 (PE Applied Biosystem). Then the sequences were assembled using SeqMan II software (DNAStar Inc.), and the correct nucleotide position of the complete HBV sequence was revised through alignment with the reference sequence in the Gen-Bank database.
The whole-genome nucleotide sequences reported in this article have been deposited in the National Center for Biotechnology Information GenBank database with accession numbers MN683570-MN683729, MN657315-MN657318, KX660674-KX660690.

Recombination and serological analysis
Genotypes and recombination in genome sequences of HBV were investigated using SimPlot v3.5.1 software and JPHMM (jumping profile Hidden Markov Model) method [11]. MEGA 7.0 software was used to calculate genetic distance and pairwise distance comparisons. The HBsAg serotypes were deduced from the sequences of the S gene region by the identification of amino acids at positions 122 (Lys-Arg for d-y determinants), 160 (Lys-Arg for w-r determinants), 127 (Pro-Thr-Leu/Ile for w2-w3-w4), and in the case of Arg122 Pro127 Lys160, also at positions 159 (Ala-not Ala for ayw1-ayw2 and ayw4) and 140 (not Ser-Ser for ayw2-ayw4) [12].

Mutation analysis
Generally, the mutation definition and analysis were performed as previously described [13]. The whole genome, including S gene, PreC/C gene, reverse transcriptase (RT), X gene and basal core promoter (BCP) were under Table 1 List of primers used to amplify the different regions of the HBV genome and their respective thermal profile *Numbers within primer names represent the primer positions. An F after the primer position stands for sense primers while R stands for anti-sense primers **SP6 and T7 are tag sequences attached at the 5′ end of PCR primers used in this study,except four Bcp primers. SP6 and T7 primers were used to sequence PCR fragments nucleotide mutation analysis. Base on the structure of HBV/CD recombinants, genotype C and genotype D reference sequences were used as parental sequences for mutation detection [14,15]. Nucleotide sequences of datasets in this study and reference HBV sequences were analyzed using the MEGA 7.0 software and Mutation Reporter Tool [16]. The nucleotide PreS/HBsAg sequences obtained were translated into amino acid sequences, aligned and compared with reference sequences. Amino acid variability was defined as the frequencies of residue substitutions at each position [17]. For analysis, HBsAg was divided into subregions corresponding to structural and/or functional domains: the N-terminal region (aa 1 to 99), the major hydrophilic region (MHR, aa 100 to 169) and the C-terminal region (aa 170 to 226). The "a" determinant in MHR (aa 124 to 147), the first loop of "a" determinant (aa 124 to 137) and second loop of "a" determinant (aa 139 to 147) were also analyzed respectively.

Statistical analysis
Statistical analyses were performed using SPSS 22.0 (IBM Corp., Armonk, NY, USA). Chi-squared test and two-tailed Student's t-test were used for analysis, as appropriate. A P-Value of ≤0.05 was considered as statistically significant.

Ethical approval
The study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Ethics Committee of the Chinese Center for Disease Control and Prevention. The purpose of the study and the right to information were explained to the participants by research staff. Written informed consent was obtained from each participant before the interview and venous blood collection.

Patient characteristics
During 2014-2017, 1263 HBsAg positive serum samples were identified from the community population of Qinghai-Tibet plateau. There were no differences in age (29.29 ± 17.93 vs 30.56 ± 18.98, P = 0.211) and gender (449/404 vs 201/210, P = 0.206) distribution between study population in Tibet and Qinghai. Of all the 1263 positive serum samples, twenty-seven serum samples (2.14%) were HBsAg/HBsAb coexistence. There were significant differences between the distribution of seromarkers in Tibet and Qinghai. Details of serological information are shown in Supplement Table 1.
One hundred and eighty-three samples were selected from the 1263 HBsAg-positive serum samples and then full-length sequences were amplified for analysis of the HBV genome. A total of 87 samples were HBeAg-negative, and 96 samples were HBeAg-positive. Most isolates had a genome size of 3215 nt, except eight sequences with evidence of deletion. Basic information of HBV/CD complete genome sequences was listed in Table 2.

Recombination and serological results
There were two types of HBV/CD recombinants detected in this study. HBV/CD1 was with around nt10-800 from genotype D integrated into genotype C to form the recombinant viral strain; HBV/CD2 was with a region around nt10-1500 from genotype D integrated into genotype C to form the recombinant viral strain. Graphical display of JPHMM and Simplot recombination analysis results were shown in Fig. 1. Of all the 183 HBV complete sequences, there were 130(71.0%) HBV/CD1 isolates, 49 (26.8%) HBV/CD2 isolates and four (2.2%) HBV/C2 isolates. As the results of evolutionary divergence over sequences showed, the genotype D fragment of HBV/CD is closest to HBV subgenotype D4, while the genotype C fragment of HBV/CD is closest to the subgenotype C2 (Supplement Table 2 and 3).
Geographical distribution of HBV subgenotypes in the plateau HBV/CD1 recombinant sequences were predominantly identified in all the geographic regions analyzed in this study. However, most of the HBV/CD2 isolates (93.9%, 46/49) were identified in two regions (Shannan & Rikaze), which was significantly more than the other six regions in the plateau (P = 0). Four genotype C isolates were all identified in the eastern part of the plateau. The distribution of HBV subgenotypes in the plateau is shown in Fig. 2 and Supplement Table 4.  Table 3 and Fig. 3.
Compare to reference sequences of genotype D and genotype C, several nucleotide (amino acid) positions changed in nearly all the HBV/CD1 and HBV/CD2 sequences, such as A942T(aaL613QH for HBV/CD1 and aaH613K for HBV/ CD2), T1485A and T3210A(aaS272TN) in P gene, T1485C(aaS38P) in X gene.

Amino acid substitution in PreS/S region
One hundred and seventy-nine HBV CD recombinants with complete genome sequences were under analyses of amino acid substitution in PreS/S region. Amino acid substitution of 27 HBsAg+/HBsAb+ strains (Group I) were compared with 152 HBsAg+/HbsAb-strains (Group II). The distribution of different recombination type (HBV/CD1 and HBV/CD2, P = 0.677) and HBeAg status (P = 0.213) had no significant difference between Group I and Group II. DNA level of the 179 serum samples were all above 5 log10 and had no significant difference between Group I and Group II (P = 0.355). Significant aa substitution diversity was observed within S gene of HBV between Group I and Group II (1.03 vs. 0.39, for substitution per 100 aa, the same below, P < 0.001). Moreover, the aa variabilities in MHR (P < 0.001), "a" determinant (P < 0.001), the first loop (P < 0.001) and the second loop (P < 0.001) of "a" determinant were all more variable in Group I than in Group II. The frequency of PreS deletion was 2.23% (4/179) and also with significant differences between the two groups. Details were listed in Table 4 and Fig. 4.

Discussion
HBV genotypes are related to the severity of liver disease and response to clinical therapy [18]. Compare to other genotypes, HBV genotype C and genotype D carry a higher lifetime risk of liver cirrhosis and hepatocellular carcinoma development [19]. It is believed that recombination can exert an influence on clinical important properties more dramatically than the steady accumulation of natural mutations, which suggests the potential pathogen significance of the HBV/CD recombinants [20]. As far as we know, there was no detailed molecular epidemiology or genetic variability study carried out based on a large number of HBV/CD recombinant complete genome sequences.
In this study, HBV/CD recombinant was the main genotype (179/183, 97.81%) isolated in the plateau. This result is different from recent but smaller sample size reports in Tibet that genotype C and genotype D were the most dominant HBV genotypes [21,22]. The HBV genotypes showed a distinct geographical distribution all over the world [19]. In both HBV CD1 and CD2 recombinant genomes, the 'C fragment' and 'D fragment' were genetically close to subgenotype C2 and D4, respectively. HBV/D4 isolates were mainly reported in South America and the Pacific islands [14]. The only report of HBV/D4 in Asia was in northeast India [23]. Specifically, subtype C2 was the main genotype in central and north China [24], so Qinghai-Tibet Plateau was found at the presumed geographical junction to the distribution of these two HBV subgenotypes (Fig. 2). In this study, recombinants CD1 and CD2 had significant different geographic distribution in the plateau. It could also partly explain the different distribution of seromarkers in Tibet and Qinghai. HBV/CD2 isolates were mainly identified in two regions (Shannan & Rikaze) which are close to the boundary of northeast India (Fig. 2). Genotype D fragment was in around nt10-800 of HBV/CD1 and around nt10-1500 of HBV/CD2 (Fig. 1). Then both CD1 and CD2 have the same initial recombination site at nt10. Based on the facts mentioned above, it can be inferred that these two types of recombinations might have a mixed China (subgenotype C2) and Indian (subgenotype D4) origin.
Though lots of HBV genotype D or genotype C isolates were reported in India [23,[25][26][27], no HBV/CD recombinant strain was reported in south Asia until now. Actually, nearly all the HBV/CD recombinants in this study and in previous studies scattered in indigenous of high altitude areas, such as Qinghai-Tibet Plateau, Yunnan-Kweichow Plateau [28], Loess Plateau [29] and Mongolia Plateau (Accession Number: AB270534, AB270534). A possible reason for this phenomenon might be due to the genetic background or migration history of highlanders.  As the diagnosis information in these rural areas is still not detailed enough to analyze the clinical feature of these HBV recombinants, the complete genome features in this study offer us a way to estimate the clinical prognosis of these HBV strains in the indigenous population. Serotype awy2 was the predominant serotype of HBV/ CD in this study, as same as the results of genotype D [30]. It indicates that the main serological character had not dramatically changed after the recombination. Previous studies reported that A1762T/G1764A double mutation in BCP (nt.1742-1849) was the strongest viral factor associated with the development of liver disease [20,31]. Other mutations, such as T53C, G1613A, C1653T, T1753C, A2189C, T3098C and PreS deletions were also reported associated with clinical progress [32][33][34][35][36][37][38]. In this study, the A1762T/G1764A double mutations were observed in 27.93% in HBV CD recombinant sequences. These frequencies were less than previous reports of genotype C in CHB carriers [39], which indicates a lower risk of HCC in the population. The same result was also found in frequencies of mutation A2189C (12.85%), G1613A (8.94%), T1753C (8.38%), C1653T (5.59%), T53C (4.47%), T3098C (1.68%) and PreS deletion (2.23%, 4/179), which were also lower than mutation frequencies of genotype C in CHB patients in Asia [32,[34][35][36][37][38][39][40]. This indicates the clinical progress of CD recombinant seems genetically more temperate than genotype C which had caused the most liver disease and related death in China.
There were several mutations frequently identified in PreC/C and PreS/S regions, which potentially influence the clinical prognosis. However, many of these prevalent nucleotide mutations or amino acid substitutions were unique in the CD recombinants, which were not previously reported in genotype C or other genotypes of HBV genome. The function and influence of these mutations need further analysis.
The prevalence of anti-HBs coexistence was 2.14% in the population of HBsAg-positive patients. The coexistence frequency is lower than previous reports [5][6][7]41], suggesting possible geographical variability or special recombinant characteristics in these statistical differences. HBeAg status and DNA level of the serum were the focus of the argument about comparability of the HBsAg/HBsAb coexistence studies [8]. In this study, no differences were identified in the distribution of DNA level, HBeAg positivity or recombination types between Group I and Group II. And the DNA level of serum samples in Group I was all above 5 log10 copies/mL. The aa variability of S region in Group I was  Fig. 3 Distribution of wild type and nucleotide mutations (amino acid substitutions) in HBV/CD1 and HBV/CD2 genome. Each bar represents the percentage of isolates with mutated nucleotide (amino acid residues) in CD1 and CD2 recombinants significantly higher than that in Group II (Table 3, P < 0.001), which was consistent with previous reports [41].
In further analysis of each segment in the S region, the aa variability in Group I was significantly increased in N-terminal, MHR and C-terminal, compared with Group II (P < 0.001, P < 0.001, P = 0.013, Fig. 3).
The "a" determinant in MHR is the hot zone of HBsAg/HBsAb coexistence studies, which is located in the hydrophilic region between aa124 and aa147 and act as the most important antigen-binding site in the S region of each HBV serotypes [17,41]. In this study, compared with Group II, the aa variability in the "a" determinant of Group I increased significantly (1.54% vs 0.06%, P < 0.001). This region is composed of two stemloop structures, which act as two binding sites of monoclonal antibodies and are of great significance for the effectiveness of hepatitis B vaccine and the clinical detection of antigens. In this study, the aa variation of the first and second loop of the "a" determinant were both statistically correlated with HBsAg/HBsAb coexistence, which consistent with previous studies [5,17]. There were different opinions about the effect of the second loop [42] or even the entire S protein variablity [7] on the HBsAg/HBsAb coexistence. However, former studies were about genotype B and genotype C [42], while HBV/CD recombinant was the main genotype in this study and the "a" determinant was located in genotype D fragment (Fig. 1). It may be due to the amino acid variation background in different genotypes. Moreover, due to the low incidence of HBsAg/ HBsAb coexistence in all HBV genotypes, the sample sizes in previous studies were less than 20 [5,7], and the insufficient number of samples may affect the stability of statistical results.  The association between aa variation of S protein and HBsAg/HBsAb coexistence has been reported in many studies [5,6,17,41,42]. However, the aa mutations in any part of S protein is not a sufficient and necessary condition for the explanation of HBsAg/HBsAb coexistence enigma [8], so there should be at least one auxiliary or secondary condition in the emergence of the coexistence. Previous studies suggested that mutations or deletions in the PreS region were the reason for the coexistence of HBsAg/ HBsAb [43]. The significant difference was also found in the distribution of deletions in PreS region between Group I and Group II (Table 3), suggesting the change of PreS region is also associated with coexistence. Interestingly, the frequency of PreS deletion in HBV/CD recombinants was 2.23%, which is lower than the frequency in other genotypes (4.9%) [7]. Combine with the fact that the frequency of HBsAg/ HBsAb coexistence is 2.14% in this study, which is also lower than other genotypes [5][6][7]41]. PreS deletion may act as a genetic variation background which occasionally affects the interaction between antigens and antibodies, together with the aa mutation in MHR to cause the coexistence.
This study has several limitations. First of all, multiple infections, minor populations of immune escape variants in viral quasispecies may not be identified in this study by PCR product sequencing. Secondly, the combined action of PreS deletion and aa variation in MHR need further support. Finally, the result would be more convincing if we could have a larger sample size of HBsAg/HBsAb coexistence, though 27 HBsAg/HBsAb coexistence in this study is more than most of the previous studies.

Conclusions
In summary, this study describes the geographical distribution, genetical variability and HBsAg/HBsAb coexistence phenomena of HBV isolates in Qinghai-Tibet plateau. HBV/CD recombinant has become the predominant genotype in Qinghai-Tibet Plateau. There were signs that HBV/CD had a mixed China and South Asia origin. PreS deletion and aa variation in S protein may cause the HBsAg/HBsAb coexistence together. Several unique nucleotide mutations were frequently detected in HBV/CD isolates, which could potentially influence the clinical prognosis.