Phylogenetic analysis of complete genome sequences of hepatitis B virus from an Afro-Colombian community: presence of HBV F3/A1 recombinant strain

Background Hepatitis B virus (HBV) infection is one of the most prevalent viral infections in humans and represents a serious public health problem. In Colombia, our group recently reported the presence of subgenotypes F3 and A2 and genotype G in Bogotá. The aim of this study was to characterize the HBV genotypes circulating in Quibdó, the largest Afro-descendant community in Colombia. Sixty HBsAg-positive samples were studied. A 1306 bp fragment (S/POL) was amplified by nested PCR. Samples positive for the S/POL fragment were then subjected to PCR amplification of the complete HBV genome.
Findings The distribution of HBV genotypes was: A1 (52.17%), E (39.13%), D3 (4.3%) and F3/A1 (4.3%). An HBV F3/A1 recombinant strain was found for the first time.
Conclusions This study is the first analysis of complete HBV genome sequences from the Afro-Colombian population. HBV/A1 and HBV/E were the predominant genotypes. A new HBV F3/A1 recombinant strain was identified in this population. The presence of these genotypes may be related to their introduction during the times of slavery.


Introduction
Hepatitis B virus (HBV) infection is a major global health problem: 2 billion people have been infected worldwide, and 350 million of them suffer from chronic HBV infection [1]. In Latin America, the estimated HBsAg seroprevalence ranges from 0.5% to 3.0%, with the total number of HBsAg carriers approaching 11 million [2]. The highest prevalence, surpassing 8.0%, is found among the native populations of the western Amazon basin, which includes parts of Brazil [3], Colombia [4], Peru [5] and Venezuela [6].
The HBV genome is a partially double-stranded circular DNA molecule of approximately 3,200 bp that encodes four overlapping open reading frames (ORFs) [7]. A genetic classification based on the comparison of complete HBV genomes has identified nine genotypes, A through I [8], that differ from each other by at least 8% at the nucleotide level. Genotype A was initially identified in southern Africa [9]. Phylogenetic analysis of the complete genomes of subgenotype A1 isolates classified them into two clusters (African and Asian) [10]. The introduction of subgenotype A1 into Asia could be the result of movements along the east coast of Africa, from Somalia in the Horn of Africa to the Arabian Peninsula in Asia [10]. Subgenotype A2, also denoted Ae for "European", was isolated from South African carriers and is also found in Northern Europe and Greenland [11].
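The 8% divergence criterion can be illustrated with a simple pairwise distance calculation. The sketch below is only an illustration under stated assumptions: the function names `p_distance` and `same_genotype` are hypothetical, the sequences are assumed to be already aligned, and the uncorrected proportion of differing sites is used as the distance. Actual genotype assignment relies on phylogenetic analysis of complete genomes, not on a single pairwise comparison.

```python
# Illustrative sketch: uncorrected pairwise divergence (p-distance) between
# two ALIGNED nucleotide sequences, compared with the 8% genotype threshold.
# Function names are hypothetical and not part of any HBV genotyping tool.

GENOTYPE_THRESHOLD = 0.08  # genotypes differ by at least 8% at nucleotide level


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # ignore columns with gaps or ambiguity codes
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def same_genotype(seq_a: str, seq_b: str,
                  threshold: float = GENOTYPE_THRESHOLD) -> bool:
    """True when the divergence is below the genotype threshold."""
    return p_distance(seq_a, seq_b) < threshold


if __name__ == "__main__":
    # Toy 10-site alignment with one mismatch: 10% divergence.
    print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
    print(same_genotype("ACGTACGTAC", "ACGTACGTAA"))
```

On real data the comparison would be made over the complete genome of about 3,200 bp, and a corrected evolutionary distance would normally be preferred over the raw proportion used here.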
Genotypes B and C are predominant in East and Southeast Asia. Subgenotype D1 occurs mostly in the Mediterranean basin and Middle East. D2 has been reported in India, Japan, Europe and the United States. D3 was found in South Africa, Brazil, Rwanda, India, Costa Rica, Iran, Serbia and the United States. Finally, D4 was reported in Australia, South Africa, Somalia, Rwanda and Oceania [11].
Genotype E was first described in West Africa, where it shows high prevalence but surprisingly low diversity: the mean diversity over the whole genome is 1.75%. HBV/E is the most prevalent genotype in western and central Africa [10]. In Quibdó, Colombia, we previously reported the presence of this genotype for the first time, in nine cases [16]. Genotypes F and H are found in populations from Alaska to Central and South America [17,18].
The aim of the present study was to characterize the HBV genotypes circulating in the city of Quibdó, Colombia. We report for the first time complete HBV genome sequences from infected Afro-Colombian people and infer their origin using phylogenetic analyses.

Study population
To evaluate the distribution of HBV genotypes in Quibdó, the largest Afro-descendant community in Colombia, 60 samples positive for the hepatitis B virus surface antigen (HBsAg) were obtained from sera stored at −20°C in a public health laboratory in Quibdó, Colombia, in 2007. This protocol was approved by the Ethical Committees of Pontificia Universidad Javeriana, Bogotá, Colombia, and the School of Medicine, University of São Paulo, São Paulo, Brazil.

HBV DNA extraction
HBV DNA was extracted from 100 μL of serum using the acid guanidinium thiocyanate/phenol/chloroform method [19]. Briefly, 300 μL of GT solution was added to each sample. Ice-cold chloroform (50 μL) was added, followed by homogenization and centrifugation. The supernatant was transferred to a conical tube and precipitated with 300 μL of cold ethanol. After discarding the ethanol, samples were dried at 94°C for 1 min, resuspended in 50 μL of ultrapure MilliQ water and stored at −20°C.

HBV PCR amplification
To characterize HBV genotypes, a 1306 bp fragment partially comprising the HBsAg and DNA polymerase coding regions (S/POL) was amplified by nested PCR using the primers PS3132F/2920R and PS3201F/P1285R [18]. After purification of the PCR product with the ChargeSwitch PCR Clean-Up Kit, Sanger sequencing was performed using dideoxynucleotide triphosphates (ddNTPs) with the Big Dye Terminator v3.1 Cycle Sequencing Ready Reaction kit (Applied Biosystems, Foster City, CA, USA). Electrophoresis was carried out in an ABI Prism 377 Automatic Sequencer (Applied Biosystems, Foster City, CA, USA). The quality of each electropherogram was evaluated using the Phred-Phrap software, and consensus sequences were obtained by aligning both sequenced strands (sense and antisense) using the CAP3 software, available at the Electropherogram quality analysis Phred web page (http://asparagin.cenargen.embrapa.br/phph).
Amplification of the whole HBV genome was performed with the previously described P1 and P2 primers, with slight modifications [20]. The quality of each electropherogram was evaluated as described above.

HBV genotyping analysis
Sequences were genotyped by phylogenetic reconstruction using reference sequences from all HBV genotypes obtained from GenBank (n=412), comprising 1306 bp of the partial HBsAg and DNA polymerase coding regions (S/POL). Complete genomes were also obtained from GenBank (n=192) and phylogenetic analyses were performed. All sequences were aligned using the Muscle software [21] and edited with the SE-AL software (available at http://tree.bio.ed.ac.uk/software/seal/). Bayesian Markov chain Monte Carlo (MCMC) simulation, as implemented in BEAST v.1.5.4 [22], was used to obtain the best possible estimates under both relaxed uncorrelated lognormal and relaxed uncorrelated exponential molecular clocks, using the GTR+G+I model of nucleotide substitution. The molecular clock that best fitted the data was chosen by Bayes factor (BF) comparison. After 10 million generations, the maximum clade credibility (MCC) tree was obtained by summarizing the 10,000 sampled trees after removing a burn-in of 10% using Tree Annotator v.1.5.3 [22]. Phylogenetic trees were visualized and midpoint rooted in FigTree v1.2.2 (http://tree.bio.ed.ac.uk/software/figtree/).
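The chain settings and the clock-model comparison described above reduce to simple arithmetic, sketched here for orientation. This is a minimal illustration, not BEAST or Tree Annotator code: the function names are hypothetical, the sampling interval is inferred from the reported chain length and number of trees (it is not stated in the text), and the log marginal likelihoods are invented placeholder values, since the study does not report them.

```python
# Illustrative sketch of the MCMC bookkeeping and Bayes factor comparison.
# Nothing here calls BEAST; names and marginal likelihoods are hypothetical.


def chain_summary(generations: int, sampled_trees: int,
                  burnin_percent: int) -> dict:
    """Sampling interval and number of trees kept after burn-in."""
    sampling_interval = generations // sampled_trees
    discarded = sampled_trees * burnin_percent // 100
    return {
        "sampling_interval": sampling_interval,
        "discarded": discarded,
        "retained": sampled_trees - discarded,
    }


def log_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Log Bayes factor of model A over model B.

    Positive values favour model A; inputs are log marginal likelihoods.
    """
    return log_ml_a - log_ml_b


if __name__ == "__main__":
    # Values reported in the text: 10 million generations, 10,000 trees,
    # 10% burn-in.
    summary = chain_summary(10_000_000, 10_000, 10)
    print(summary)

    # PLACEHOLDER log marginal likelihoods for the two relaxed clocks.
    log_ml_lognormal = -1000.0
    log_ml_exponential = -1005.0
    print(log_bayes_factor(log_ml_lognormal, log_ml_exponential))
```

With the reported settings, one tree would be sampled every 1,000 generations and 9,000 trees would remain for the MCC summary.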

Results
Of the 60 HBsAg-positive samples, 29 (48.3%) were positive by nested PCR for the S/POL region, and 23 of these yielded sequences of sufficient quality for phylogenetic analysis (Figure 1). The distribution of HBV genotypes in these 23 samples was: A1 (52.17%), E (39.13%), D3 (4.3%) and F3/A1 (4.3%). Nine genotype E sequences generated in this study from the same community were previously published [16], since they represented an exclusively African HBV genotype circulating in South America. Because of the small sample size, these results cannot be considered conclusive regarding the prevalence of these genotypes in the population.
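The reported percentages can be checked against sample counts. The sketch below assumes counts of 12, 9, 1 and 1 for the 23 genotyped samples; these are inferred from the published percentages (only the nine genotype E cases are stated explicitly in the text), and the function name is hypothetical.

```python
# Illustrative check of the genotype distribution.
# ASSUMPTION: counts are inferred from the reported percentages of 23 samples.

genotype_counts = {"A1": 12, "E": 9, "D3": 1, "F3/A1": 1}


def percentages(counts: dict) -> dict:
    """Percentage of each category, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(genotype_counts.values()))   # genotyped samples
    print(percentages(genotype_counts))    # genotype distribution in percent
    print(round(100 * 29 / 60, 2))         # nested PCR positivity, 29 of 60
```

One sample out of 23 corresponds to 4.35% at two decimals, which the text reports as 4.3%.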
This work reports, for the first time worldwide, an HBV F3/A1 recombinant strain, identified in the 1306 bp S/POL fragment. Recombination was confirmed using the Simplot program and bootscanning analysis. The sequence was compared with a consensus sequence of each HBV genotype (A-H) in order to identify the breakpoints. The analysis was carried out using a window size of 200 bp, a step size of 20 bp, 100 bootstrap replicates, gapstrip on and neighbour-joining analysis. The breakpoint was located at position 941 of the HBV genome (POL region), in the codon ATT (C) (HBV/F3) → ACT (C) (HBV/A1) (Figure 2). Furthermore, we obtained for the first time seven complete HBV genome sequences from Colombia, belonging to genotypes A1, E and D3 (Figure 3). The complete HBV genome sequences were deposited in GenBank under accession numbers JQ023660-JQ023666.
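The sliding-window idea behind this kind of recombination analysis can be sketched as follows. This is a simplified similarity scan, not the bootscanning procedure of Simplot: it omits the bootstrap replicates, gap stripping and neighbour-joining trees, assumes the query and reference sequences are already aligned to the same length, and uses hypothetical function names and toy sequences.

```python
# Illustrative sliding-window similarity scan for an aligned query sequence
# against aligned genotype consensus sequences. Simplified stand-in for a
# Simplot-style analysis; names and toy data are hypothetical.


def window_starts(length: int, window: int = 200, step: int = 20) -> list:
    """Start positions (0-based) of every full window along the alignment."""
    return list(range(0, length - window + 1, step))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length segments."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_scan(query: str, references: dict,
                    window: int = 200, step: int = 20) -> list:
    """For each window, identity of the query to every reference."""
    results = []
    for start in window_starts(len(query), window, step):
        end = start + window
        scores = {name: identity(query[start:end], ref[start:end])
                  for name, ref in references.items()}
        results.append((start, scores))
    return results


if __name__ == "__main__":
    # Window count for a 1306 bp fragment with the settings from the text.
    print(len(window_starts(1306, window=200, step=20)))

    # Toy chimera: first half matches reference "F3", second half "A1".
    refs = {"F3": "A" * 40, "A1": "C" * 40}
    chimera = "A" * 20 + "C" * 20
    for start, scores in similarity_scan(chimera, refs, window=10, step=5):
        print(start, scores)
```

In the toy example the best-matching reference switches from one parent to the other around the junction, which is the pattern a breakpoint produces in a similarity or bootscan plot.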

Discussion
These results also show that the distribution of HBV genotypes in Colombia may differ according to the region studied: unlike in Quibdó, a previous study in Bogotá found that the most frequent subgenotype was F3, followed by A2 and G [18]. The presence of genotype E in more than one third of the cases studied here (39.13%) highlights the importance of carrying out larger studies in the Quibdó population to ascertain whether this genotype is widespread in the region.
Such results could provide additional knowledge on the history of HBV/E around the world and help to clarify the molecular clock of such viral infections in humans. In our recent publication on genotype E in this Afro-Colombian community [16], the time to the most recent common ancestor (TMRCA) suggested a recent introduction; however, such a high frequency of infection in a population does not seem consistent with a recent introduction. Further investigations are required to reconcile these findings.
In Africa, viruses belonging to five genotypes, A (HBV/A) to E (HBV/E), have been found. Subgenotype A1 was identified in HBV isolates from South Africa using phylogenetic analysis of preS2/S sequences and confirmed by analysis of complete genomes from South Africa and Malawi [9]. This subgenotype has also been found in Somalia, Zimbabwe, Kenya, Rwanda, the Philippines, India, Nepal and Yemen [10]. In South America, previous studies have shown that subgenotype A1 was the most frequent in the Brazilian population [23]. In Argentina the prevalence of this subgenotype is low [24], whereas in Haiti, where more than 90% of the population is descended from African slaves, A1 is the most common subgenotype [25]. While this suggests that subgenotype A1 in South America probably has a similar origin, the Colombian A1 sequences reported in this study, although grouped in the same cluster, seem to suggest otherwise: the posterior probability was low (0.21), indicating a distant genetic relatedness.
Figure 1 The Maximum Clade Credibility (MCC) tree was estimated by a Bayesian analysis from a larger dataset comprising 413 sequences with 1306 nucleotides of the S and Polymerase HBV region of the different subgenotypes. The posterior probabilities of the key nodes are shown above the respective clusters. The clusters containing the strains of other HBV subgenotypes were collapsed.
An important result of this work was the detection of a recombinant strain between subgenotypes F3 and A1 in a 16-year-old girl. Because she was pregnant when the sample was collected, sexual contact was probably the route of HBV transmission. Several intergenotype HBV recombinants have been reported around the world. In Bolivia, South America, several A/D, C/B, D/C and F/C recombinants have been detected [26]. Furthermore, recombination within and between genotypes has created complex patterns and altered the cladistic structure of HBV genotypes [27]. For example, in a previous study the B2 subgenotype proved to be a hybrid of genotypes B and C [28]. Unfortunately, we were unable to amplify the full-length genome of our recombinant strain. Thus, although we know that the breakpoint was in the polymerase region, we cannot exclude the possibility that other recombination points exist in this virus. In sum, this is the first HBV recombinant strain reported in Colombia.
Here, we present a study of the first seven complete HBV genomes from the Colombian population. The results obtained for genotypes E and A1 support the theory that HBV may have been introduced into this Afro-descendant community in Colombia in the times of slavery.
Figure 3 The Maximum Clade Credibility (MCC) tree was estimated by a Bayesian analysis of 192 complete genome sequences of HBV strains. The posterior probabilities of the key nodes are shown above the respective nodes. The HBV Colombian complete genome sequences (n = 7) were analyzed together with other strains from around the world. The clusters containing the strains of other HBV subgenotypes were collapsed.