Characterization and frequency of a newly identified HIV-1 BF1 intersubtype circulating recombinant form in São Paulo, Brazil

Background: HIV circulating recombinant forms (CRFs) play an important role in the global and regional HIV epidemics, particularly in regions where multiple subtypes co-circulate. To date, more than 40 CRFs are recognized worldwide, five of which currently circulate in Brazil. Here, we report the characterization of near full-length genome (NFLG) sequences of six phylogenetically related HIV-1 BF1 intersubtype recombinants (five from this study and one previously published) representing CRF46_BF1.
Methods: Initially, we selected 36 samples from 888 adult patients residing in São Paulo who had previously been diagnosed as being infected with subclade F1 based on pol subgenomic fragment sequencing. Proviral DNA integrated in peripheral blood mononuclear cells (PBMC) was amplified from the purified genomic DNA of all 36 blood samples as five overlapping PCR fragments, followed by direct sequencing. Sequence data were obtained from the five fragments that showed identical genomic structure, and phylogenetic trees were constructed and compared with previously published sequences. Genuine subclade F1 sequences and any other sequences that exhibited unique mosaic structures were omitted from further analysis.
Results: Of the 36 samples analyzed, only six sequences, inferred from the pol region as subclade F1, displayed identical BF1 mosaic genomes with a single intersubtype breakpoint at the nef-U3 overlap (HXB2 positions 9347-9365; LTR region). Five of these isolates formed a rigid cluster in phylogenetic trees built from different subclade F1 fragment regions, and we now designate them CRF46_BF1. According to our estimate, the new CRF accounts for 0.56% of the HIV-1 strains circulating in São Paulo. Comparison with previously published sequences revealed an additional five isolates that share an identical mosaic structure with those reported in our study. Despite sharing a similar recombinant structure, only one of these sequences appeared to originate from the same CRF46_BF1 ancestor.
Conclusion: We identified a new circulating recombinant form, designated CRF46_BF1, with a single intersubtype breakpoint at the nef-LTR U3 overlap. Given the biological importance of the LTR U3 region, intersubtype recombination in this region could play an important role in HIV evolution with critical consequences for the development of efficient genetic vaccines.


Background
The immense genetic variability of HIV-1 is considered the key factor that frustrates efforts to halt the epidemic and poses a serious challenge to the development and efficacy of vaccines. Like other human positive-sense RNA viruses, HIV has a high mutation rate as a result of the error-prone nature of its reverse transcriptase (3 × 10^-5 mutations per nucleotide per replication cycle) [1,2]. This high mutation rate, coupled with the high replication capacity of the virus (10.3 × 10^9 particles per day) [3], allows for the accumulation and fixation of a variety of advantageous genetic changes in a virus population, which are selected for by the host immune response and can resist newly evolving host defenses. Recombination is another evolutionary source that contributes significantly to the genetic diversification of HIV by repairing defective viral genes and by producing new viruses [4]. To date, HIV-1 viruses are classified into four phylogenetic groups, M, O, N and P, which most likely reflect four independent events of cross-species transmission from chimpanzees [5][6][7]. The M group (for main), responsible for the majority of viral infections worldwide, is further subdivided into nine subtypes (A-D, F-H, J and K), among which subtypes A and F have been further classified into two sub-subtypes each [5]. Moreover, early sequencing studies provided evidence of recombination between genomes of different HIV subtypes [8,9]. Such interclade recombinant strains are consistently reported from regions where two or more clades are predominant. Recombinant strains from at least three epidemiologically unlinked sources that exhibit identical mosaic patterns have been classified separately as circulating recombinant forms (CRFs) [10,11]. Currently, there are more than 40 defined CRFs (http://www.hiv.lanl.gov) that are as epidemiologically important as subtypes [12].
In addition to the known CRFs, a large number of unique recombinant viruses, which are called unique recombinant forms (URFs), have been characterized worldwide [13]. Together, CRFs and URFs account for 18% of incident infections in the global HIV-1 pandemic [12]. HIV-1 subtypes, CRFs and URFs show considerably different patterns of distribution in different geographical regions [12,14].
In Brazil, an estimated 730,000 persons were living with HIV at the beginning of 2008 (2008 Report on the Global AIDS Epidemic). As in European countries and North America, HIV-1 subtype B is a major genetic clade circulating in the country. However, the presence of other subtypes and recombinants, such as F1, C, B/C and B/F, has been consistently reported [15][16][17][18][19][20][21][22][23]. Recent studies of near full-length genomes (NFLG) of HIV have provided evidence of Brazilian CRF strains designated CRF28_BF, CRF29_BF, CRF39_BF, CRF40_BF and CRF31_BC [17,24-26] (http://www.hiv.lanl.gov/content/sequence/HIV/CRFs/CRFs.html).
In 2006, Thompson and colleagues [27] published two NFLGs of similar BF1 mosaic viruses from patients in Rio de Janeiro: 94BR-RJ-41 (GenBank: AY455781) and 99UFRJ-16 (GenBank: AY455782). Here, we describe the HIV-1 NFLGs of an additional six isolates with similar BF1 mosaic genomes from patients without evidence of direct epidemiological linkage.

Study population
The six samples reported in this study were from individuals residing in São Paulo, a city in the southeast region of Brazil that is considered the most populous in South America. The rationale for selection of these samples has been previously reported [28]. The data, including age, gender, number of CD4-positive T cells, and viral load, were obtained from medical records and are shown in Table 1. No evidence of direct epidemiological linkage could be established.

Amplification and sequencing of HIV-1 DNA
The genomic DNA used for the PCR analyses was extracted using the QIAamp blood kit (Qiagen) according to the manufacturer's instructions. The NFLGs were obtained as five overlapping fragments by PCR using Platinum Taq DNA polymerase (5 U/μl) (Invitrogen), following a previously reported method [16,17]. To rule out the possibility of Taq-generated recombinants, an additional PCR product of 670 bp, which spans most of the viral LTR, was generated in separate PCR reactions using previously described primers and conditions [29]. All amplification reactions were done in duplicate to eliminate PCR artifacts and to ensure that the sequenced NFLGs were not assembled from heterogeneous DNA targets. To test for PCR carry-over contamination, extraction and PCR negative controls were run in each experiment. Both complementary DNA strands of each amplicon were directly sequenced by cycle sequencing using a variety of internal primers, BigDye terminator chemistry and Taq polymerase on an automated sequencer (ABI 3130, Applied Biosystems Inc., Foster City, CA), essentially according to the protocols recommended by the manufacturer. Fragments for each amplicon were assembled into contiguous sequences with a minimum overlap of 30 bp and a minimum match of 97-100%, and edited using Sequencher 4.7 (Gene Codes Corp., Ann Arbor, MI).

Screening for recombination events and identification of breakpoints
Sequences were screened for recombination patterns by the jumping profile Hidden Markov Model (jpHMM) [30], and the results were further confirmed using the bootscanning method [31] implemented in SimPlot [32]. The following parameters were used for bootscanning: window size, 250 bp; step size, 20 bp; the F84 (maximum likelihood, ML) model of nucleotide substitution; transition/transversion ratio, 2.0; and 100 bootstrap replicates. The significance threshold for the bootscan was set at 90%. Multiple sequence alignment, including reference sequences representing subtypes A-D, F-H, J and K (http://hiv-web.lanl.gov), was performed with the CLUSTAL X program [33], followed by manual editing in the BioEdit Sequence Alignment Editor [34]. Gaps and ambiguous positions were removed from the alignment. Positions of crossover sites were defined based on the distribution of informative sites supporting the two incongruent topologies that maximizes the χ² value [35], a method implemented in SimPlot.
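The χ² maximization used to place crossover sites can be illustrated with a minimal sketch. It is not SimPlot's implementation: the function names, the "F"/"B" labels for the topology each informative site supports, and the site positions in the usage note are hypothetical, and a plain 2 × 2 χ² statistic without continuity correction is assumed.

```python
def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a whole row or column is empty: no evidence for a breakpoint
    return n * (a * d - b * c) ** 2 / denom


def best_breakpoint(sites, label_a="F", label_b="B"):
    """Return (chi2, (left_pos, right_pos)) for the split that maximizes chi-square.

    `sites` is a list of (position, label) tuples sorted by position, where the
    label says which of the two incongruent topologies the informative site supports.
    """
    labels = [label for _, label in sites]
    if any(label not in (label_a, label_b) for label in labels):
        raise ValueError("every site must carry one of the two labels")
    total_a = labels.count(label_a)
    total_b = labels.count(label_b)
    best_chi2, best_interval = 0.0, None
    left_a = left_b = 0
    # Try a breakpoint between every pair of adjacent informative sites.
    for i in range(len(sites) - 1):
        if labels[i] == label_a:
            left_a += 1
        else:
            left_b += 1
        chi2 = chi2_2x2(left_a, left_b, total_a - left_a, total_b - left_b)
        if chi2 > best_chi2:
            best_chi2 = chi2
            best_interval = (sites[i][0], sites[i + 1][0])
    return best_chi2, best_interval
```

With toy informative sites such as `[(100, "F"), (500, "F"), (900, "F"), (9300, "F"), (9347, "F"), (9365, "B"), (9400, "B"), (9500, "B")]`, the function places the crossover in the interval between the last site supporting one topology and the first site supporting the other, which is why a breakpoint is reported as a range of positions rather than a single nucleotide.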

Phylogenetic tree analysis
Phylogenetic relationships between the individual sequence types were determined by two methods: the neighbor-joining (NJ) algorithm of MEGA v.4 [36] and the ML method of PHYML v.2.4.4 [37]. NJ trees were constructed in MEGA under the maximum composite likelihood substitution model, with 1000 bootstrap replicates. ML phylogenies were constructed using the GTR + I + G substitution model and a BIONJ starting tree. Heuristic tree searches under the ML optimality criterion were performed using the NNI branch-swapping algorithm.
The approximate likelihood ratio test (aLRT) based on a Shimodaira-Hasegawa-like procedure was used to calculate branch support. Tree topologies of subgenomic regions were compared using the algorithm described by Nye et al. [38]. Trees were displayed using the MEGA v.4 package. Nucleotide similarities were estimated using the maximum composite likelihood model implemented in MEGA v.4.
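The bootstrap resampling used for the NJ trees can be sketched as follows. This is only an illustration of the resampling step (drawing alignment columns with replacement), not MEGA's code. The function names and the toy alignment in the usage note are hypothetical, and building a tree from each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping sequences in register.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # The same column indices are applied to every sequence.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate pseudo-replicate alignments; the seed is fixed only for reproducibility."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

For example, `bootstrap_replicates({"s1": "ACGTAC", "s2": "ACGTTC", "s3": "ACCTAC"})` returns 1000 pseudo-alignments of the same dimensions. The bootstrap value of a branch is then the percentage of replicate trees in which that branch appears.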

Recombinant Analysis
A total of six strains (06BR FPS561, 07BR FPS625, 07BR FPS742, 07BR FPS783, 07BR FPS810, and 07BR FPS812), preliminarily classified as subclade F1 by sequence analysis of a partial pol region, were subjected to further phylogenetic analysis of the complete coding sequences and part of the LTR region. Analysis of the proviral NFLGs revealed that all isolates retained intact reading frames for the majority of their genes, and no gross deletions or rearrangements were observed. The NFLG sequence of each strain was initially investigated using jpHMM, which showed that the strains display identical mosaic structures with a single intersubtype breakpoint at the nef-U3 overlap (HXB2 positions 9347-9365). The recombinant genomes consisted essentially of subclade F1 and subtype B parental sequences. Fragments identified as subclade F1 covered almost all of the genome coding regions, while the fragment classified as subtype B consisted of a short sequence comprising the last part of the 3' LTR. Furthermore, the analysis revealed that all six isolates had a mosaic sequence pattern nearly identical to that of the previously published Brazilian BF1 isolates 94BR-RJ-41 (GenBank: AY455781) and 99UFRJ-16 (GenBank: AY455782). Based on these preliminary analyses, we reanalyzed all six sequences using the bootscanning method with reference sequences of three different subtypes (B, F and C) obtained from the full-length alignment of the HIV sequence database (http://hiv-web.lanl.gov). In agreement with the results obtained by jpHMM, bootscanning analysis confirmed similar mosaic structures with almost identical breakpoint positions in these six isolates (Figure 1). The BF1 intersubtype transitions were estimated at nucleotides 9347-9365, based on the HIV HXB2 numbering system, by mapping the informative sites and χ² maximization.
To further test for recombination, ML phylogenetic trees were inferred for the regions of nucleotide sequence on either side of the breakpoint detected by the bootscan method (Figure 1). This analysis corroborated the results of the bootscan and thus provided unambiguous evidence, supported by high aLRT values, for a single recombination event among the six isolates.
To rule out the possibility of Taq-generated recombinant artifacts, an additional PCR product of 670 bp covering most of the viral LTR was generated in a separate PCR reaction using previously described primers and conditions [29]. The results confirmed the recombination breakpoint obtained using complete viral sequences.

Phylogenetic analysis of regions bounded by the crossover sites
As shown in Figure 2a, phylogenetic reconstructions for F1 specific regions bound by the crossover site, as defined by bootscan analysis, were compared with representatives of all subtype and sub-subtype references available in the HIV database (year 2008) and with other subclade F1 published sequences. The result of the ML tree revealed all our sequences clustered on a branch of subclade F1 and further into one separate sub-branch intrinsic to South America, particularly Brazil (100%   (Figure 3a &3b).
The phylogenetic tree based on the fragment characterized as subtype B by bootscan from all of the six isolates is shown in Figure 2b. The resulting tree topology agrees with the accepted HIV-1 group M phylogeny and the majority of the internal nodes are supported with high aLRT values. Despite the fact that B fragments in these isolates have shorter sequences and some group M variants cannot resolve some of the internal nodes, all of them can resolve the terminal nodes.

Molecular rate of CRF46_BF1
Five of the six BF1 isolates described in this study (designated CRF46_BF1 in the Los Alamos database) were detected among 36 samples selected from 888 samples infected with HIV-1 F1 based on pol subgenomic fragment sequencing [28]. Based on these results, CRF46_BF1 accounts for 0.56% of the HIV-1 strains circulating in São Paulo. Next, we compared the recombinant profiles of our sequences with those of other HIV BF1 genomes at the nucleotide level to illustrate the distribution of their breakpoints. To do this, we retrieved the full-length genomes of all BF1 and CRF_BF1 isolates available in the Los Alamos database and used the automated jpHMM to map breakpoints with a significant recombination signal (Figure 4). Our analysis showed that two variants (GenBank: DQ085869; BREPM11931 and DQ085870; BREPM11931), annotated as BF1 recombinants in the database, appear ancestral to subtype B strains. The recombination breakpoint at the nef-U3 overlap detected in our sequences was also found in CRF39_BF1 and four other URF BF1 recombinants. In addition, most of the sequences have undergone multiple rounds of recombination. These data suggest that this part of the nef-U3 overlap is a possible 'hot spot' for recombination.
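The 0.56% figure is consistent with five CRF46_BF1 isolates out of the 888 screened samples. The short sketch below reproduces that arithmetic. The function name is hypothetical, and using 888 as the denominator is our reading of the text rather than a calculation stated by the authors.

```python
def prevalence_percent(n_positive, n_total):
    """Proportion of positive samples, expressed as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_positive / n_total


# Assumed inputs: 5 CRF46_BF1 isolates among 888 screened samples.
crf46_frequency = prevalence_percent(5, 888)
print(f"{crf46_frequency:.2f}%")  # prints 0.56%
```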

Identification of Related HIV-1 Strains in the database
Fragment B from all six isolates shared 96% sequence identity with the B stretch in the nef-U3 overlap from the Brazilian 93br029 which was isolated in 1993. Thus, we assume that the initial recombination event happened several years before 1993.
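A percent identity such as the 96% reported above can be computed from a pairwise alignment as sketched below. The text does not state how this value was obtained, so the convention used here (matches divided by the number of aligned columns in which neither sequence has a gap) and the function name are assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

Other conventions, for example counting gap columns as mismatches, give slightly different values, so identity figures are only comparable when the same convention is used.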

Partial LTR nucleotides alignment features
A detailed examination of the partial nucleotide alignment of the 3' LTR regions relative to HXB2 and consensus sequences of other HIV subtypes (year 2005) is shown in Figure 5. Additional NF-κB binding sites conforming to the consensus sequence GGGRNNYYCC were found in three strains from the current study. A subclade F1-specific insert of 13-15 nucleotides [39] downstream of the NF-κB III binding site was not observed in our sequences, which further supports our conclusion that they are not genuine F1 sub-subtype sequences but BF1 recombinant isolates. Absence of this nucleotide signature was also observed in isolates F1.JP.2004.DR6082, F1.JP.2004.DR6190, and 01BR087, which have previously been classified as pure subclade F1 sequences.
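Candidate NF-κB sites matching the consensus GGGRNNYYCC can be located by expanding the IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base) into a regular expression. The following is a minimal sketch. The function names and the example sequence in the usage note are hypothetical, and this is not the procedure used to produce Figure 5.

```python
import re

# IUPAC codes needed for the consensus; extend the table for other motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC nucleotide motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif="GGGRNNYYCC"):
    """Return (0-based start, matched site) for every occurrence, overlaps included."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

For the toy sequence `AAGGGACTTTCCGCTGGGGACTTTCCAGGG`, `find_motif` reports two sites, both `GGGACTTTCC`, at 0-based offsets 2 and 16. Positions are relative to the input string, so mapping them to HXB2 numbering requires the alignment.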

Discussion
In the present study, we have characterized six NFLG sequences that possess a mosaic genomic structure identical to that of the previously described strains 94BR-RJ-41 and 99UFRJ-16, with a genome of predominantly subclade F1 and the nef-U3 overlap portion of the LTR of subtype B. In the phylogenetic trees, these two strains (recovered from patients residing in Rio de Janeiro) position outside the single cluster formed by isolate 01BR087 and all BF1 recombinants identified in this study except 06BR FPS561 (recovered from patients residing in São Paulo) (Figures 2a and 3a). The discordant branching between gag-pol and env sequences of the 06BR FPS561, 94BR-RJ-41 and 99UFRJ-16 isolates can be explained by the occurrence of additional recombination events after the spread of their common ancestor. Overall, our results suggest that the 11 recombinant sequences were the result of not one but at least three independent recombination events that produced similar simple recombinant structures. In particular, BF sequences isolated in Japan and Rio de Janeiro may have originated from BF recombinant ancestors different from those of the sequences isolated in São Paulo. Thus, by excluding all the isolates that branch outside the main cluster, we provide a total of six sequences (01BR087 and five sequences described in this study) that meet the formal requirement for assigning a new CRF, CRF46_BF1. In the phylogenetic tree of the F1 subclade fragment, the two recently isolated Japanese strains (F1.JP.2004.DR6190 and F1.JP.2004.DR6082) formed a rigid subcluster with isolate 06BR FPS561 and branched outside the subcluster formed by the other five viruses described in this study, but still positioned firmly within the main Brazilian subclade F1 sequences. This result suggests that the viruses found in the Japanese patients share a distinct common ancestry originating in Brazil. It is possible that the heavy traffic of people between the two countries facilitated the spread of these viruses in both.
Based on the inclusion criteria for the samples in this study, we were able to show that CRF46_BF1 accounts for 0.56% of the HIV-1 strains circulating in São Paulo, similar to the frequency of subclade F1 reported from this region [28]. The apparently low prevalence of CRF46_BF1 may be ecological rather than due to inherent properties of the virus itself, reflecting the chance establishment of subtype B (a founder virus in Brazil) in our HIV-infected population before the new CRF and other subtypes were introduced.
Our analysis also showed that recombination of subclade F1 with subtype B at the nef-U3 overlap portion of the LTR appears to be a recurrent finding, because it has also been found in CRF39_BF1 and other unique HIV-1 recombinants [17,25,40,41]. In HIV, recombination hot spots are common: they have been described in cell-free systems [42] and exist in the dimer initiation sequence of the HIV-1 5'-untranslated region and at some preferential sites across the viral genome [43][44][45][46]. Several studies have demonstrated that RNA hairpin structures strongly correlate with recombination hot spots in various regions of the HIV-1 genome [42,43,46,47]. Thus, based on the latter observations, it is possible that hairpins promote recombination by hampering the RT during reverse transcription or through direct interaction with the template [46,48,49].
The HIV-1 LTR region is composed of various cis-acting regulatory components needed for proviral DNA synthesis, integration of the nascent viral cDNA into the host cell genome, transcription and modulation of HIV gene expression [50,51]. Early reports showed that the LTR region is made up of three segments designated U3, R and U5 [52]. The U3 modulatory region entirely overlaps with nef [53] and is essentially required during reverse transcription for first template transfer and integration of the provirus into the host genome. Moreover, this region seems to regulate the transcription pathway of HIV viral promoters by directly or indirectly interacting with a large number of cellular proteins, including NF-AT, Ets-1, USF, AP-1, COUP and Sp1 [54]. Thus, substitution through recombination of the nef-U3 overlap portion of the LTR with that of a genetically different subtype, as in our isolates, may affect the binding of both cellular and viral transcription factors. In turn, this may influence viral transcription levels, potentially enhancing the propagation of a recombinant virus and leading to the persistence of a circulating form.
Several studies have reported successful inhibition of HIV-1 replication using synthetic siRNAs targeting either viral RNA sequences or cellular mRNAs encoding proteins that are critical for HIV-1 replication [55][56][57][58]. The study conducted by Yamamoto and colleagues [59] showed considerable sustained suppression of HIV replication and control of the CC-chemokine production associated with nef expression in HIV-1-infected macrophages following transduction with a lentivirus vector system expressing HIV-specific short hairpin RNAs (shRNAs). These results allowed the authors to conclude that lentivirus-vector-based RNA interference of the U3-overlapping region of HIV-1 nef may have potential usefulness as a genetic vaccine against HIV-1 infection. Furthermore, Ludwig and collaborators [60] showed that HIV-1 contains an antisense gene in the U3-R regions of the LTR that is responsible for both an antisense RNA transcript and proteins. This antisense transcript has tremendous potential for intrinsic RNA regulation because it overlaps the beginning of all HIV-1 sense RNA transcripts by 25 nucleotides. The novel HIV antisense proteins are encoded in a region of the LTR that has already been shown to be deleted in some HIV-infected long-term survivors, and they represent new potential targets for vaccine development [60,61].
Given the biological relevance ascribed to the U3 region, it is probable that intersubtype recombination in this region could play an important role in HIV evolution with critical consequences for the development of efficient genetic vaccines.
During phylogenetic analysis, the B fragments of our six strains and the other five strains (marked with a triangle symbol in Figure 2b), which showed identical mosaic genomic structures, were clearly distinct from available South American subclade F1 sequences, particularly those of Brazilian origin. This result, coupled with the absence of the 13-15 nucleotide insertion downstream of the NF-κB III binding site that is typical of subclade F1, agrees with the interpretation that the segment at the nef-U3 overlap portion of the LTR of the eleven isolates originates from subtype B. Unlike the marked clustering of the eleven isolates in the tree generated from the F1 fragment, the tree of fragment B depicted in Figure 2b shows them falling in different sub-branches among the subtype B reference sequences. This result is most likely explained by the short length of the fragment B sequences.

Conclusion
In this study, we describe the NFLG sequence analysis of six HIV-1 isolates sampled in São Paulo and five other published isolates that have an identical breakpoint between subclade F1 and subtype B at the nef-U3 overlap portion of the LTR. Six of these sequences (five from this study and one previously published) are currently classified as members of the CRF46_BF1 family. Our data are relevant to guiding diagnosis and vaccine development. We conclude that recombination is a potentially important mechanism that contributes significantly to HIV genetic variability, with serious implications for diagnosis, drug treatment and optimal vaccine development.