An RNAi in silico approach to find an optimal shRNA cocktail against HIV-1
© Méndez-Ortega et al; licensee BioMed Central Ltd. 2010
Received: 25 August 2010
Accepted: 20 December 2010
Published: 20 December 2010
HIV-1 can be inhibited by RNA interference in vitro through the expression of short hairpin RNAs (shRNAs) that target conserved genome sequences. In silico shRNA design for HIV has lacked a detailed study of virus variability constituting a possible breaking point in a clinical setting. We designed shRNAs against HIV-1 considering the variability observed in naïve and drug-resistant isolates available at public databases.
A Bioperl-based algorithm was developed to automatically scan multiple sequence alignments of HIV, while evaluating the possibility of identifying dominant and subdominant viral variants that could be used as efficient silencing molecules. Student t-test and Bonferroni Dunn correction test were used to assess statistical significance of our findings.
Our in silico approach identified the most common viral variants within highly conserved genome regions, with a calculated free energy of ≥ -6.6 kcal/mol. This is crucial for strand loading to RISC complex and for a predicted silencing efficiency score, which could be used in combination for achieving over 90% silencing. Resistant and naïve isolate variability revealed that the most frequent shRNA per region targets a maximum of 85% of viral sequences. Adding more divergent sequences maintained this percentage. Specific sequence features that have been found to be related with higher silencing efficiency were hardly accomplished in conserved regions, even when lower entropy values correlated with better scores. We identified a conserved region among most HIV-1 genomes, which meets as many sequence features for efficient silencing.
HIV-1 variability is an obstacle to achieving absolute silencing using shRNAs designed against a consensus sequence, mainly because there are many functional viral variants. Our shRNA cocktail could be truly effective at silencing dominant and subdominant naïve viral variants. Additionally, resistant isolates might be targeted under specific antiretroviral selective pressure, but in both cases these should be tested exhaustively prior to clinical use.
Despite the advent of highly active antiretroviral therapy (HAART), human immunodeficiency virus (HIV-1) is still a matter of concern for public health . The major obstacle to finding a cure lies in the integration of the viral genome, by virtue of which the virus will always have a chance to restart the infection . The overwhelming genetic variability of HIV-1 is mainly due to the error-prone nature of reverse transcriptase (RT) . Other factors are also responsible for generating quasispecies, and usually a combination of factors -genetic (e. g. HLA type), immunological (e. g. CD8+ cytotoxic T lymphocytes selective pressure) and viral (e. g. HIV type, subtype, recombination events) among others- contributes to the exhaustion of the immune system [4, 5]. Moreover, the virus has an innate ability to accumulate mutations that are readily accepted by its flexible proteins . Collectively, these factors help the virus to overcome HAART . Clearly, effective strategies are needed to combat each replication-competent viral variant that may emerge under any circumstances or selective pressure [8, 9]. Although HAART saves thousands of lives, resistant variants emerge, even though multiple key steps in the viral replication cycle are targeted simultaneously . Indeed, some cases have shown persistent viral replication, even under successful HAART [11, 12].
RNA interference (RNAi) is an evolutionarily conserved naturally occurring eukaryotic process by which double-stranded RNA (dsRNA) triggers post-transcriptional gene silencing . Research during the last decade has focused on the possibility of using it to treat various diseases . In fact, several in vitro and in vivo RNAi approaches have proven effective at inhibiting HIV-1 [15–17], and such studies have shown that replication is potently inhibited beyond initial replication only when multiple conserved regions in the viral genome are targeted simultaneously [18, 19]. However, even though HIV-1 has been inhibited in vivo in a humanized mouse model , there is no absolute certainty this will extrapolate to humans. Key differences between mouse models and humans may influence the viral population and its evolution, especially if complete inhibition is not achieved.
shRNA design to date has been based on studies of HIV variability that have focused on conserved regions and multiple sequence alignments (MSAs) , in which the HXB2 reference genome has been used to select the consensus silencing sequence. Efficient silencing molecules have also been selected by in vitro screening . Previous studies analyzing 170 and 495 full-length genomes identified 19 and 216 target sequences respectively, showing that a greater number of viral genomes provides more evidence for variability [18, 22]. Other authors have analyzed the conservation of unique targets from gene sequence fragments of 19 nucleotides . However, 75% conservation among its genomes still allows the virus 25% variability, which it can use to escape from shRNA-based silencing. This highlights the importance of analyzing not only previously reported parameters of silencing efficiency [24, 25], but also enough sequences to represent the actual viral variability. We addressed this issue by including in our analysis resistant isolates and more than 1000 viral genomes representing the M group viral divergence. The principal target was RT, but we further analyzed complete genome sequences. In silico studies can produce accurate enough approximations to guide better experimental approaches; thus, with this in mind, we developed an in silico approach for identification of the best HIV silencing molecules. Our in silico approach scanned multiple HIV-1 aligned genomes in search for the most frequent (dominant and subdominant) nucleotide variants in several conserved regions instead of identifying a single consensus sequence for each, in order to be able to use them all simultaneously in a combination cocktail. These variants were analyzed following Zhou and Zeng's  parameters in order to select the ones that could be efficient shRNAs, given a silencing score and an exhaustive search for off-target effects.
Conserved regions and prevalent drug-resistant mutations
Target regions within Conserved Domain RT_rtv
Mutation in RT2 and prevalence3
Evaluated Region in MA4
2610 - 2630
2700 - 2760
74-76, 78, 81
2800 - 2840
2835 - 2870
2865 - 2905
Sequence retrieval and MSAs
A total of 2,264 sequences from the non-specific first line regimen were downloaded from HIVRT&PrDB and aligned. In addition, four specific MSA from specific regimens were generated independently, but with caution on not including sequences with previous treatment history: Stavudine-Lamivudine-Nevirapine (D4T-3TC-NVP) MSA (91 sequences), Zidovudine-Lamivudine-Efavirenz (ZDV-3TC-EFV) MSA (1,381 sequences), Zidovudine-Lamivudine-Abacavir (ZDV-3TC-ABC) (52 sequences) and Zidovufine-Lamivudine-Nevirapine (ZDV-3TC-NVP) (212 sequences). Six MSA from Los Alamos HIV databases were used to assess the impact of viral diversity: three from only the pol gene (B subtype no recombinants, 778 sequences; Group M plus recombinants, 1206 sequences; all subtypes, 1250 sequences) and three from complete HIV genomes (B subtype no recombinants, 790 sequences; Group M plus recombinants, 1214 sequences; all subtypes, 1257 sequences). Some sequences were present in more than one MSA and were discarded.
Thirty-five Colombian samples from hospitalized symptomatic HIV-positive patients with viral loads over 1000 copies/ml were chosen for genotyping and were analyzed so that the sequences from resistant isolates could be included in the study (resistance data will be published separately). These isolates were added to the 2,264 resistant isolate alignment to give a 2299 sequence alignment. Accession numbers are: [GenBank:HM584982, GenBank:HM584983, GenBank:HM584984, GenBank:HM584985, GenBank:HM584986, GenBank:HM584987, GenBank:HM584988, GenBank:HM584989, GenBank:HM584990, GenBank:HM584991, GenBank:HM584992, GenBank:HM584993, GenBank:HM584994, GenBank:HM584995, GenBank:HM584996, GenBank:HM584997, GenBank:HM584998, GenBank:HM584999, GenBank:HM585000, GenBank:HM585001, GenBank:HM585002, GenBank:HM585003, GenBank:HM585004, GenBank:HM585005, GenBank:HM585006, GenBank:HM585007, GenBank:HM585008, GenBank:HM585009, GenBank:HM585010, GenBank:HM585011, GenBank:HM585012, GenBank:HM585013, GenBank:HM585014, GenBank:HM585015, GenBank:HM585016].
Variability analysis and shRNA design
MSA coverage by shRNAs
h PC -SV (%)
j PC-DV (%)
Pol Subtype B no Recombinants
Pol Group M plus Recombinants
Pol All Subtypes
Genome Subtype B no Recombinants
Genome Group M plus Recombinants
Genome All Subtypes
Best shRNAs targeting sequences in more than one MSA.
a HXB2 Coordinates
b shRNA Sequence
d Targeted MSAs
AGt CCTATTGAa ACTGTACCAGTA
Multiple Comparisons for Score, and Proportion of dominant variants.
2299 Resistant Isolates w1
2299 Resistant Isolates w2
GENOME DNA All Subtype
GENOME DNA GroupM plus Recombinants
GENOME DNA No Recombinants
POL DNA All Subtypes
POL DNA GroupM plus Recombinants
Pol DNA No recombinants
Assigned letter group
Mean a Score
A B K L
A B K L
A B C D K L M
A B C D K L M
A B C D K L M
A B C D K L M
A B C D K L M
A B C D K L M
A B L
A B K L
A B E F G H I J K L
C D E F G H I J L M
Using BLAST, eight out of forty-eight shRNAs were found in the selected databases. Results are shown in Table 3. No hit had 100% overlap, and overlap was toward the 3' terminal end of the shRNA (Additional file 3).
This is the first in silico approach to novel shRNA design based on the scored search of a group of sequences directed at silencing the dominant and subdominant most frequent wild type and mutant RT variants, targeting conserved regions. We developed an algorithm that followed previously published sequence parameters from effective shRNAs, using a free energy cut-off and specific sequence features [24, 25]. No current approach targets frequent viral variants simultaneously; instead, it is usual to target several conserved regions with one sequence. The trouble is that for each of these regions, other frequent variants that do not match the reference genome sequence HXB2 need to be considered. Similar interesting works have been undertaken also analyzing publically available sequences, such as McIntyre et al. 2009. However, these differ from ours in that they neither searched for subdominant viral variants and/or infrequent viral variants, nor searched for shRNAs able to target resistant viruses that emerged under a specific antiretroviral selective pressure. Also, they do not describe in detail their in silico analyses; the features for silencing activity they evaluated, the filters or threshold they used, whether they included a free energy cut-off, their approach to ambiguities (UIPAC letter code), whether they used all the sequences, how they analyzed sequence quality in their MSAs, etc. They did design shRNAs of different lengths directed toward HXB2 reference genome, that overlaps within one of our regions -emphasizing the conservation of this part of the viral genome- however, those molecules do not match our subdominant variants. Our results identified a greater number of viral variants that any other study.
shRNA design is difficult, owing to the multiple requirements for achieving efficient silencing in vivo, and to all the parameters that must be carefully followed. Available programs are usually directed towards siRNA rather than shRNA design , and it has been shown that these programs do not always correctly predict the silencing efficiency of shRNAs . Online tools do not allow for more than one aligned sequence to be used, but several aligned sequences are necessary for designing silencing molecules against error-prone viruses such as HIV. Throughout the HIV-1 genome, we identified the less variable regions that showed the best silencing predicting features. However, MSAs revealed that there is at least between 20.12% to 21.31% of naïve isolates, and between 14.51% to 45.03% -percentages result from subtracting the table values out of 100%- of resistant isolates that will not be targeted using solely the dominant viral variant (Table 2). For that reason targeting multiple genome regions with one sequence for each will not solve this problem, because each region will have different untargeted naturally occurring variants. Any design strategy based on consensus shRNA sequences is susceptible to viral escape in terms of long-term silencing, particularly in an HIV-1-infected human. HIV variability underlies the fact that key target selection is of utmost importance. The most frequent or dominant shRNA (one sequence) in all the alignments fell between 63.46% and 85.49% of the viral sequences with an average value of 75.20% (Table 2). This is consistent with previous findings in which targeting a single region resulted in rapid emergence of resistance by means of selecting subdominant variants -those that remained untargeted . Achieving a higher silencing could be obtained by targeting subdominant variants from the same region like the subdominant variants we found (Table 3 and Additional file 1. Ideally, all the viable changes in each targeted conserved sequence must also be targeted in order to achieve life-long silencing. For this we first attempted to analyze further viral variability on the basis of protein function or biological significance, which is thought to show the lowest variability. From the selected regions based on protein function, only region number 2 of RT conserved domain provided results (Table 1 and Table 3). This was probably because we were not merely looking for a conserved region, but a conserved region that met specific requirements such as free energy values and sequence specific features. This was based on the fact that shRNAs that are perfectly matched with their target sequences do not necessarily achieve 100% silencing. Nonetheless, our shRNAs targeted only two regions in PR and one in RT, highlighting the conservation of these regions despite analyzing complete genome sequences; complete genomes provided the same windows. It is interesting that all the HIV-1 group M sequences behave within the same limits of variability, and the inclusion of recombinants did not affect the results. High scores were predominant in these sequences, implying that within the selected regions changes are allowed preferably in the same positions, not randomly. Highest scores were not reached; this means that intrinsic HIV-1 sequence characteristics and variability are an obstacle to expecting specific silencing sequence features in shRNA molecules. In fact, reaching the highest score demands for a highly conserved region in which changes are limited to certain positions and certain nucleotide changes. The latter is due to the fact that there are multiple sequence features that need to be satisfied throughout the silencing molecule in such a way that increasing variability would reduce the probability of achieving them. Differences were only significant when analyzing resistant MSAs. Low scores of these sequences are attributable to the degree of polymorphisms that seem not to have any pattern, and to drug selected mutations. Changes can occur almost in any place of the 23 nt window with differences in frequency per position, but with no apparent restriction. That's why resistant MSA showed the highest entropy values with the lowest scores. Recently, Schopman et al.  showed that targeting common resistant variants that emerge under silencing therapy decreased viral escape, but then new routes of evading silencing were used by the virus. This is explained by our analyses, which showed that there is over 20% variability that the virus can use to escape, without any selective pressure (non-resistant MSAs). Resistant MSAs showed the capability of the virus to mutate much further beyond this 20%. In fact, non-resistant MSAs were grouped together and apart from resistant-MSAs (Table 4.). Window 1 from ZDV-3TC-EFV was different from all the other MSA (resistant and non-resistant) in dominant viral variants, and W3 from the same MSA was different also in subdominant viral variants. These results are consistent with the fact that W1 dominant viral variant is different from HXB2 reference sequence and also with the fact that W3 had the lowest entropy value, which is the same as saying that it showed the highest variability. Resistant-MSAs constitute an insight to understand virus evolution; nonetheless we doubt those to show the true limits. In any case, targeting the dominant and subdominant viral variants for each region may reduce this set of viable changes.
We did not find any other genome region to be targeted, probably due to some of the parameters used such as "number of sequences" in which regions that are not well represented by a certain number of sequences are discarded. Another reason is that other stringent conditions besides sequence conservation were assessed. Unfortunately, genome ends are underrepresented, which leaves long terminal repeats (LTRs) and other terminal regions outside of the study. LTR is thought to be a good region for this type of strategy, but the variability of this region cannot be addressed accurately due to the relative small number of complete sequences present in the databases. There is another explanation for not having found shRNAs for key regions within the RT conserved domain. For example, the conserved nucleotide positions for the YMDD motif ranges from 1 to 8 out of 12, in the nucleotide reference MSA from the pol gene (Los Alamos HIV Databases). The amino acid reference sequence for the window with the fewest variants was WPLTEEK, which can be formed by 512 different nucleotide sequences. The mutations throughout the reference Pol polyprotein MSA (Los Alamos HIV Databases) are W24R, P25LTS, T26SA, E27KAGR, E28K and K29ER, and these collectively give 286,654,464 possible nucleotide combinations. Another reason could lie in the three nearby amino acids (either to the left or to the right of the motif), which can be encoded by more than two codons due to the redundancy of the genetic code.
In 2004, an siRNA against nef was used to inhibit HIV-1 in vitro, however in these assays, several weeks later escape mutant viruses emerged. That was one of the first studies to propose that virus variability should be addressed with a combination of siRNAs  Escape happened because the selected target sequence is biologically dispensable, a fact that should not be ignored even though the target was not thought to be nef itself, but all the RNA viral variants which are supposed to have this sequence. Now this underlines the importance of including much more biological information criteria for target selection than simply sequence conservation or the presence of a sequence in multiple viral transcripts. In fact, we found no conservation in any of the 21 nucleotides of the nef sequence used, throughout an MSA (data not shown). In other words, silencing was directed against an uncommon or subdominant variant of the virus, a fact that could have played a role in resistance development. However, it is important to highlight that HIV-1 silencing has been achieved in in vivo mouse models. Silencing was observed in a mouse model systemically infected with a combination of siRNAs consisting of a human specific CCR5 siRNA and two viral specific siRNAs , and no evidence of viral escape mutants was observed. Furthermore, a mouse model was engrafted with human hematopoietic stem cells that were previously transduced with a lentiviral vector carrying a shRNA against nef. While transduction efficiency in this in vivo model was reduced, ex vivo transduction of mature CD4 (+) cells lead to wild type HIV-1 resistant T cells in vivo . Now, even though certain studies have shown the emergence of resistant variants, our results raise the question whether they always result from "resistance" to RNAi therapy or whether they are naturally occurring uncommon variants that are being selected rather than targeted. So even though there was no evidence of resistance in the first mouse model--and despite the fact it was clearly shown that the silencing effect is sequence specific, regardless of the type of infection used (systemic or ex vivo) and regardless of the time an experiment is assessed--it cannot be said for sure that resistance wouldn't have occurred later. Even when assessing resistance was not the aim of these studies, it constitutes a threat for therapy outcome. In fact, resistance against HAART inhibitors arises in many human cases within a few years of initiating therapy, usually 2-4, depending on different variables [7, 31–34]. Then, longer periods of time in an animal model are therefore needed to evaluate the emergence of resistance or the inhibition of virus replication, especially since treatment has to be life-long. Our target was RT because it cannot be deleted from the genome and is the main source of virus variability. Since it is a HAART target, information about mutations induced under selective drug pressure is available [33, 35–38]. Reverse transcription is a prerequisite for viral integration and infection of other cells, so its silencing would reduce the number of cells that become actively and latently infected, preventing the establishment of more HIV cell reservoirs.
Several groups have also designed silencing molecules within our same pol region, confirming the importance of this region as an HIV target [16, 19, 21, 23, 30, 39, 40]. However, these previous molecules do not exactly begin from the same position as ours, have different lengths, and are only directed to silence viruses that match the dominant HXB2 reference genome sequence or specific mutant viruses that emerge during RNAi experiments in prolonged cell culture. We compared our dominant shRNAs to 100 previously published pol-directed silencing molecules and none of them was found to be identical, but some do map really near within the virus genome (data not shown). The nearest previous published molecule (2328-2346 with respect to HXB2)  maps 5 nt downstream of our targets against non-resistant isolates (2333-2356 with respect to HXB2). Seven other molecules were designed from position 2326 to 2360 with respect to HXB2 , and three more targets were designed in region 4752 to 4775 , overlapping our shRNA for non-resistant isolates that was identified when analyzed with no score restriction. Our molecule mapped within position 4747- 4770 with respect to HXB2 (Additional file 2). There are neither previous reports concerning shRNAs targeting viruses resistant to current first line antiretrovirals, nor reports about shRNAs targeting other dominant or subdominant viral variants within the same genome region. To target viruses resistant to certain antiretrovirals or to certain line of antiretrovirals with RNAi while taking them, or to target dominant and subdominant viral variants simultaneously, may hypothetically impede viral escape due to a major reduction in the possible available nucleotide changes that are not deleterious to the virus. This type of approach might cover many more viral variants than using just one strategy, but it will not solve economic issues and side effects concerning life-long HAART.
The emergence of resistant viral variants is an inconvenience that must be addressed carefully, particularly in the case of using shRNAs since they normally work in a sequence-specific manner. In this study we identified dominant and subdominant frequent viral variants representing the set of naturally occurring changes that are observed in the viral population that has been sequenced world-wide, and we designed shRNAs against these. We found that in order to cover all the viral variability to impede viral escape, we will need to use too many silencing molecules, which is not clinically feasible. Even in the absence of a selective pressure such as antiretrovirals, there are plenty of polymorphisms that can occur throughout the viral genome that can allow the virus to escape RNAi therapy. In addition, drug selective pressure is capable of inducing even more unusual changes. Although it is difficult to determine the exact mechanism by which it is possible to completely avoid viral escape, what is important now is that we have identified most of what we have to set upon. Our shRNA cocktail was developed based on the viral variability we found, and there is chance that it could be used alone (cocktail) or as a complement to HAART. Several authors have also proposed a cocktail of shRNAs to target HIV-1, but our cocktail is different in that it was designed to target dominant as well as subdominant viral variants. Long-lasting silencing for humans seems much more possible with this complete approach. Our results point towards the conclusion that viral population is modeled by selective pressures, since it was possible to find shRNAs for most of the regimens (it was, however, difficult for ZDV-3TC-NVP). Special attention must be taken regarding the fact that selective pressures, including silencing molecules, may induce or push the virus to mutate within the limits of resistance to antiretrovirals, since most changes are non- deleterious. Further studies are needed to find regimens with resistance patterns that can be specified and better controlled using the fewest number of shRNAs. Proving real in vivo efficiency in combination therapy, as well as identifying off-target and non-specific off-target effects of the shRNAs, is absolutely required before starting any therapeutic approach.
Our study is the first in identifying naturally occurring and induced nucleotide changes in the virus, based on data from viruses modeled by natural selection from natural hosts (humans) and from viruses modeled by drug selective pressure in treated patients.
This work is important to understanding the complexity of HIV-1 variability, in order to be able to target it effectively. Indeed, nucleotide substitutions have occurred under RNAi selective pressure, but we have shown in this study that there are much more nucleotide changes that can occur which may result in viral escape. Different approaches might work while trying to address viral ability to escape RNAi therapy, but undoubtedly the more information we know about the mechanisms the virus uses to do so, the more we can do about it.
Conserved regions and prevalent drug-resistant mutations: target sites
Pol polyprotein AAB50259.1 from the HXB2 reference sequence was used as query against the conserved domain database (CDD) of NCBI  in order to identify catalytic residues, residues involved in DNA binding, dNTP binding, or active site residues with no other annotation. The map of coordinates from the HXB2 reference sequence accession number K03455 (gi|1906382|gb|K03455.1|HIVHXB2CG human immunodeficiency virus type 1 (HXB2), complete genome; HIV1/HTLV-III/LAV reference genome), available at Los Alamos HIV databases http://www.hiv.lanl.gov, was used to find the exact positions of selected regions in MSAs using the sequence as a guide. The complete genomes were also assessed in order to identify other possible targets within conserved regions outside of the pol gene.
In addition, high prevalence resistance mutations were selected from the "Mutation Prevalence According to Subtype" web page http://hivdb.stanford.edu/cgi-bin/MutPrevBySubtypeRx.cgi from the Stanford HIV Drug Resistance Database (HIVRT&PrDB) http://hivdb.stanford.edu/. Mutations which were located within the selected regions of the conserved domain were preferred.
Sequence retrieval and MSAs
Sequences from resistant isolates were retrieved from HIVRT&PrDB by a therapy criterion that consisted of different combinations of three antiretrovirals, two nucleoside analog reverse transcriptase inhibitors (NRTI) and one non-nucleoside analog reverse transcriptase inhibitor (NNRTI). These sequences were not grouped by a specific regimen; instead they were grouped together as sequences from resistant viruses that emerged under different first line regimens. Duplicates and repeated sequences were eliminated using the Elim Dupes tool http://www.hiv.lanl.gov/content/sequence/ELIMDUPES/elimdupes.html from Los Alamos HIV databases. Sequences were further processed to eliminate foreign characters not compatible with FASTA format (e. p. "~"), and were then aligned to generate an MSA. In addition, first-line regimens that are used in Colombia were selected from the "National Resolution No. 3442-2006 for the care of HIV-infected patients", approved for treatment of HIV infection in Colombia by the Ministerio de Protección Social . This was done in order to retrieve sequences generated by the specific drug-selective pressure induced by each regimen in our country. The regimens were zidovudine-lamivudine-efavirenz (ZDV-3TC-EFV), which is normally the first choice, zidovudine-lamivudine-nevirapine (ZDV-3TC-NVP), zidovudine-lamivudine-abacavir (ZDV-3TC-ABC) and stavudine-lamivudine-nevirapine (D4T-3TC-NVP), which are also first line regimens. All of these select M184V mutation. Sequences were aligned by HMM (Hidden Markov Models) model HIV-1/SICcpz  into four independent MSA, one for each regimen. MSAs were generated using the HIVAlign tool from Los Alamos HIV databases, which is an implementation of the HMMER package http://www.hiv.lanl.gov/content/sequence/HMM/HmmAlign.html, and sequences were codon-aligned with Gene Cutter http://www.hiv.lanl.gov/content/sequence/GENE_CUTTER/cutter.html. HXB2 was included in each MSA to identify the selected regions. MSAs were further edited manually using MSA editors Bioedit and eBIOX, and analyzed in CLC DNA Workbench http://www.clcbio.com (Aarhus, Denmark).
Sequences from naïve isolates were retrieved from Los Alamos HIV databases. MSAs were generated as mentioned and downloaded from this database. In order to address the impact of virus variability in the aim of the study, different type of MSAs were downloaded from the database (i.e. Subtype specific, pol gene specific, with and without recombinants, complete genomes, all subtypes, whole group M). HXB2 was also included in each alignment and MSAs were also manually edited.
Additionally, 35 Colombian isolates from symptomatic HIV-positive patients were included in the study, and the fasta sequences were incorporated into the MSA of the non-regimen-specific resistant viral sequences. Only samples with written informed consent, complete clinical history and viral loads over 1000 copies/ml were included. RNA was extracted using the semi-automated Nuclisens Minimag and Nuclisens Extraction Kit according to the manufacturer's instructions (Biomerieux, Marcy l'Etoile, France). HIV was genotyped with a commercial TRUGENE HIV-1 kit (Siemens Diagnostics, Deerfield, USA), and fasta sequences were further analyzed with HIVdb software from HIVRT&PrDB. Quality assessment and Calibration Population Resistant (CPR) tools were used to evaluate probable sequencing errors and mutation prevalence, respectively.
Variability analysis and shRNA design
molecules. The genomes were first aligned using HMM (see sequence retrieval and MSAs) and the MSAs were scanned using sliding windows of 23 bp. For each window, the information entropy as defined by Shannon  was calculated:
Where n is the number of different sequences within each window, πi is the proportion of the i-th sequence (i.e. the number of times this sequence appears in the window divided by the number of sequences in the alignment), and H p is the information entropy of the p-th window.
Only windows with information entropy below two -allowing viral variability to be studied- and with more than 40% of informative sequences (i.e. not entirely composed by gaps) were accepted, avoiding windows that were too variable, and two sets of conditions were evaluated on every different sequence of the window. First, we evaluated mandatory conditions, accepting only sequences:
With less than 10% gaps,
No gaps in the middle of the sequence,
No ambiguous characters (i.e. only A, C, T and G accepted),
No G or C stretches of 4 bp or more in length,
No T stretches of 3 bp or more in length,
Starting with T or C,
With GC percentage between 30 and 50,
ΔG measured in the 5' last 7 bp of the sense sequence above -6.6, and
ΔG measured in the 5' last 6 bp of the antisense sequence above -6.6.
The ΔG were calculated as described by Freier et al. Afterwards, we evaluated a score only on accepted sequences, defined as a sum of differentially scored properties
The sequence starts with AA, AT, TT or TA: +0.5,
The sequence ends with AA, AT, TT or TA: +0.5,
The AT count of the 15-21 position range is at least 4: +1, or more than 4 (+1 for every additional A or T in the range),
The 21st position is A or T: +1,
The 5th position is C or G: +0.5,
The 7th position is G: +0.5,
The 13th position is A or T: +0.5,
The 16th position is T: +0.5,
The 8th position is C and the 9th position is A: +1,
The 10th and 11th positions are C: +1,
The 12th position is T and the 13th position is G: +1,
The 15th position is G and the 16th position is A: +1,
The 19th position is T and the 20th position is A: +1,
The 20th position is T and the 21st position is A: +1,
The GC count is below 11: +1,
The 21st position is G or C: -1,
The 5th position is T: -0.5,
The 6th position is T and the 7th position is A: -1,
The 8th position is T and the 9th position is A: -1,
The 10th and the 11th positions are G: -1,
The 11th position is G and the 12th position is C: -1, and
The 19th position is G and the 20th position is A: -1
Unique sequences with a score equal or greater than two were accepted because they had at least two efficient sequence specific silencing features, an advantage over silencing molecules that are made to follow only a sequence specific silencing; higher score thresholds would have resulted in the elimination of too many sequences per window leaving less than an 80% of the initial amount, rendering further analyses not worthy. Only windows with at least 80% of sequences accepted were used in further analyses. Within each accepted window, sequences that appeared 4 or more times were named as subdominant viral variants, viral variants meaning different from reference HXB2 sequence. No statistical approach was undertaken to asses this frequency limit, it was chosen in order to avoid accepting nucleotide changes that could have been sequencing errors, but at same time trying to rescue very infrequent but viable and possible ones accepted within the limits of virus evolution. Dominant viral variant, refers to the viral variant that appears the most, but in this case it may be exactly the same as reference HXB2 sequence. A window might have one or more dominant variants when there are sequences that appear a similar number of times as the dominant viral variant in a MSA. All the accepted sequences in the accepted windows were searched against the NCBI's Human RNA database using BLAST  with an e-value threshold of 0.1. Sequences were also searched against the Human genomic sequences in order to search for a potentially transcribed sequence not identified so far as mRNA, with identical parameters.
Data was organized considering each MSA as an independent group, and statistical analyses were performed with SPSS for windows package V 8.0.0 (SPSS Inc., IBM, Chicago, Illinois). A descriptive analysis was performed to describe quantitative variables for which weighted average values were calculated, and qualitative variables were expressed as frequencies. For all the multiple comparisons, the non parametric Bonferroni-Dunn correction test was used. Weighted average scores obtained for each MSA were compared in order to address if scores were significantly different within all categories using a Student t-test (p < 0.05, α = 0.05). These weighted average scores were calculated as the average score for each MSA, but it was weighted with respect to the number of sequences that showed that score. This is why it is a "weighted" average score. The Spearman correlation coefficient was used to address Score-Entropy non-causal association (p < 0.01, α = 0.01). Differences in the proportion of sequences that could be targeted by the most frequent variant of each MSA were also addressed with a Z-test (p < 0.05, α = 0.05). Comparisons were made with no discrimination between resistant and non-resistant viral sequences. For MSAs for which more than one window was selected by the algorithm, each window was treated independently.
This study was sponsored by the National Institute of Health in Bogotá D.C., Colombia, and Departamento Administrativo de Ciencia Tecnología e Innovación COLCIENCIAS Bogotá D.C. Project Number 2104-343-19207, Colombia. The authors would like to acknowledge: Damaris Heredia, for her help in statistical issues, and the National Network of Laboratories from the Colombian National Institute of Health, for collecting the HIV positive samples. This study was partly financed by Departamento Administrativo de Ciencia Tecnología e Innovación COLCIENCIAS, Bogota D.C., Colombia, Project Number 2104-343-19207.
- Cohen MS, Hellmann N, Levy JA, DeCock K, Lange J: The spread, treatment, and prevention of HIV-1: evolution of a global pandemic. J Clin Invest 2008, 118: 1244-1254. 10.1172/JCI34706PubMedPubMed CentralView ArticleGoogle Scholar
- Colin L, Van Lint C: Molecular control of HIV-1 postintegration latency: implications for the development of new therapeutic strategies. Retrovirology 2009, 6: 111. 10.1186/1742-4690-6-111PubMedPubMed CentralView ArticleGoogle Scholar
- Hubner A, Kruhoffer M, Grosse F, Krauss G: Fidelity of human immunodeficiency virus type I reverse transcriptase in copying natural RNA. J Mol Biol 1992, 223: 595-600. 10.1016/0022-2836(92)90975-PPubMedView ArticleGoogle Scholar
- Boasso A, Shearer GM: Chronic innate immune activation as a cause of HIV-1 immunopathogenesis. Clin Immunol 2008, 126: 235-242. 10.1016/j.clim.2007.08.015PubMedPubMed CentralView ArticleGoogle Scholar
- Alimonti JB, Ball TB, Fowke KR: Mechanisms of CD4+ T lymphocyte cell death in human immunodeficiency virus infection and AIDS. J Gen Virol 2003, 84: 1649-1661. 10.1099/vir.0.19110-0PubMedView ArticleGoogle Scholar
- Silverman AP, Kool ET: RNA probes of steric effects in active sites: high flexibility of HIV-1 reverse transcriptase. J Am Chem Soc 2007, 129: 10626-10627. 10.1021/ja072791bPubMedView ArticleGoogle Scholar
- Potter SJ, Chew CB, Steain M, Dwyer DE, Saksena NK: Obstacles to successful antiretroviral treatment of HIV-1 infection: problems & perspectives. Indian J Med Res 2004, 119: 217-237.PubMedGoogle Scholar
- Chellappan S, Kiran Kumar Reddy GS, Ali A, Nalam MN, Anjum SG, Cao H, Kairys V, Fernandes MX, Altman MD, Tidor B, et al.: Design of mutation-resistant HIV protease inhibitors with the substrate envelope hypothesis. Chem Biol Drug Des 2007, 69: 298-313. 10.1111/j.1747-0285.2007.00514.xPubMedView ArticleGoogle Scholar
- Grimm D, Kay MA: Combinatorial RNAi: a winning strategy for the race against evolving targets? Mol Ther 2007, 15: 878-888.PubMedGoogle Scholar
- Bezemer D, van Sighem A, de Wolf F, Cornelissen M, van der Kuyl AC, Jurriaans S, van der Hoek L, Prins M, Coutinho RA, Lukashov VV: Combination antiretroviral therapy failure and HIV super-infection. AIDS 2008, 22: 309-311. 10.1097/QAD.0b013e3282f37489PubMedView ArticleGoogle Scholar
- Dahl V, Josefsson L, Palmer S: HIV reservoirs, latency, and reactivation: prospects for eradication. Antiviral Res 85: 286-294. 10.1016/j.antiviral.2009.09.016Google Scholar
- Kim H, Perelson AS: Viral and latent reservoir persistence in HIV-1-infected patients on therapy. PLoS Comput Biol 2006, 2: e135. 10.1371/journal.pcbi.0020135PubMedPubMed CentralView ArticleGoogle Scholar
- Leung RK, Whittaker PA: RNA interference: from gene silencing to gene-specific therapeutics. Pharmacol Ther 2005, 107: 222-239. 10.1016/j.pharmthera.2005.03.004PubMedView ArticleGoogle Scholar
- Lopez-Fraga M, Martinez T, Jimenez A: RNA interference technologies and therapeutics: from basic research to products. BioDrugs 2009, 23: 305-332. 10.2165/11318190-000000000-00000PubMedView ArticleGoogle Scholar
- Kumar P, Ban HS, Kim SS, Wu H, Pearson T, Greiner DL, Laouar A, Yao J, Haridas V, Habiro K, et al.: T cell-specific siRNA delivery suppresses HIV-1 infection in humanized mice. Cell 2008, 134: 577-586. 10.1016/j.cell.2008.06.034PubMedPubMed CentralView ArticleGoogle Scholar
- Liu YP, Gruber J, Haasnoot J, Konstantinova P, Berkhout B: RNAi-mediated inhibition of HIV-1 by targeting partially complementary viral sequences. Nucleic Acids Res 2009, 37: 6194-6204. 10.1093/nar/gkp644PubMedPubMed CentralView ArticleGoogle Scholar
- von Eije KJ, ter Brake O, Berkhout B: Stringent testing identifies highly potent and escape-proof anti-HIV short hairpin RNAs. J Gene Med 2009, 11: 459-467. 10.1002/jgm.1329PubMedView ArticleGoogle Scholar
- ter Brake O, Konstantinova P, Ceylan M, Berkhout B: Silencing of HIV-1 with RNA interference: a multiple shRNA approach. Mol Ther 2006, 14: 883-892. 10.1016/j.ymthe.2006.07.007PubMedView ArticleGoogle Scholar
- Liu YP, von Eije KJ, Schopman NC, Westerink JT, Brake OT, Haasnoot J, Berkhout B: Combinatorial RNAi Against HIV-1 Using Extended Short Hairpin RNAs. Mol Ther 2009, 17: 1712-1723. 10.1038/mt.2009.176PubMedPubMed CentralView ArticleGoogle Scholar
- ter Brake O, Legrand N, von Eije KJ, Centlivre M, Spits H, Weijer K, Blom B, Berkhout B: Evaluation of safety and efficacy of RNAi against HIV-1 in the human immune system (Rag-2(-/-)gammac(-/-)) mouse model. Gene Ther 2009, 16: 148-153. 10.1038/gt.2008.124PubMedView ArticleGoogle Scholar
- von Eije KJ, ter Brake O, Berkhout B: Human immunodeficiency virus type 1 escape is restricted when conserved genome sequences are targeted by RNA interference. J Virol 2008, 82: 2895-2903. 10.1128/JVI.02035-07PubMedPubMed CentralView ArticleGoogle Scholar
- Naito Y, Nohtomi K, Onogi T, Uenishi R, Ui-Tei K, Saigo K, Takebe Y: Optimal design and validation of antiviral siRNA for targeting HIV-1. Retrovirology 2007, 4: 80. 10.1186/1742-4690-4-80PubMedPubMed CentralView ArticleGoogle Scholar
- McIntyre GJ, Groneman JL, Yu YH, Jaramillo A, Shen S, Applegate TL: 96 shRNAs designed for maximal coverage of HIV-1 variants. Retrovirology 2009, 6: 55. 10.1186/1742-4690-6-55PubMedPubMed CentralView ArticleGoogle Scholar
- Zhou H, Zeng X: Energy profile and secondary structure impact shRNA efficacy. BMC Genomics 2009,10(Suppl 1):S9. 10.1186/1471-2164-10-S1-S9PubMedPubMed CentralView ArticleGoogle Scholar
- Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki-Hamazaki H, Juni A, Ueda R, Saigo K: Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucleic Acids Res 2004, 32: 936-948. 10.1093/nar/gkh247PubMedPubMed CentralView ArticleGoogle Scholar
- Tilesi F, Fradiani P, Socci V, Willems D, Ascenzioni F: Design and validation of siRNAs and shRNAs. Curr Opin Mol Ther 2009, 11: 156-164.PubMedGoogle Scholar
- Taxman DJ, Livingstone LR, Zhang J, Conti BJ, Iocca HA, Williams KL, Lich JD, Ting JP, Reed W: Criteria for effective design, construction, and gene knockdown by shRNA vectors. BMC Biotechnol 2006, 6: 7. 10.1186/1472-6750-6-7PubMedPubMed CentralView ArticleGoogle Scholar
- Boden D, Pusch O, Lee F, Tucker L, Ramratnam B: Human immunodeficiency virus type 1 escape from RNA interference. J Virol 2003, 77: 11531-11535. 10.1128/JVI.77.21.11531-11535.2003PubMedPubMed CentralView ArticleGoogle Scholar
- Schopman NC, ter Brake O, Berkhout B: Anticipating and blocking HIV-1 escape by second generation antiviral shRNAs. Retrovirology 2010, 7: 52. 10.1186/1742-4690-7-52PubMedPubMed CentralView ArticleGoogle Scholar
- Das AT, Brummelkamp TR, Westerhout EM, Vink M, Madiredjo M, Bernards R, Berkhout B: Human immunodeficiency virus type 1 escapes from RNA interference-mediated inhibition. J Virol 2004, 78: 2601-2605. 10.1128/JVI.78.5.2601-2605.2004PubMedPubMed CentralView ArticleGoogle Scholar
- Rodriguez-Arenas MA, Jarrin I, del Amo J, Iribarren JA, Moreno S, Viciana P, Pena A, Sirvent JL, Vidal F, Lacruz J, et al.: Delay in the initiation of HAART, poorer virological response, and higher mortality among HIV-infected injecting drug users in Spain. AIDS Res Hum Retroviruses 2006, 22: 715-723. 10.1089/aid.2006.22.715PubMedView ArticleGoogle Scholar
- Richman DD, Morton SC, Wrin T, Hellmann N, Berry S, Shapiro MF, Bozzette SA: The prevalence of antiretroviral drug resistance in the United States. AIDS 2004, 18: 1393-1401. 10.1097/01.aids.0000131310.52526.c7PubMedView ArticleGoogle Scholar
- Baldwin C, Berkhout B: HIV-1 drug-resistance and drug-dependence. Retrovirology 2007, 4: 78. 10.1186/1742-4690-4-78PubMedPubMed CentralView ArticleGoogle Scholar
- Sethi AK: Adherence and HIV drug resistance. HIV Clin Trials 2004, 5: 112-115. 10.1310/N53E-1930-NJMW-GL7CPubMedView ArticleGoogle Scholar
- Bakhouch K, Oulad-Lahcen A, Bensghir R, Blaghen M, Elfilali KM, Ezzikouri S, Abidi O, Hassar M, Wakrim L: The prevalence of resistance-associated mutations to protease and reverse transcriptase inhibitors in treatment-naive (HIV1)-infected individuals in Casablanca, Morocco. J Infect Dev Ctries 2009, 3: 380-391. 10.3855/jidc.247PubMedView ArticleGoogle Scholar
- Back NK, Nijhuis M, Keulen W, Boucher CA, Oude Essink BO, van Kuilenburg AB, van Gennip AH, Berkhout B: Reduced replication of 3TC-resistant HIV-1 variants in primary cells due to a processivity defect of the reverse transcriptase enzyme. EMBO J 1996, 15: 4040-4049.PubMedPubMed CentralGoogle Scholar
- Basu VP, Song M, Gao L, Rigby ST, Hanson MN, Bambara RA: Strand transfer events during HIV-1 reverse transcription. Virus Res 2008, 134: 19-38. 10.1016/j.virusres.2007.12.017PubMedView ArticleGoogle Scholar
- Boyle BA, Cohen CJ, DeJesus E, Elion R, Frank I, Moyle GJ, Sax P: Update on antiretroviral therapy: the 15th CROI. AIDS Read 2008, 18: 273-278. C273PubMedGoogle Scholar
- ter Brake O, Berkhout B: Lentiviral vectors that carry anti-HIV shRNAs: problems and solutions. J Gene Med 2007, 9: 743-750. 10.1002/jgm.1078PubMedView ArticleGoogle Scholar
- Liu YP, Haasnoot J, ter Brake O, Berkhout B, Konstantinova P: Inhibition of HIV-1 by multiple siRNAs expressed from a single microRNA polycistron. Nucleic Acids Res 2008, 36: 2811-2824. 10.1093/nar/gkn109PubMedPubMed CentralView ArticleGoogle Scholar
- Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, Gonzales NR, Gwadz M, Hao L, He S, Hurwitz DI, Jackson JD, et al.: CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 2007, 35: D237-240. 10.1093/nar/gkl951PubMedPubMed CentralView ArticleGoogle Scholar
- Shafer RW, Stevenson D, Chan B: Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database. Nucleic Acids Res 1999, 27: 348-352. 10.1093/nar/27.1.348PubMedPubMed CentralView ArticleGoogle Scholar
- Granados CAD, Álvarez C, Prada G, Martínez FL, Sarmiento CA: Guía para el manejo de VIH/sida Basada en la evidencia. In Modelo de gestión programática en VIH/sida. COLOMBIA: Programa de Apoyo a la Reforma de Salud Ajuste a los Planes de Beneficio y la Unidad de Pago por Capitación. MPS; 2006.Google Scholar
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755PubMedView ArticleGoogle Scholar
- Shannon CE: A Mathematical Theory of Communication. The Bell System Technical Journal 1948, 27: 379-423. 623-656View ArticleGoogle Scholar
- Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T, Turner DH: Improved free-energy parameters for predictions of RNA duplex stability. Proc Natl Acad Sci USA 1986, 83: 9373-9377. 10.1073/pnas.83.24.9373PubMedPubMed CentralView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403-410.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.