A comprehensive analysis of the naturally occurring polymorphisms in HIV-1 Vpr: Potential impact on CTL epitopes

The enormous genetic variability reported in HIV-1 has posed problems in the treatment of infected individuals. This is evident in the emergence of HIV-1 variants resistant to antiviral agents, neutralizing antibodies and cytotoxic T lymphocytes (CTLs), involving multiple viral gene products. Based on this, it has been suggested that a comprehensive analysis of the polymorphisms in HIV proteins is of value for understanding virus transmission and pathogenesis as well as for efforts towards developing anti-viral therapeutics and vaccines. This study, for the first time, describes an in-depth analysis of genetic variation in Vpr using information from global HIV-1 isolates involving a total of 976 Vpr sequences. The polymorphisms at the individual amino acid level were analyzed. Residues 9, 33, 39, and 47 each showed only a single variant amino acid, in contrast to other residues. Several residues are highly polymorphic: those that show ten or more variant amino acids are 15, 16, 28, 36, 37, 48, 55, 58, 59, 77, 84, 86, 89, and 93. Further, the variant amino acids noted at residues 60, 61, 34, 71 and 72 are identical. Interestingly, the frequency of the variant amino acids was found to be low for most residues. Like the protease, reverse transcriptase, Env, and Gag proteins of HIV-1, Vpr is known to contain multiple CTL epitopes. Based on this, we have also extended our analysis of the amino acid polymorphisms to the experimentally defined and predicted CTL epitopes. The results suggest that amino acid polymorphisms may contribute to the immune escape of the virus. The available data on naturally occurring polymorphisms will be useful to assess their potential effect on the structural and functional constraints of Vpr and also on the fitness of HIV-1 for replication.


Introduction
Humoral and cellular responses have been implicated in controlling viral and bacterial infections in addition to the host's innate immune responses. This is, indeed, demonstrated in the context of HIV-1 infection [1][2][3]. Specifically, CTL responses against the virus have been shown to limit the virus replication at a low level in the infected individuals. This is evident in the inverse correlation of CTL responses vs. virus load observed in acutely infected individuals [4][5][6]. Utilizing the rhesus macaque/SIV infection model, a suppressive effect on virus replication was shown for CTLs [7]. However, the initial CTL responses are not able to contain the virus at a later stage, possibly due to the emergence of viral variants that evade the immune responses resulting in continued virus replication [8,9]. Hence, an understanding of the CTL escape variants of HIV is important both in natural viral infections and also in the context of vaccine-induced immunity for developing effective CTL based polyvalent vaccines for containing diverse HIV-1 strains [10]. This is an area of research which is actively being pursued by several investigators [11,12].
The genome of HIV-1 has been shown to code for two regulatory proteins (Tat and Rev) and four auxiliary proteins (Vif, Vpr, Vpu and Nef) in addition to the Gag, Pol, and Env structural proteins [13]. The regulatory proteins Tat and Rev are essential for virus replication. Rev is involved in the transport of genomic and partially spliced subgenomic mRNA from the nucleus to the cytoplasm [14]. Tat is known as an activator of transcription of viral and cellular RNA. Vif plays an important role in HIV-1 replication in peripheral blood mononuclear cells (PBMC). Specifically, Vif prevents hypermutation in the newly made viral DNA through its interaction with APOBEC3G [15,16]. Vpr is known for its incorporation into virus particles, which is enabled by its interaction with Gag. Vpr is a multifunctional protein and is involved in the induction of apoptosis, cell cycle arrest, and transcriptional activation [17]. Vpu plays a role in particle release and degradation of CD4 [14,18,19]. The features of Nef include downregulation of cell surface receptors, interference with signal transduction pathways, enhancement of virion infectivity, induction of apoptosis in bystander cells, and protection of infected cells from apoptosis [20][21][22][23][24].
Based on the data reported so far, it is clear that HIV-1 employs multiple strategies to successfully replicate in the infected individuals [14,25,26]. The enormous genetic variation that is generated through errors of reverse transcriptase enzyme may provide a pool of variants to evade the host immune responses against the virus and also result in the emergence of drug resistant viruses during treatment. In addition, it is also likely that the immuno-suppressive effects of HIV-1 encoded proteins may attenuate the host immune responses in favor of the virus.
Upon infection of target cells by the virus, viral proteins are synthesized to carry out functions related to virus replication and also exert effects on specific host cell functions. In addition, viral proteins are also targeted to the proteasomal degradation pathway. This process results in the generation of peptides, which are then translocated to the ER through TAP and are presented on the cell surface in association with human leukocyte antigen (HLA) class I molecules. The genetic variability present in the coding sequences of the virus may result in viral proteins with alterations in the CTL epitopes, which may lead to defective processing, presentation or lack of recognition of the epitope by the reactive CTLs. This is the likely mechanism of CTL escape by HIV-1 and other viruses. The presence of multiple CTL epitopes has been demonstrated in HIV-1 proteins including Gag, Pol, Vif, Vpr, Tat, Rev, Vpu, Env and Nef. Though the characterization of the epitopes with respect to the viral proteins is achievable in individual cases, such an analysis at a population level is difficult to carry out for the following reasons: i) HIV-1 exhibits high genetic variation in different regions of the genome. The extent of heterogeneity among circulating HIV-1 strains is described to be in the range of 20% or more in relatively conserved proteins and up to 35% for the Env protein [11]. In addition, there is also extensive diversity among HIV-1 isolates within a subtype, ii) there are multiple subtypes of HIV-1, and iii) there is variability at the HLA loci. On the other hand, this limitation can be overcome to some extent by utilizing alternative approaches where information about CTL epitopes and their variants can be inferred from the sequences available for HIV-1 [27][28][29]. The HIV sequence database has information about the viral isolates from different parts of the world.
This information can be used as a source to assess the extent of naturally occurring polymorphisms and their potential impact on CTL epitopes. We hypothesize that mutations or alterations in the residues which are part of the CTL epitope in the Vpr molecule are likely to affect the epitope at multiple levels (processing and recognition of the epitope). Recently, studies have addressed this issue using full length or partial HIV-1 genome sequences [30]. This has prompted us to carry out a comprehensive analysis of the extent of variation at the amino acid level in the auxiliary gene product Vpr of HIV-1.
The underlying reasons for the selection of Vpr for a comprehensive analysis are the following: i) Vpr is a virion associated protein, ii) Vpr plays a critical role in the replication of virus in macrophages, iii) Vpr is a transcriptional activator of HIV-1 and heterologous cellular genes, iv) Vpr arrests cells at G2/M, v) Vpr induces apoptosis in diverse cell types, vi) Vpr exhibits an immune suppressive effect, vii) Vpr is present in the body fluids as an extracellular protein, viii) Vpr is highly immunogenic, ix) Vpr is a small protein comprising only 96 amino acids and x) structural information for the whole Vpr molecule is available through NMR [17,[31][32][33][34]. These features enable a detailed analysis of the polymorphisms in Vpr with respect to CTL epitopes, structure-function of the protein, and fitness of the virus for replication.
In this study, we have analyzed the predicted amino acid sequences of Vpr from global HIV-1 isolates available through the HIV database. Specifically, the extent of genetic variation in Vpr in the form of polymorphisms at the individual amino acid level was comprehensively analyzed. Several of the amino acid polymorphisms were found to be part of the experimentally verified and predicted CTL epitopes. The location and nature of the variant amino acid were found to affect the CTL epitope considerably. Hence, our results provide a glimpse into the genetic footprints of immune evasion in Vpr.

Materials and methods
The goal of our studies is to assess the nature and extent of polymorphisms at the level of individual residues in the Vpr molecule. The sequences considered here comprise Vpr sequences derived from all the major subtypes of HIV-1. The details regarding the subtypes and the number of sequences from each subtype are presented in Table 1 and are taken from the HIV database http://www.hiv.lanl.gov [35][36][37][38]. In addition, we have included Vpr sequences derived from HIV-1 positive long term non-progressors (McKeithen et al., unpublished data). It should be noted that we have also included Vpr from SIV isolated from chimpanzees, as this is likely the progenitor virus for HIV-1. Vpr sequences from the database were accessed in January of 2007. Deletions in the Vpr molecule were excluded from our analysis. The alignment of Vpr sequences (which is available from the authors upon request) was analyzed manually for variant amino acids at the level of individual residues in Vpr from global and distinct subtypes of HIV-1.
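The per-residue tally described above was carried out manually in this study, but the same bookkeeping can be sketched in a few lines of code. The following is a minimal illustration only, assuming the aligned sequences are available as equal-length strings with "-" marking gaps. The function name `variants_per_position` and the toy sequences are hypothetical and are not taken from the actual alignment.

```python
def variants_per_position(reference, sequences, gap_chars="-"):
    """Map each 1-based position to the set of amino acids that differ
    from the reference residue at that position (gaps are ignored)."""
    variants = {}
    for pos, ref_aa in enumerate(reference, start=1):
        # Collect the alignment column for this position.
        column = [seq[pos - 1] for seq in sequences if len(seq) >= pos]
        variants[pos] = {
            aa for aa in column if aa != ref_aa and aa not in gap_chars
        }
    return variants


# Toy data (hypothetical): a short N-terminal stretch and a few made-up variants.
reference = "MEQAPEDQ"
toy_alignment = ["MEQAPEDQ", "MERAPEDQ", "MEQAPENQ", "MEHAPEDQ", "MEQ-PEDQ"]

result = variants_per_position(reference, toy_alignment)
for pos, aas in result.items():
    print(pos, reference[pos - 1], sorted(aas))
```

In this toy example, position 3 (Q) collects the variants H and R, position 7 (D) collects N, and the gap at position 4 is not counted as a variant.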

Characteristics of Vpr sequences selected for this study
The alignment of Vpr sequences has enabled us to analyze the differences at the level of each residue from diverse HIV-1 isolates. A total of 976 Vpr sequences have been used for alignment. Polymorphisms with respect to length have been noted in Vpr by several investigators [17,39]. As this may pose a problem for our analysis, our alignment does not take into account either deletions or insertions. The Vpr alleles are from diverse subtypes and include 67, 294, 185 and 44 Vpr sequences representing subtypes A, B, C, and D, respectively (Table 1). The O, AE, AG, and cpx groups represent 39, 45, 39 and 28 Vpr sequences, respectively. Since the Vpr sequences are derived from different sources such as viral RNA, cloned viral DNA and proviral DNA from tissues, we have not made attempts to classify them in our analysis.

N-terminus of Vpr (residues 1-16)
The results presented in Table 2 regarding the N-terminal domain of Vpr show that all the residues excluding the initiator methionine are susceptible to alteration. The altered amino acids or polymorphisms at each residue are indicated as variant amino acids or substitutions. For convenience, we have used Vpr from NL4-3 proviral DNA as a reference sequence. The amino acid sequence of NL4-3 Vpr is similar to the HIV-1 subtype B consensus Vpr except for residues 28(S), 77(Q) and 83(I). Interestingly, residue 9, which is G, has only one variant amino acid. In an earlier study, it was noted that a change in residue 3 from Q to R was not associated with cytopathic effect [41]. In our analysis, variant amino acids H, L, M, and P were also noted for Q. Studies involving synthetic peptides corresponding to the N-terminus and also the full-length Vpr ... The impact of the majority of the polymorphisms on Vpr functions is not clear. Substitution of alanine for proline at residues 5 and 10 resulted in decreased and increased virion incorporation of Vpr, respectively [42]. Similarly, substitution of alanine at residue 12 reduced the cell cycle arrest function of Vpr [43]. On the other hand, substitution at residues 13 and 14 showed an increase in cell cycle arrest [42,44]. Hence, the naturally occurring polymorphisms are likely to affect the functions of Vpr.

Helical domain I (HI residues 17-33)
NMR studies of full length Vpr show that a region comprising residues 17-33 adopts a helical structure. This was also predicted by several algorithms. The polymorphisms observed for residues 17-33 are presented in Table 3. The characteristics of the residues with respect to the variant amino acids are the following: residues 18, 23 and 26 show two substitutions; residue 20 has three substitutions; residues 25 and 29 show four substitutions; ... Several laboratories including ours have reported on the importance of residues in helical domain I for Vpr functions. Substitution of a proline residue for glutamic acid (residues 17, 21, 24, 25, and 29) has a drastic effect on the stability, subcellular localization, and virion incorporation of Vpr [44][45][46][47][48][49]. The variant amino acids noted in this domain have the potential to destabilize Vpr and disrupt its function. Similarly, substitution of alanine for leucine residues affected the stability and virion incorporation of Vpr [45,48,[50][51][52][53]. Based on the studies reported, the variant amino acid arginine in place of histidine at residue 33 would be expected to affect the subcellular localization and virion incorporation of Vpr [54].

Interhelical domain I (residues 34-37)
This region is present between helical domains I and II and comprises only four residues. It has been shown that residues in this region have the ability to form a γ-turn.
The naturally occurring polymorphisms in this region are presented in Table 4. Site-specific mutagenesis studies have shown an important role for residues in subcellular localization, cell cycle arrest, apoptosis and virion incorporation of Vpr [42,44,51,55,56].

Helical domain II (residues 38-50)
Studies with peptide (1-50 amino acids) and full-length Vpr have shown that residues 38-50 correspond to helical domain II of Vpr. The naturally occurring polymorphisms corresponding to the residues in this region are presented in Table 5.  [43,44,50,56].

Interhelical domain II (residues 51-54)
This region is located between helical domains II and III.
Of the four residues which are part of this domain, only the residue G51 has been shown to reduce G2/M cell cycle arrest through alanine substitution [44]. The naturally occurring polymorphisms corresponding to the residues in this region are presented in Table 6. The characteristics of the substitutions are the following: residue 54 shows two substitutions; residue 51 shows three substitutions; residue 52 shows four substitutions and residue 53 shows five substitutions. The variant amino acids reach a total of fourteen and the majority of them are non-conservative substitutions.

Helical domain III (residues 55-77)
The presence of helical domain III has been demonstrated by NMR [40]. Several laboratories including ours have shown the importance of this domain for the function of Vpr. The naturally occurring polymorphisms noted for the residues in this region are presented in  [44,[57][58][59][60][61][62].
Additionally the LXXLL domain is also involved in Vpr-GR interaction and its subsequent role in virus replication [63,64].

C-terminus of Vpr (residues 78-96)
The naturally occurring polymorphisms corresponding to the residues in the C-terminus of Vpr are presented in ... This domain contains multiple arginine and serine residues. It has been reported that the arginine residues are important for cell cycle arrest and subcellular localization [65,66]. Vpr is known to undergo post-translational modification and the serine residues located at positions 28, 79, 94, and 96 of the protein serve as substrates for phosphorylation [67]. Loss of Vpr phosphorylation through site-specific mutagenesis severely affects replication of HIV-1 in macrophages [68]. ... (Table 9), residue 7 (D) is substituted by N with a frequency of 6.2%. Also, while the reference Vpr allele has Y at position 15, which is the predominant amino acid (85%), the variant amino acid F occurs to a limited extent (6.9%). A similar scenario applies to residues 28, 77, and 83 (Tables 10 and 15). Residue R80, which has been implicated in the cell cycle arrest function of Vpr, shows substitution by A with a frequency of 5.1%.
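The frequencies quoted above are the percentage of sequences carrying a given amino acid at a given position. A minimal sketch of that calculation is given below. The helper name `residue_frequencies` is hypothetical, and the column is made-up toy data (15 sequences with D and one with N), chosen only so that the result is of the same order as the reported 6.2%; it is not drawn from the alignment used in this study.

```python
from collections import Counter


def residue_frequencies(column):
    """Return the percentage of sequences carrying each amino acid
    in a single alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Toy column for one position (hypothetical counts, not real data).
column_7 = ["D"] * 15 + ["N"]
freqs = residue_frequencies(column_7)

for aa, pct in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {pct:.2f}%")
```

Applied per column to subtype-specific alignments, the same function would yield the kind of frequency tables referred to in the text.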

Impact of amino acid polymorphisms on defined and predicted CTL epitopes in Vpr
It has been shown that a single amino acid change in the epitope enables the virus to evade T cell surveillance [9,69]. Hence, it is of interest to analyze the polymorphisms in the context of both experimentally verified and predicted CTL epitopes. As Vpr is a highly immunogenic protein, several CTL epitopes have already been defined [12]. CD8+ T cell epitopes are contiguous and typically nine amino acids long. The experimentally verified CTL epitopes in Vpr are presented in Table 16 with their location in the protein.
We have presented the overall amino acid polymorphisms for each of the epitopes. The experimentally verified CTL epitopes cluster in the region covering 1-70 residues of Vpr. The total amino acid polymorphisms range from 36 to 107 for the individual epitopes. For example, the CTL epitope comprising the residues REPHNEWTL contains 53 variant amino acids. Residues at positions 1 to 9 of the epitope show 3, 6, 4, 11, 10, 6, 2, 8, and 3 variant amino acids, respectively.
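The epitope-level total is simply the sum of the per-position counts. As a quick check of the REPHNEWTL example, the short sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
epitope = "REPHNEWTL"
# Number of variant amino acids at epitope positions 1 to 9, as listed in the text.
variants_by_position = [3, 6, 4, 11, 10, 6, 2, 8, 3]

# Total polymorphisms across the epitope.
total_variants = sum(variants_by_position)

# Index (0-based) of the most variable position within the epitope.
most_variable = max(range(len(epitope)), key=lambda i: variants_by_position[i])

print(total_variants)
print(most_variable + 1, epitope[most_variable])
```

The sum reproduces the reported total of 53, and the most variable position is position 4 of the epitope (H), with 11 variant amino acids.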
In addition, we have utilized a bioinformatics approach (http://Bimas.dcrt.nih.gov/molbio/hla-bind) to assess the effect of polymorphisms on CTL epitopes. The predicted CTL epitopes with respect to several HLA class I alleles are presented in Table 17. The impact of polymorphisms on the CTL epitope was assessed by determining the estimate of the half-time of disassociation of the molecule containing the epitope. For this purpose, we have considered 3, 1, 2, and 6 epitopes corresponding to HLA-A2, Cw-4, HLA-B7 and HLA-B2705, respectively. The influence of variant amino acids on the CTL epitope is presented in Tables 18, 19, and 20 with respect to the HLA-A2 molecule. The epitopes considered for analysis correspond to residues 18-26, 38-46, and 66-74 of Vpr. While the reference peptide of the epitope located at residues 18-26 (Table 18) ... Table 20. The results show that both the location and nature of the amino acid have an effect on the half-time of disassociation of the molecule, which may lead to defective processing, presentation, and recognition of the epitope.
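Matrix-based predictors of this kind typically estimate the half-time of disassociation as the product of position-specific coefficients, one per residue of the nonamer, multiplied by an allele-specific constant. The sketch below illustrates only that general idea. The coefficient table, the constant, the peptides, and the function name `half_time_score` are all hypothetical placeholders; they are not the values or code used by the prediction server, and the numbers carry no biological meaning.

```python
from math import prod

# Hypothetical coefficients for a made-up allele: position -> {amino acid: weight}.
# Residues or positions not listed are treated as neutral (weight 1.0).
TOY_COEFFS = {
    2: {"L": 50.0, "Q": 0.5},  # assumed anchor position 2
    9: {"V": 2.0, "T": 0.1},   # assumed anchor position 9
}
TOY_CONSTANT = 1.0  # hypothetical allele-specific scaling constant


def half_time_score(peptide, coeffs=TOY_COEFFS, constant=TOY_CONSTANT):
    """Multiply the position-specific weights of a 9-mer into one score."""
    if len(peptide) != 9:
        raise ValueError("expected a nine-residue peptide")
    weights = (
        coeffs.get(pos, {}).get(aa, 1.0)
        for pos, aa in enumerate(peptide, start=1)
    )
    return constant * prod(weights)


reference_score = half_time_score("ALAAAAAAV")  # toy reference peptide
variant_score = half_time_score("AQAAAAAAV")    # toy variant at position 2

print(reference_score, variant_score)
```

Because the score is multiplicative, a single unfavorable substitution at an anchor position can lower the estimate by orders of magnitude, which is consistent with the observation that both the location and the nature of the variant amino acid matter.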

Discussion
Viral infections in individuals generally lead to a scenario where the virus is confronted by the host immune system involving both innate and adaptive immune responses.
Regarding the latter, cellular and humoral immune responses have been shown to play a role in the control of infections of viruses including HIV-1 [70,71]. It has been suggested that an understanding of the correlates of protective immunity is an important requirement for the development of vaccines against HIV-1. Several studies have been published on this subject [71][72][73]. These studies point out a role for CD8+ and CD4+ T cell responses and neutralizing antibodies in the control of HIV-1 replication. For example, it has been reported that CD8+ cells control HIV-1 in the acutely infected individuals [4][5][6]. The relevance of CD8+ T cells for the control of virus infection was also shown in the case of SIV infected rhesus macaques [74,75]. Recently, the published data on CD8+ T cells in acute and chronic HIV-1 infection revealed that CTL epitopes are present in all of the proteins encoded by HIV-1. Virus replication, however, is not completely contained due to the emergence of CTL escape variant viruses. Based on this, it is suggested that vaccine efforts to control HIV-1 should take into account the high genetic variability noted among HIV-1.
The continued emergence of genetic variants is a characteristic feature of RNA viruses. RNA dependent RNA polymerase and reverse transcriptase are error-prone enzymes and have been implicated as a cause for the generation of variants [76,77]. The mutational changes in the protease and reverse transcriptase, depending on their location, may affect the binding of inhibitors targeting these enzymes. The viruses containing such alterations may then be able to evade the inhibitory activities of the agents and are designated as drug-resistant variants. Similarly, viruses with mutations in Env, Tat, and possibly other proteins can also evade the neutralizing antibody, CTL and T-helper cell responses [12,71]. The escape variants that emerge eventually repopulate the body in the face of immune responses against the virus. It has been suggested that immune escape may be a key step in the evolution of HIV-1 [30,[78][79][80].
In an effort to understand the overall polymorphisms in an HIV-1 gene product, we undertook a comprehensive analysis of the predicted amino acid sequences of Vpr from diverse HIV-1 subtypes. Considering the genetic variation noted in diverse HIV-1 [39], our hypothesis is that the differences in Vpr and other viral proteins may enable the viruses to escape the host immunological pressures. To address this issue, we have initially compiled the polymorphisms in Vpr at the level of individual amino acids. Vpr contains only 96 amino acids. Hence, the small size of the protein is an advantage for a comprehensive analysis. For this purpose, we have turned to the Vpr sequences which are available in the HIV database and also sequences from specific groups such as HIV-1 positive long-term non-progressors. A total of 976 predicted Vpr amino acid sequences were used for our studies. The analysis revealed several characteristic features with respect to the individual amino acids in Vpr. Of the 96 amino acids, all except the initiator methionine have the propensity to change. This indicates that the Vpr molecule is highly flexible in nature. The frequency of the variant amino acids, calculated for subtype B Vpr at the level of individual residues, revealed that substitution is very low for most of the residues. This suggests that many of the substitutions in Vpr may compromise the function and possibly the fitness of the virus. Interestingly, there are several amino acids that can accommodate ten or ... Essex [27] also showed that the proportion of polymorphic amino acids ranged from a low of 55% (RT, IN) to a high of 94% (Vpu). In our analysis, Vpr variability is high, which is likely due to the inclusion of diverse isolates including the HIV-1 progenitor virus SIVcpz.
Vpr is known as a highly immunogenic protein. The presence of CTL epitopes verified through experimental approaches has been reported by several groups [12]. These include the region encompassing residues 9-70 of Vpr. Of the 96 residues, 62 (65%) have been shown to be associated with experimentally defined CTL epitopes. The data presented in Table 16 show that there are polymorphisms with respect to the experimentally verified CTL epitopes. The presence of variant amino acids at distinct locations within the epitope is likely to impact the CTL epitope. Further, we have also evaluated the effect of Vpr polymorphisms on CTL epitopes using the bioinformatics approach by calculating the estimate of the half-time of disassociation of the molecule containing the epitope. Such an analysis predicted several CTL epitopes all over Vpr, including the C-terminus, with respect to specific HLA class I molecules. The detailed analysis was carried out for different HLA alleles (HLA-A2, Cw-4, HLA-B7 and HLA-B2705) involving a total of 12 epitopes. The polymorphisms have also been analyzed for three predicted epitopes corresponding to residues 18-26, 38-46, and 66-74. The substitution of the variant amino acids for the residues comprising the epitope resulted in a drastic reduction in the value corresponding to the half-time of disassociation of the molecule containing the epitope. It should, however, be noted that additional in vitro binding studies are necessary to confirm the predicted values.
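The coverage figures quoted above follow from simple counting, and the short sketch below reproduces them from the numbers given in the text. The variable names are illustrative only.

```python
VPR_LENGTH = 96

# Residues 9-70 (inclusive) are associated with experimentally defined epitopes.
epitope_region = range(9, 71)
covered_residues = len(epitope_region)
coverage_percent = round(100 * covered_residues / VPR_LENGTH)

# Predicted epitopes analyzed per HLA allele, as listed in the text.
predicted_epitopes = {"HLA-A2": 3, "Cw-4": 1, "HLA-B7": 2, "HLA-B2705": 6}
total_predicted = sum(predicted_epitopes.values())

print(covered_residues, coverage_percent, total_predicted)
```

This gives 62 residues (about 65% of the protein) and a total of 12 predicted epitopes, in agreement with the values stated.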
Based on the data presented here, the amino acid polymorphisms noted in Vpr have the potential to contribute to the escape of the virus along with the epitopes present in other HIV-1 proteins [30]. It is also likely that the information regarding the polymorphisms at the CTL epitope will provide an opportunity to create an epitope-based vaccine that will exert control over viral isolates from different parts of the world. It is important to mention that the extensive HLA-associated amino acid polymorphisms noted here may also impact the structure/function of Vpr and the fitness of the virus [10,[81][82][83][84][85]. The biological sources used for generating the sequence information of vpr include tissues from infected individuals, plasma viral RNA, and cloned viral DNA. For this reason, the Vpr sequences considered here for the analysis may be derived from both infectious and non-infectious viral genomes. Hence, there is a possibility that the amino acid polymorphisms noted here may or may not have a chance to be acted upon by CTL and T-helper cell pressures. It is known that amino acids in the proximal region of the epitope can also influence its immunogenic potential. The amino acid polymorphisms noted in the putative CTL epitopes can have an effect at a single level and/or multiple levels in the generation of the immune response: i) Mutations may eliminate the binding of the peptide to the appropriate HLA molecule, which presents the peptide on the cell surface. ii) Mutations may also disrupt the interaction with the T cell receptors. iii) Mutations may disrupt the intracellular processing of the peptides. This results in the escape of the cells expressing the viral proteins from the surveillance of CD8+ T cells. Variant amino acids located proximal to or far away from the epitope could exert an influence by interfering with the processing of the peptide from the protein.
With regard to the latter, the variant amino acids may be either independent or compensatory in relation to changes in specific residues of Vpr. In addition, variant amino acids, which are part of overlapping epitopes presented by different HLA molecules, can also exert an influence on the epitope [30].
HIV variability is an important factor that should be taken into account in the efforts directed towards the develop-   dues cluster around a sequence shared by HIV-1 isolates of different subtypes. It is likely that the influence of the residues on the fitness of the virus counters the variability, thus limiting the genetic variation. The information on Vpr polymorphisms will be of value for the development of vaccines based on the auxiliary genes of HIV-1.

Authors' contributions
AS, VA, AK, AB, VS, RC and AC participated in the analysis of the predicted amino acid sequences of Vpr. SM, DD and BS provided information regarding the structure-function of Vpr. NM and RM contributed to the analysis of polymorphisms in Vpr from the structural angle. AS, VA, SM, VS, AC, and RC were involved in the preparation of the manuscript. All authors read and approved the final manuscript.