Promiscuous prediction and conservancy analysis of CTL binding epitopes of HCV 3a viral proteome from Punjab Pakistan: an In Silico Approach

Background: HCV is a positive-sense RNA virus that affects approximately 180 million people worldwide, including about 10 million people in Pakistan. HCV genotype 3a is the major cause of infection in the Pakistani population. A major problem of HCV infection, especially in developing countries, is that antiviral therapy is limited by long treatment duration, high dosage and side effects. Studying the antigenic epitopes of viral sequences of a specific origin can provide an effective way to overcome the high mutation rate and to identify promiscuous binders for epitope-based subunit vaccine design. An in silico approach was applied to the entire HCV proteome of Pakistani origin to identify viral epitopes and their conservancy in HCV genotypes 1, 2 and 3 of diverse origin.
Results: Immunoinformatic tools were applied for the predictive analysis of HCV 3a antigenic epitopes of Pakistani origin. All predicted epitopes were then subjected to conservancy analysis in HCV genotypes 1, 2 and 3 of diverse (worldwide) origin. Using freely available web servers, 150 MHC II epitopes were predicted as promiscuous binders against 51 alleles. The E2 protein accounted for 20% of all predicted MHC II epitopes. Of the predicted MHC II epitopes, 75.33% were 77-100% conserved in genotype 3, compared with 47.33% in genotype 1 and 40.66% in genotype 2. Sixty-nine MHC I epitopes were predicted as promiscuous binders against 47 alleles. NS4b accounted for 26% of all predicted MHC I epitopes. Epitope conservancy was significantly higher in genotype 3 (78.26%) than in genotypes 1 and 2 (21.05%).
Conclusions: The study provides a comprehensive catalogue of potential HCV-derived CTL epitopes from the viral proteome of Pakistani origin. A considerable number of the predicted epitopes were conserved in different HCV genotypes. However, the number of conserved epitopes was significantly higher in HCV genotype 3 than in genotypes 1 and 2. Despite the lower conservancy in genotypes 1 and 2, the predicted epitopes have important implications for diagnostics as well as for CTL-based rational vaccine design that is effective for most populations of the world and especially for the Pakistani population.


Background
The family Flaviviridae comprises small enveloped viruses classified into three genera: Flavivirus, Pestivirus and Hepacivirus. Members of these genera cause various diseases in humans and in other animals such as birds, horses and pigs. The family contains more than 70 members, including Hepatitis C virus (HCV), Dengue virus, West Nile virus and tick-borne encephalitis virus [1][2][3].
HCV is a positive-sense RNA virus that affects approximately 180 million people worldwide, including about 10 million people in Pakistan [4,5]. The HCV genome comprises about 9400 nucleotides that encode a single polyprotein of approximately 3010 to 3033 amino acids [6]. This polyprotein is processed by viral as well as host proteases into three structural proteins (core, E1 and E2) and four nonstructural proteins (NS2, NS3, NS4 and NS5A) [7]. HCV spreads mainly via the blood supply, reuse of glass syringes and needles, unsterilized medical equipment, use of the toothbrushes of HCV patients, etc. [7], and causes acute and chronic infections [8]. Clinical manifestations of acute hepatitis C infection include jaundice, fever, myalgia, fatigue, lethargy, increased ALT, anorexia and fulminant hepatic failure [7]. About 80% of HCV-infected individuals develop chronic infection [9]. Chronic infection progresses to chronic hepatitis, cirrhosis and hepatocellular carcinoma within about 10, 20 and 30 years of infection, respectively [10,11]. Of the 70-80% of individuals who become chronically infected, 20% develop cirrhosis and 1-5% progress to end-stage liver disease [12]. Hepatic steatosis, the accumulation of lipids in hepatocytes, is reported as a cause of cirrhosis [13], with the more severe cases reported in patients infected with HCV genotype 3a [14]. The prevalence of steatosis in the Pakistani population is about 61.5-65.5%, compared with 32.8-81.2% in western countries [15]. Chronic HCV liver disease is more frequent in males than in females, with most patients aged between 40 and 50 years [5].
HCV is classified into six genotypes, each having various subtypes [16][17][18]. These genotypes are distributed differently across the world, and the genetic variance between them is about one third. Genotypes 1, 2 and 3 are distributed worldwide, but significant differences are observed in subtype distribution. Subtype 1a is mostly found in North America and Europe, followed by 2b and 3a. Subtype 1b is frequently found in South East Europe and Tunisia, and 2c in northern Italy. Genotype 4 is mainly restricted to the Middle East and Central Africa, and genotype 5 to South Africa. Genotype 6 is distributed throughout South East Asia and has also been isolated from Hong Kong and Vietnam [17]. The most frequent HCV genotype in Pakistan is 3a (49.05%), followed by 3b (17.66%) [19]. Knowledge of the HCV genotype distribution is crucial because of its predictive value for the response to antiviral therapy and vaccination. Effective responses to antiviral therapy are normally associated with genotypes 2 and 3 rather than with any other genotype [20].
HCV replicates at a rate of about 10^12 new virions per day. Replication is carried out by an RNA-dependent RNA polymerase. This polymerase lacks proofreading ability, which results in a high mutation rate of about 8-18 mutations in the genomic RNA per year [21,20]. Such a high mutation rate limits both treatment and vaccination. The current treatment for HCV is IFN alpha combined with ribavirin, whose effectiveness is limited to about 50% of the population [22]. Although the response rate is not discouraging, high dosage, long-term treatment and side effects limit its use [23,21]. It is possible that, within the next few years, new antiviral agents such as inhibitors of the viral protease, helicase or polymerase will further improve the response rate of the current therapeutic agents. However, antiviral therapy is not affordable in most developing countries, where the prevalence of HCV is generally the highest. Thus, given the huge reservoir of HCV worldwide, the development of an effective vaccine may be the cheapest way to control disease associated with HCV infection.
Development of an effective HCV vaccine requires an understanding of the immune response. The antiviral immune response is associated with major histocompatibility complex (MHC) proteins and T lymphocytes (T cells). MHC molecules are classified into two broad categories, MHC I and MHC II [24]. MHC molecules bind viral antigenic epitopes and present them to T lymphocytes for elimination of the virus: MHC I presents antigenic epitopes to CD8+ T cells and MHC II presents them to CD4+ T cells [25,26]. CD8+ T cells, also referred to as cytotoxic T cells (CTL or Tc), limit viral infection by recognizing and then killing infected cells and by secreting cytokines. CD4+ T cells, referred to as helper T cells (Th), provide growth factors and signals for the generation and maintenance of CD8+ T cells [27]. T cells recognize antigens only when they are associated with MHC, a surface glycoprotein exposed on the surface of all vertebrate cells. T cell epitopes are also attractive targets because they are linear and hence easy to synthesize.
A vaccine developed against HCV elsewhere may not be effective for the Pakistani population because HCV genomic sequences and their distribution vary with geographical area. A large part of the Pakistani population is infected with HCV 3a, and the number of patients enrolled in hospitals is increasing day by day. There is therefore a need to develop a vaccine against HCV, and against HCV 3a in particular, that covers as much of the Pakistani population as possible. Current vaccine types include DNA vaccines, peptide vaccines and epitopic vaccines. Epitopes are the small antigenic segments of viral proteins that are recognized by the host immune system. Epitopic vaccines provide a more potent and controlled immune response and eliminate the potentially lethal effects of using whole viral proteins [28]. Promiscuous epitopes (epitopes capable of binding the maximum number of HLA alleles) may overcome the problem of population coverage. In addition, conserved epitopes reduce the antigen escape associated with viral mutation [29]. The present study was therefore designed to predict promiscuous epitopes and to analyze their conservancy in the general population. Any mutation in a peptide/epitope lowers its conservancy, so we also analyzed the pI value of each mutated amino acid residue: if it remains in the same range as in the original epitope, that epitope is more likely to be useful for epitopic vaccine design, offering effective control over viral mutation and the immune response with minimal side effects.

Sequence Retrieval and Analysis
The complete genome and protein sequence of HCV 3a of Pakistani origin was retrieved from NCBI [GU294484]. The numbers of individual bases in the genome (adenine, cytosine, guanine and thymine) were calculated using the DDBJ database. The molecular weight of each protein and the percentages of its most and least repeated amino acids were calculated using the sequence search and analysis tools of the PIR database (http://pir.georgetown.edu/).
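The composition statistics described above can be illustrated with a minimal sketch. This is not the DDBJ or PIR implementation; the function names and the toy sequences are assumptions introduced only for illustration.

```python
from collections import Counter


def base_composition(genome: str) -> dict:
    """Count A, C, G and T in a nucleotide sequence (case-insensitive)."""
    counts = Counter(genome.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def residue_percentages(protein: str) -> dict:
    """Percentage of each amino acid residue, rounded to one decimal place."""
    counts = Counter(protein.upper())
    total = len(protein)
    return {aa: round(100 * n / total, 1) for aa, n in counts.items()}


def most_and_least_repeated(protein: str):
    """Return (most repeated residues, least repeated residues), ties included."""
    pct = residue_percentages(protein)
    highest, lowest = max(pct.values()), min(pct.values())
    most = sorted(aa for aa, p in pct.items() if p == highest)
    least = sorted(aa for aa, p in pct.items() if p == lowest)
    return most, least


# Toy inputs (hypothetical, not taken from GU294484)
toy_genome = "ACGTGGCCAT"
toy_protein = "MLLVLKLLVA"

print(base_composition(toy_genome))
print(most_and_least_repeated(toy_protein))
```

Ties are reported together, which matches the way several residues can share the lowest percentage in a short protein.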

Epitope Prediction
Promiscuous epitopes of HCV 3a viral proteins were predicted for HLA class I and HLA class II alleles using the freely available immunoinformatics tools ProPred1 and ProPred, respectively. Compared with other epitope prediction tools, ProPred1 and ProPred cover the largest number of human leukocyte antigen (HLA) alleles and have been used for epitope prediction for HBV and tuberculosis. ProPred1 predicts antigenic epitopes for 47 MHC I alleles and ProPred predicts epitopes for 51 MHC II alleles. Predictions with these tools can be carried out at thresholds from 1 to 10%. The algorithms behind these tools are based on linear coefficients of matrices. Most of the matrices were retrieved from BIMAS, where the score of each peptide is calculated by multiplication and/or summation. For example, the score of the peptide "PACDPGRAA" can be calculated using either of the following equations:
Score = P(1) × A(2) × C(3) × D(4) × P(5) × G(6) × R(7) × A(8) × A(9)
Score = P(1) + A(2) + C(3) + D(4) + P(5) + G(6) + R(7) + A(8) + A(9)
where P(1) is the score of P at position 1.
Only epitopes with a score higher than the chosen threshold score were assigned as predicted epitopes for the selected HLA alleles [30]. In this study the default threshold of 4% was used, at which sensitivity and specificity are nearly the same for most of the HLA alleles available in the ProPred1 and ProPred servers. Moreover, MHC I epitopes were predicted with the proteasome and immunoproteasome filters switched on at a 5% threshold, because MHC binders with a proteasomal cleavage site at the C-terminus are more likely to be T-cell epitopes [31]. The predicted promiscuous epitopes were listed in the tables in decreasing order of score.
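The matrix-based scoring and thresholding described in this section can be sketched as follows. This is a simplified illustration, not the ProPred or BIMAS code: the coefficients in `TOY_MATRIX` are invented, the function names are hypothetical, and the percentage cutoff is applied to the 9-mers of the query protein itself, which is an assumption made here for brevity.

```python
from math import prod

# Hypothetical position-specific coefficients: TOY_MATRIX[position][residue].
# Residues not listed at a position get a neutral coefficient.
TOY_MATRIX = [{} for _ in range(9)]
TOY_MATRIX[1] = {"A": 2.0, "L": 5.0}   # position 2 of the 9-mer
TOY_MATRIX[8] = {"A": 3.0, "L": 4.0}   # position 9 of the 9-mer


def score_peptide(peptide, matrix, mode="product"):
    """Combine per-position coefficients by multiplication or by summation."""
    neutral = 1.0 if mode == "product" else 0.0
    coeffs = [matrix[pos].get(aa, neutral) for pos, aa in enumerate(peptide)]
    return prod(coeffs) if mode == "product" else sum(coeffs)


def nonamers(protein):
    """All overlapping 9-residue windows of a protein sequence."""
    return [protein[i:i + 9] for i in range(len(protein) - 8)]


def predicted_binders(protein, matrix, top_percent=4.0, mode="product"):
    """Keep the best-scoring top_percent of 9-mers (at least one peptide)."""
    scored = sorted(
        ((score_peptide(p, matrix, mode), p) for p in nonamers(protein)),
        reverse=True,
    )
    keep = max(1, round(len(scored) * top_percent / 100))
    return [peptide for _, peptide in scored[:keep]]


def promiscuity(peptide, binders_by_allele):
    """Number of alleles for which the peptide is a predicted binder."""
    return sum(peptide in binders for binders in binders_by_allele.values())


print(score_peptide("PACDPGRAA", TOY_MATRIX, "product"))
print(score_peptide("PACDPGRAA", TOY_MATRIX, "sum"))
```

A peptide's promiscuity is then the count of alleles whose binder list contains it, which is the quantity reported as HLA allele coverage in the results.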

Epitope Conservancy Analysis
All predicted epitopes of HCV 3a proteins of Pakistani origin were subjected to worldwide conservancy analysis among HCV genotypes 1, 2 and 3. Five sequences for each HCV protein used for epitope prediction were retrieved randomly from NCBI. The predicted epitopes of HCV 3a (Pakistani origin), together with the five selected sequences of one genotype at a time (genotype 1, 2 or 3), were submitted to the epitope conservancy analysis tool of the IEDB database (http://tools.immuneepitope.org/tools/conservancy/iedb_input). All epitopes with 77-100% conservancy were selected, while epitopes with variation at the anchor residues were rejected. Anchor residues in the predicted epitopes are shown in bold. Epitopes that were 100% conserved in the selected proteins of all three genotypes are shown fully in bold. Epitopes with 88% or 77% conservancy carried a single or double amino acid variation, respectively, and the varied residues are shown in bold in the conservancy column for each genotype.
An asterisk (*) indicates that one of the five selected sequences either does not respond to epitope conservancy or has conservancy lower than 77%. A double asterisk (**) indicates that only one sequence shows 77-100% conservancy for the selected epitope.
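For a 9-residue epitope, the conservancy levels quoted above correspond to 9/9 (100%), 8/9 (reported as 88%) and 7/9 (reported as 77%) identical positions. The sketch below illustrates this idea with an ungapped sliding-window comparison; it is an assumption-laden stand-in for the IEDB tool, not its actual algorithm, and the function names and test sequences are hypothetical.

```python
def best_identity(epitope, sequence):
    """Highest fraction of identical positions over all ungapped windows."""
    n = len(epitope)
    best = 0.0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / n)
    return best


def conservancy_flag(epitope, sequences, threshold=7 / 9):
    """Return '' if all sequences pass, '*' if exactly one fails,
    '**' if exactly one passes, and None for cases the text does not define."""
    hits = sum(best_identity(epitope, s) >= threshold for s in sequences)
    if hits == len(sequences):
        return ""
    if hits == len(sequences) - 1:
        return "*"
    if hits == 1:
        return "**"
    return None


# VFLLNPCGL is one of the predicted epitopes; the flanking residues are invented.
exact = "AAAVFLLNPCGLAAA"
single_variant = "AAAVFLLNPAGLAAA"   # one substitution inside the epitope
unrelated = "AAAAAAAAAAAA"

print(best_identity("VFLLNPCGL", exact))
print(conservancy_flag("VFLLNPCGL", [exact] * 4 + [unrelated]))
```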

Validation of varied amino acids using pI value
Peptides with single or double amino acid variation were analyzed for their hydropathic characteristics, or pI value [32]. The pI indicates whether a varied residue stays in the same amino acid group or moves to a different group in the peptide under consideration, and thus informs whether the peptide should be used or rejected. Varied residues that changed group (with a considerable change of pI value) were marked with the superscript "D" for single variations and "DD" for double variations in which both residues changed group. The superscript "D" on a doubly varied peptide represents partial variation, i.e. one varied residue stayed in its amino acid group while the other moved to a different group with a considerable change of pI value.
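The "D"/"DD" marking can be sketched as a comparison of amino acid groups between the predicted epitope and its variant. The grouping below is an assumption used as a proxy for the pI-based judgement: it follows the four classes named in this paper (neutral nonpolar, neutral polar, acidic polar, basic polar) and places tyrosine with the nonpolar residues as the results section does. The function name and example variants are hypothetical.

```python
# Assumed residue groups (a proxy for the pI comparison, not the authors' table)
GROUPS = {
    "neutral nonpolar": "GAVLIMFWPY",
    "neutral polar": "STCNQ",
    "acidic polar": "DE",
    "basic polar": "KRH",
}
GROUP_OF = {aa: group for group, residues in GROUPS.items() for aa in residues}


def variation_marker(epitope, variant):
    """Return '', 'D' or 'DD' for a variant with at most two substitutions."""
    if len(epitope) != len(variant):
        raise ValueError("epitope and variant must have the same length")
    changed = [(a, b) for a, b in zip(epitope, variant) if a != b]
    if len(changed) > 2:
        raise ValueError("only single or double variations are handled")
    diverted = sum(GROUP_OF[a] != GROUP_OF[b] for a, b in changed)
    if diverted == 0:
        return ""      # identical, or every varied residue kept its group
    if len(changed) == 2 and diverted == 2:
        return "DD"    # both varied residues changed group
    return "D"         # single diverted residue, or partial diversion


print(variation_marker("VFLLNPCGL", "VFLLNPCGI"))  # L -> I stays nonpolar
print(variation_marker("VFLLNPCGL", "VFLLNPCGK"))  # L -> K changes group
```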

Results
The HCV 3a genome of Pakistani origin comprises 9474 bp, with G and C counts of 2622 and 2700, respectively. The GC content (5322 bases) exceeds the AT content (4152 bases) by 1170 bases, i.e. 12.35% of the genome length. The genome encodes a polyprotein that is subsequently cleaved into structural and nonstructural proteins of defined molecular weight. The envelope protein E2 has the highest molecular weight, 38755.3 Da (Table 1).
Leucine (L), a neutral nonpolar amino acid, is the most repeated residue in the E2 protein (13.1%), and the basic polar lysine (K) is the least repeated (1.4%). The shortest viral protein is NS4a (molecular weight 5751.69 Da), comprising 54 amino acid residues. In NS4a, leucine (L) and valine (V) are the most repeated residues (14.8%) and histidine (H), methionine (M), threonine (T) and tryptophan (W) are the least repeated (1.9%). The molecular weights of the other viral proteins and the percent repetition of their amino acid residues are listed in Table 1. The percentage of amino acid residues gives an outlook on their pI value and their probability of occurring in the antigenic epitopes. F (phenylalanine), I (isoleucine), L (leucine), M (methionine), V (valine), W (tryptophan) and Y (tyrosine) were the main anchor residues of the predicted MHC II epitopes and are nonpolar in nature.
A total of 150 epitopes were predicted against 51 alleles of MHC II (Table 2). The highest number of epitopes came from the E2 protein, which accounted for 20% of all predicted MHC II epitopes. VFLLNPCGL, FVILVFLLL, WHINSTVLH, FNLLDVPKA, LELINTHGS and VQYLYGVGS are promiscuous binders of 45-50 MHC II alleles. E2 is followed by the NS2 and NS4b proteins, representing 14.66% of the predicted MHC II epitopes. For NS2, VRAHVLVRL, VILLTSLLY and VRLCMFVRS are the best binders in terms of both score and HLA allele coverage (50-51 MHC II alleles). FFNILGGWV, VNLLPAILS and VVNLLPAIL are the best binders of NS4b in terms of both HLA coverage (41 alleles for the first epitope and 51 for the other two) and binding efficiency. LVVGVICAA, FNILGGWVA, WQKLEAFWH, IQYLAGLST and VVGVICAAL are also good-quality epitopes, covering 31 to 35 of the HLA alleles available in ProPred. For NS5a_1a, only three epitopes (MRLAGPRTC, FISCQKGYK and VVSTRCPCG) were predicted as promiscuous binders with a binding score higher than the selected threshold. Of these three, MRLAGPRTC is capable of binding all the HLA alleles available in the ProPred server, while FISCQKGYK and VVSTRCPCG bind 22 and 25 HLA alleles, respectively. The predicted promiscuous binders for the other proteins are summarized in Table 2.
A total of 69 epitopes were predicted as promiscuous epitopes for MHC I alleles. The anchor residues for MHC I vary considerably, both in the residues themselves and in their nature. Most anchor residues are neutral nonpolar or neutral polar; a small percentage are acidic polar or basic polar. The highest number of MHC I binding epitopes came from the NS4b protein, which accounted for 26% of all predicted MHC I epitopes. The NS4b epitope NFVSGIQYL is the best promiscuous binder, with the highest binding score. NS4b is followed by the NS2, E2 and NS3 proteins, representing 20.28% (NS2) and 11.59% (E2 and NS3) of the epitopes. For NS2, 14 promiscuous epitopes were predicted with varying binding efficiency. GSRDGVILL, DGVILLTSL, WAAAGLKDL and LQVWVPPLL are good binders in terms of both score and HLA allele coverage (21, 28, 27 and 28 alleles, respectively). The predicted E2 epitopes cover 20 to 28 HLA alleles, except for PLLHSTTEL, which covers only 11 HLA alleles but has the highest binding efficiency. The NS3 epitopes cover 8 to 25 HLA alleles and were also ranked by the binding efficiency predicted by the score. NS5a_1a was the least represented protein, with only one epitope (HVKNGSMRL) predicted as a promiscuous binder, for 16 MHC I alleles. The promiscuous MHC I binders for the other proteins are summarized in Table 3.
Table footnotes: Bold amino acid residues in the T-cell Epitope column indicate the anchor residues. Bold individual amino acid residues in the HCV Genotype 1, 2 and 3 columns indicate variation in the peptide relative to the predicted epitope. * indicates that one of the protein sequences selected for epitope conservancy either does not respond or has conservancy lower than 70%. ** indicates that only one of the selected protein sequences responds to epitope conservancy. D indicates that, in a single or double variation, an amino acid residue changed group relative to the primary epitope according to pI value. DD indicates that both amino acid residues in a double variation changed group relative to the primary epitope according to pI value.
Of the 150 predicted MHC II epitopes, 75.33% were 77-100% conserved in genotype 3 (Table 1) against the randomly selected viral proteins. Of these conserved peptides, 71.68% were 100% conserved, while 22.12% had a single residue variation (88% epitope conservancy). Only 40% of the singly varied peptides changed their amino acid group and pI value, while 60% retained the amino acid group of the predicted epitope of the HCV 3a proteins. A further 6.19% of the peptides had 77% epitope conservancy because of a double residue variation in the peptides of the general population relative to the predicted epitopes of HCV 3a of Pakistani origin. Of these doubly varied peptides, 42.85% retained their amino acid group and nearly the same pI value as the predicted epitope, 28.57% showed partial group diversion and 28.57% changed their amino acid group because of a considerable variation in pI value. Similar data were obtained for HCV genotypes 1 and 2, with 47.33% and 40.66% conservancy, respectively.
However, in contrast to genotype 3, only 23.94% of the predicted epitopes were 100% conserved in the randomly selected sequences of genotype 1 and 22.95% in genotype 2. Their rates of single and double residue variation are shown in Figure 1.
Of the 69 predicted MHC I epitopes, 78.26% were 77-100% conserved in genotype 3 (Table 2) against the randomly selected viral proteins. Of these conserved peptides, 72.22% were 100% conserved, while 22.22% had a single residue variation (88% epitope conservancy). Of the singly varied peptides, 40.66% retained the amino acid group of the predicted epitope of the HCV 3a proteins, while 58.33% changed their amino acid group and pI value. A further 5.5% of the peptides had 77% epitope conservancy because of a double residue variation in the peptides of the general population relative to the predicted epitopes of HCV 3a of Pakistani origin. Of these doubly varied peptides, 66.66% showed partial group diversion and 33.33% changed their amino acid group because of a considerable variation in pI value. Similar data were obtained for HCV genotypes 1 and 2, with 55.07% conservancy. However, in contrast to genotype 3, only 21.05% of the predicted epitopes were 100% conserved in the randomly selected sequences of genotypes 1 and 2. Their rates of single and double residue variation are shown in Figure 2.
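The nested percentages reported for genotype 3 can be converted back into approximate peptide counts as a consistency check. The counts below are back-calculated by rounding and are an assumption of this sketch, not figures stated by the authors; the function name and dictionary keys are hypothetical.

```python
def breakdown(total_epitopes, conserved_pct, band_pcts):
    """Back-calculate peptide counts from a percentage of a percentage."""
    conserved = round(total_epitopes * conserved_pct / 100)
    bands = {band: round(conserved * pct / 100) for band, pct in band_pcts.items()}
    return conserved, bands


# Genotype 3, MHC II: 150 epitopes, 75.33% conserved at 77-100%
mhc2 = breakdown(150, 75.33, {"100%": 71.68, "88%": 22.12, "77%": 6.19})

# Genotype 3, MHC I: 69 epitopes, 78.26% conserved at 77-100%
mhc1 = breakdown(69, 78.26, {"100%": 72.22, "88%": 22.22, "77%": 5.5})

print(mhc2)
print(mhc1)
```

Under this rounding assumption the three conservancy bands add up to the conserved total in both cases, which suggests the reported percentages are internally consistent.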

Discussion
The modern approach to controlling HCV infection is a vaccine preparation that can specifically induce antibody-mediated immunity. Rapid advances in computational methodologies and immunoinformatics/immuno-bioinformatics provide new strategies for the synthesis of antigen-specific epitopic vaccines against infectious agents such as viruses and other pathogens. Epitopic vaccines against HIV, malaria and tuberculosis have provided promising results and supported the defensive and therapeutic uses of these vaccines [33]. Thus, in the present study, a systematic immunoinformatics approach was applied to predict antigenic epitopes of HCV 3a proteins of Pakistani origin, followed by analysis of their diversity and conservancy in genotypes 1, 2 and 3 using HCV sequences randomly selected from NCBI, which mainly originate from Thailand, Cuba, UK, USA, China, Japan, France, Italy and Germany. The immunogenic epitopes identified were nonamers and could be used diagnostically to detect HCV-specific CTL responses in patients and after vaccination. A CTL-based HCV vaccine might not be efficient enough to prevent infection, but it might protect the body from the disease.
The analysis showed that the minimal set of epitopes required to represent the complete antigenicity of the viral proteins is significantly smaller than the full-length proteins. The majority of the epitopes reported here had intermediate to high HLA binding affinity. With an efficient CTL-based epitope delivery technology, the predicted epitopes could eventually become vaccines on their own or fused as polytopes. Designing an HCV vaccine around conserved epitopes can limit escape through viral mutation and thus provide more effective results. The study shows that the predicted epitopes were highly conserved in HCV genotype 3 and conserved, though to a lesser extent, in genotypes 1 and 2, for both MHC I and MHC II. Moreover, to ensure viral detection at all stages of its intracellular evolution, we used all the viral proteins. The total number of predicted epitopes was therefore maximized in proportion to the number of proteins covered by the analysis.