Structural analysis and epitope prediction of HCV E1 protein isolated in Pakistan: an in-silico approach
© Idrees and Ashfaq; licensee BioMed Central Ltd. 2013
Received: 26 September 2012
Accepted: 4 April 2013
Published: 10 April 2013
HCV infection is a major health problem causing acute and chronic hepatitis. HCV E1 is a transmembrane protein involved in viral attachment and can therefore serve as an important target for vaccine development. Consequently, this study was designed to analyze the HCV E1 protein sequence isolated in Pakistan to find potential conserved epitopes/antigenic determinants.
HCV E1 protein isolated in Pakistan was analyzed using various bioinformatics and immunoinformatics tools, including sequence and structure tools. A total of four antigenic B-cell epitopes, five MHC class I binding peptides and five MHC class II binding peptides were predicted. The best-scoring epitopes were subjected to conservation analysis against E1 sequences from other countries.
The study was conducted to predict antigenic determinants/epitopes of HCV E1 protein of genotype 3a along with the 3D protein modeling. The study revealed potential B-cell and T-cell epitopes that can raise the desired immune response against HCV E1 protein isolated in Pakistan. Conservation analysis can be helpful in developing effective vaccines against HCV and thus limiting threats of HCV infection in Pakistan.
Hepatitis C virus (HCV) infection is a global health problem affecting 270 million people worldwide. According to the World Health Organization, HCV-related liver cancer caused approximately 308,000 deaths in 2004. The number of HCV-infected individuals is increasing, and although prevalence reports for Pakistan vary, the majority of studies indicate that HCV is prevalent among 2.4-6.5% of adults and 0.44-1.6% of children. Prevalence analyses indicate that HCV genotype 3a is the most common genotype in Pakistan.
HCV is an RNA virus that, like dengue virus, West Nile virus and yellow fever virus, belongs to the Flaviviridae family. It has a 9.5 kb positive-sense, single-stranded RNA genome that encodes a large polyprotein, which is cleaved to produce four structural (Core, E1, E2 and P7) and six non-structural proteins (NS2, NS3, NS4A, NS4B, NS5A, NS5B). These viral proteins are responsible for viral replication and various cellular functions [5–8]. Among the HCV structural proteins, the envelope proteins play the primary role in viral entry. HCV envelope protein 1 (E1) is a transmembrane glycoprotein with a C-terminal domain responsible for membrane association and membrane permeability changes. E1 acts as a fusogenic subunit of the HCV envelope and contains 4–5 N-linked glycans. Because interaction of the virion with various cell receptors results in HCV infection [10, 11], it is important to target the virus envelope proteins to stop viral entry. Although little is known about E1, it is thought to be involved in intra-cytoplasmic virus-membrane fusion. Currently, the standard of care is pegylated interferon (PEG-IFN) with ribavirin; this therapy gives a 50% sustained virological response in genotype 1 and 80% in genotypes 2 and 3 [12, 13]. One of the top priorities in HCV infection should be the development of more effective therapies through new antiviral compounds for infected patients.
For designing effective inhibitors against envelope proteins, it is important to know the epitopic regions/antigenic determinants of these glycoproteins. Bioinformatics analysis has opened new vistas, providing more insight into protein sequence and structural features. Both B-cell and T-cell epitopes/antigenic determinants are important in raising the desired immune responses, and the number of epitopes and the modulation of immune recognition of antigens can be influenced by deglycosylation of viral glycoproteins. This study was designed to perform immunoinformatic analysis of the HCV E1 glycoprotein isolated in Pakistan and to analyze the antigenicity, hydrophobicity, surface accessibility and location of epitopes in the HCV glycoprotein structure.
Protein retrieval and comparative modeling
The HCV E1 protein sequence was retrieved from the NCBI protein database using the ID ACN92051. The three-dimensional structure of the protein was not available in the Protein Data Bank (PDB). Therefore, the present study was designed to predict the 3D model and the epitopes of HCV E1 protein isolated in Pakistan. Primary structure analysis was performed using the ProtParam online tool. The parameters computed by ProtParam included the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY). Secondary structure analysis was done using various online servers. The structure template with PDB ID 2VOV_A, which has 43% identity, was selected for the E1 protein and used as a reference to determine the 3D structure of E1. The Protein Structure Prediction Server (PS)2 predicted the homology model using the MODELLER package. Moreover, glycosylation sites of HCV E1 of Pakistani origin were identified, and their conservation in sequences from other regions of the world was checked through multiple sequence alignment. For this purpose, HCV E1 protein sequences isolated in different countries were retrieved from the NCBI protein database.
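Two of the ProtParam parameters listed above, GRAVY and the aliphatic index, are simple functions of the amino acid composition, so they can be illustrated with a short sketch. The code below is not the ProtParam implementation; it is a minimal approximation that assumes the Kyte-Doolittle hydropathy scale for GRAVY and the Ikai formula for the aliphatic index. The function names and the toy peptide are hypothetical and are not taken from the study.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the mole percent of each residue present in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu))."""
    x = composition(seq)
    return (x.get("A", 0.0) + 2.9 * x.get("V", 0.0)
            + 3.9 * (x.get("I", 0.0) + x.get("L", 0.0)))


# Hypothetical toy peptide used only to exercise the functions.
toy_peptide = "AVIL"
toy_gravy = gravy(toy_peptide)
toy_aliphatic = aliphatic_index(toy_peptide)
```

A positive GRAVY value indicates an overall hydrophobic sequence, which is the kind of signal one would expect from a protein with a membrane-spanning segment.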
Stereochemical analysis and model evaluation
Once the 3D model was generated, the Swiss-PdbViewer energy minimization test was applied to check energy criteria against the potential of mean force derived from a large set of known protein structures. Structural evaluation and stereochemical analyses were performed using different evaluation and validation tools. Backbone conformation was evaluated by analyzing the phi/psi Ramachandran plot generated by PROCHECK, which was used to check for non-glycine residues in the disallowed regions. The Z-score is indicative of overall model quality and is used to check whether the input structure is within the range of scores typically found in native proteins of similar size. The Z-score was determined with the ProSA-web tool. The model was further evaluated through ERRAT. Furthermore, the generated model was visualized using UCSF Chimera 1.5.3. The model was submitted to the Protein Model Database (PMDB) under the PMDB ID PM0078432.
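The phi and psi values plotted in a Ramachandran diagram are backbone dihedral angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows how a single dihedral angle can be computed from four atomic coordinates. It is an illustration of the geometry only, not the PROCHECK code, and the function names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    The angle is measured about the p1 -> p2 bond, between the plane
    through (p0, p1, p2) and the plane through (p1, p2, p3).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the first plane
    n2 = _cross(b2, b3)  # normal of the second plane
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the fourth point is rotated a quarter turn
# about the central bond relative to the first point.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying such a function to every residue of a model gives the (phi, psi) pairs whose distribution over favored, allowed and disallowed regions is summarized by the Ramachandran statistics.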
T-cell epitope and B-cell epitope prediction
A systematic strategy was adopted to design potential T-cell and B-cell epitopes of the HCV envelope protein. The VaxiJen v2.0 online antigen prediction server was used to analyze the antigenicity of the E1 protein. Transmembrane topology of the protein was checked using TMHMM. B-cell epitopes were predicted using the BCPREDS online server with a specificity criterion of 75%. All predicted B-cell epitopes were checked against the TMHMM results to determine whether they lay in transmembrane regions; epitopes exposed on the surface of the membrane were selected for further analysis. The antigenicity of the selected epitopes was checked again using the VaxiJen online server. The DiscoTope server predicts discontinuous B-cell epitopes from protein three-dimensional structures; the DiscoTope 2.0 server was employed for discontinuous B-cell epitope prediction using the 3D structure of the HCV E1 protein of Pakistan. Furthermore, T-cell epitopes were screened using Propred-1, which predicts epitopes for 47 MHC class I alleles, and Propred, which predicts epitopes for 51 MHC class II alleles. Both servers cover a large number of HLA (human leukocyte antigen) alleles and are therefore considered acceptable for predicting epitopes. Proteasome and immunoproteasome filters were set to a 5% threshold for MHC class I alleles. MHC binders that have a proteasomal cleavage site at the C-terminus have a greater chance of being T-cell epitopes.
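The surface-exposure check described above amounts to comparing each predicted epitope's residue range with the topology segments. A minimal sketch of that comparison follows. The function names and the epitope spans are hypothetical; the outside segment (residues 1–73) is the TMHMM topology reported in the results of this study, and the rule that an epitope must lie entirely within an outside segment is an assumption made for illustration.

```python
def locate(sequence, peptide):
    """Return the 1-based (start, end) of the first occurrence, or None."""
    index = sequence.find(peptide)
    if index == -1:
        return None
    return (index + 1, index + len(peptide))


def is_exposed(span, outside_segments):
    """True when the whole span lies inside a single outside segment."""
    start, end = span
    return any(low <= start and end <= high for low, high in outside_segments)


def filter_exposed(epitope_spans, outside_segments):
    """Keep only the epitope spans that are fully surface exposed."""
    return [span for span in epitope_spans if is_exposed(span, outside_segments)]


# Outside segment as reported by TMHMM for this E1 sequence (residues 1-73).
OUTSIDE = [(1, 73)]

# Hypothetical epitope spans: one exposed, one overlapping the
# transmembrane region (74-96), one on the inside (97-192).
candidate_spans = [(10, 29), (60, 79), (100, 119)]
exposed_spans = filter_exposed(candidate_spans, OUTSIDE)
```

With these inputs only the first span is retained, because the second crosses into the transmembrane region and the third lies on the inside.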
Epitope conservation analysis
Sequences of HCV E1 protein belonging to different regions of the world were retrieved from the NCBI database. A consensus sequence was drawn for each country, and all the consensus sequences were subjected to multiple sequence alignment using CLC workbench (data not shown). All the selected epitopes were checked for their conservation and variability by analyzing the multiple sequence alignment results and with the IEDB conservation analysis tool.
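Epitope conservancy can be expressed as the fraction of sequences that contain the epitope at or above a chosen identity level. The sketch below illustrates that idea with an ungapped sliding-window comparison. It is a simplified approximation, not the IEDB tool itself, and the function names, threshold values and toy sequences are hypothetical.

```python
def best_identity(epitope, sequence):
    """Highest fraction of matching positions over all ungapped windows."""
    size = len(epitope)
    if len(sequence) < size:
        return 0.0
    best = 0.0
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / size)
    return best


def conservancy(epitope, sequences, threshold=1.0):
    """Fraction of sequences with a window at or above the identity threshold."""
    hits = sum(best_identity(epitope, seq) >= threshold for seq in sequences)
    return hits / len(sequences)


# Hypothetical toy sequences standing in for per-country consensus sequences.
toy_sequences = ["AATPGCVPC", "AASPGCVPC", "GGGGGGGGG"]
strict = conservancy("TPGCV", toy_sequences, threshold=1.0)
relaxed = conservancy("TPGCV", toy_sequences, threshold=0.8)
```

Lowering the threshold from 100% to 80% identity admits the second toy sequence, which differs from the epitope at a single position, so the relaxed value is higher than the strict one.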
Structural description of the model
Protein 3D structure is very important for understanding protein interactions, functions and localization. Homology modeling is the most common structure prediction method. The first and basic step in homology modeling is to find the best-matching template using similarity search programs such as PSI-BLAST against the PDB. Templates are selected based on their sequence similarity to the query sequence. PDB ID 2VOV_A, an X-ray diffraction structure of Rev-erb beta with a resolution of 1.35 Å, was selected for homology modeling. Both template and target protein sequences were used to predict the 3D structure of the target protein using the Protein Structure Prediction Server (PS)2 (Figure 1B).
The 3D structure of the protein contained 49 hydrogen bonds. The quality and reliability of the structure were checked by several structure assessment methods, including the Z-score, ERRAT and the Ramachandran plot. PROCHECK checks the stereochemical quality of a protein structure by analyzing residue-by-residue geometry and overall structure geometry. This tool was used to generate the Ramachandran plot to assess the quality of the model. The Ramachandran plot showed 84.5% of residues in the favorable region (Figure 1C, 1D). The Z-score is indicative of overall model quality and is used to check whether the input structure is within the range of scores typically found in native proteins of similar size. ProSA-web was used to find the Z-score of the predicted structure, which was -0.11 (Figure 1E). Reliability of the model was further checked by ERRAT, which analyzes the statistics of non-bonded interactions between different atom types and plots the value of the error function versus the position of a 9-residue sliding window, calculated by comparison with statistics from highly refined structures. ERRAT reported an overall quality factor of 71.930 (Figure 1F). The Z-score, Ramachandran plot and ERRAT results supported the quality of the homology model of the HCV E1 protein.
Glycosylation site analysis
The overall antigenicity of the E1 protein was predicted to be 0.5362, indicating that it is a probable antigen. Transmembrane topology was checked using the TMHMM online tool, which showed that residues 1–73 were on the outside, residues 74–96 were within the transmembrane region, and residues 97–192 were on the inside, in the core region of the protein.
B-cell epitope prediction
Predicted B-cell epitopes
Discontinuous epitopes predicted from the 3D structure of the E1 protein using the DiscoTope online server
T-cell epitope prediction
MHC class I binding peptides on the basis of antigenicity
HLA-A*0205/HLA-A24/HLA-A20/HLA-A2.1/HLA-B14/HLA-B*2702/HLA-B*2705/HLA-B*3701/HLA-B*3801/HLA-B*3901/HLA-B*3902/HLA-B*4403/HLA-B*5101/HLA-B*5102/HLA-B*5103/HLA-B*5201/HLA-B*5301/HLA-B*51/HLA-B60/HLA-B62/HLA-B7/HLA-B8/HLA-Cw*0301/HLA-Cw*0401/HLA-Cw*0602/HLA-Cw*0702/MHC-Db/MHC-Dd/MHC-Kb/MHC-Kd/MHC-Kk/MHC-Ld
MHC class II binding peptides on the basis of antigenicity
DRB1_0305-309, DRB1_0311, DRB1_0401, DRB1_0421, DRB1_0426, DRB1_1107
DRB1_0101, DRB1_0305, DRB1_0309, DRB1_0402, DRB1_0404, DRB1_0405, DRB1_0408, DRB1_0410, DRB1_0423, DRB1_0813, DRB1_1107
DRB1_0101, DRB1_0305, DRB1_0309, DRB1_0401, DRB1_0402, DRB1_0404, DRB1_0405, DRB1_0408, DRB1_0410, DRB1_0421, DRB1_0423, DRB1_0426, DRB1_0701, DRB1_0703, DRB1_0801, DRB1_0802, DRB1_0813, DRB1_1101, DRB1_1114, DRB1_1120, DRB1_1128, DRB1_1302, DRB1_1305, DRB1_1307, DRB1_1321, DRB1_1323
DRB1_0102, DRB1_0306-0308, DRB1_0311, DRB1_1104, DRB1_1106, DRB1_1107, DRB1_1311, DRB1_1501, DRB1_1506, DRB5_0101, DRB5_0105
Epitope conservation and variability analysis
Conservation and variability analysis of B-cell and T-cell epitopes in comparison with HCV E1 proteins of other regions
VS QL FTFS PRRH
VS QL FTFS PRRH
IS QL FTFS PRRH
VGQL FTFS PRH H
AA QL FIIS PXHH
VGQVI TFK PRRH
VGQM FTY RPRQ H
VGQAFR FRQ RQ H
VAL TPTL AA RNA
VAL TPTL AA RNA
VAL TPTL AA RNS
VA VA PTVAT RDG
I PVS PNI AVQQP
K PVTPTVAVA YG
TPGCV PCVRE GN
TPGCV PCVQED N
TPGCV PCVRE GN
S PGCV PCVRE GN
V PGCV PCEKV GN
L PGCV PCVATA N
L PGCV PCVKT GN
TPGCV PCVKE GN
S PGCV PCVKS GN
TNDCS ND SITWQ
TNDCS NQ SIVYE
VP TTTIRR H
VP TTAI RR H
VP TTTI RR H
LP TTQL RR H
AL TRGL RT H
APLE SF RR H
AX TAPL RR A
VI TASI RSH
MNWS PTAT M
Q NWS PT VSL
AAL VVS QL LRI
TAL VVS QL LRI
AAL VAS QLF RI
TAL VVAQLL RV
AT MIL AYAM RI
I GLA VSHLM RL
TTLLL AQIM RI
TALLM AQL LRI
VA GAHWGI L
VA GAHWGV L
VA GAHWGV L
IA GAHWGV L
VA GG HWGV L
VA GG HWGV L
LV GS HWGV L
R HI DLLVGS
T HI DMV VMS
R HVDLM VGA
RA VDY LA GG
V TNDCS NSS
V TNDCS NSS
V TNDCS NSS
V TNDCS ND S
LTNDCS NQ S
WVAL TPTL A
WVAL TPTL A
WVAL TPTL A
WVA VA PTVA
WI PVS PNI A
A RNASVP TTTI
A RNASVP TTAI
A RNSNVP TTTI
T RDGKLP TTQL
VQQP GAL TRGL
VA YGS APLE SF
APSF GAX TAPL
VRAP GVI TASI
QP GAL TRGL
YGS APLE SF
SF GAX TAPL
AP GVI TASI
A RNASVP TT
A RNASVP TT
A RNSNVP TT
T RDGKLP TT
VQQP GAL TR
VA YGS APLE
APSF GAX TA
VRAP GVI TA
In this study, sequence and structure analysis, homology modeling and epitope analysis were performed on the HCV E1 protein isolated in Pakistan. We used various sequence and structure analysis tools that helped in understanding the sequence and its structure. Primary structure analysis of the amino acid composition of the HCV E1 glycoprotein showed that valine (V) is the most abundant residue and that the N-terminal residue is leucine (L).
We used a homology modeling approach to predict the 3D structure of the HCV E1 protein of Pakistan. The predicted 3D structure will provide more insight into the structure and function of this protein. Moreover, this structure can be used for drug design or for understanding the interactions between proteins. The HCV E1 protein was molecularly characterized using various online servers, and it was observed that it had five glycosylation sites, all of which were conserved in HCV E1 protein sequences of other countries. Clustal W multiple sequence alignment was used to determine the conservation and variability of HCV E1 protein belonging to different regions of the world, and it was determined that there were frequent variations at positions 6 (Threonine), 11 (Valine), 17 (Proline), 32 (Threonine), 36 (Isoleucine), 40 (Glutamine), 41 (Aspartic Acid), 44 (Isoleucine), 45 (Serine), 46 (Arginine), 50 (Proline), 58 (Arginine), 59 (Tyrosine), 62 (Alanine), 67 (Valine), 77 (Alanine), 89 (Methionine), 96 (Valine), 103 (Arginine), 116 (Serine), 123 (Serine), 132 (Lysine), 136 (Threonine), 144 (Alanine), 145 (Glutamine), 152 (Serine), 153 (Isoleucine), 157 (Leucine), 158 (Glutamine), 164 (Methionine), 174 (Glutamic Acid), 181 (Glutamine), 182 (Isoleucine), 185 (Valine) and 187 (Valine) in the HCV E1 protein sequences. All other residues of the HCV E1 protein were conserved in all sequences.
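Identifying variable positions from a multiple sequence alignment reduces to finding the alignment columns in which the sequences do not all carry the same residue. The sketch below illustrates this on a few short aligned fragments. It is not the Clustal W procedure itself, the function name is hypothetical, and the fragments are toy inputs chosen only to show the column comparison.

```python
def variable_positions(aligned_sequences):
    """Return the 1-based alignment columns where residues are not identical."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    variable = []
    for column in range(length):
        residues = {seq[column] for seq in aligned_sequences}
        if len(residues) > 1:
            variable.append(column + 1)
    return variable


# Toy aligned fragments of equal length (illustrative only).
toy_alignment = ["VAGAHWGIL", "VAGAHWGVL", "IAGAHWGVL"]
toy_variable = variable_positions(toy_alignment)
```

Here columns 1 and 8 differ between the fragments while the remaining columns are fully conserved. Gap characters, if present, are treated like any other symbol in this sketch.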
As part of the present study, we predicted B-cell and T-cell epitopes of the HCV E1 protein using different online tools. Only four B-cell epitopes were found to be antigenically effective, and it can be inferred that these epitopes/antigenic determinants are important in raising the desired immune response. Using the 3D structure of the E1 protein, eight B-cell epitopic locations were identified. All the predicted B-cell epitopes were checked for their localization in the protein structure, and the majority of predicted epitopes were found in the outside region of the protein. T-cell epitopes were predicted using the Propred-1 and Propred online servers, and their antigenicity was assessed using the VaxiJen online server. The MHC class I binding peptide MNWTPAVGM and the MHC class II binding peptide YVGATTASV had the highest antigenicity scores, which may indicate strong binding affinity. Furthermore, all the selected epitopes were checked for their conservation in sequences from other countries, and most of the epitopes were conserved between Pakistan and Canada, suggesting that the E1 epitopes of these two countries may be evolutionarily related. Moreover, all the epitopes showed some conservation with all other countries, but there were frequent variations at some positions.
To develop effective vaccines, it is important to target multiple antigenic components of the virus, thus directing the immune system to protect the host from the virus. Therefore, this study was conducted to predict antigenic determinants/epitopes of the HCV genotype 3a E1 protein along with the 3D protein modeling. The study revealed potential B-cell and T-cell epitopes that can raise the desired immune response to the HCV E1 protein isolated in Pakistan. These epitopes may be useful for diagnosing HCV genotype 3a and can also help in developing successful vaccines against HCV 3a infection to protect the Pakistani population from potential HCV threats.
Sobia Idrees (MPhil student), Usman A Ashfaq (PhD Molecular Biology and Group Leader, Human Molecular Biology Group, Department of Bioinformatics and Biotechnology, GCU, Faisalabad).
- Kim JL, Morgenstern KA, Griffith JP, Dwyer MD, Thomson JA, Murcko MA, Lin C, Caron PR: Hepatitis C virus NS3 RNA helicase domain with a bound oligonucleotide: the crystal structure provides insights into the mode of unwinding. Structure 1998, 6: 89-100. 10.1016/S0969-2126(98)00010-0
- World Health Organization: Hepatitis C Fact Sheet. http://www.who.int/mediacentre/factsheets/fs164/en/
- Jafri W, Subhan A: Hepatitis C in Pakistan: magnitude, genotype, disease characteristics and therapeutic response. Trop Gastroenterol 2008, 29: 194-201.
- Idrees M, Riazuddin S: Frequency distribution of hepatitis C virus genotypes in different geographical regions of Pakistan and their possible routes of transmission. BMC Infect Dis 2008, 8: 69. 10.1186/1471-2334-8-69
- Suzuki R, Suzuki T, Ishii K, Matsuura Y, Miyamura T: Processing and functions of Hepatitis C virus proteins. Intervirology 1999, 42: 145-152. 10.1159/000024973
- Kato N: Molecular virology of hepatitis C virus. Acta Med Okayama 2001, 55: 133-159.
- Ashfaq UA, Ansar M, Sarwar MT, Javed T, Rehman S, Riazuddin S: Post-transcriptional inhibition of hepatitis C virus replication through small interference RNA. Virol J 2011, 8: 112. 10.1186/1743-422X-8-112
- Ashfaq UA, Javed T, Rehman S, Nawaz Z, Riazuddin S: An overview of HCV molecular biology, replication and immune responses. Virol J 2011, 8: 161. 10.1186/1743-422X-8-161
- Ciccaglione AR, Costantino A, Marcantonio C, Equestre M, Geraci A, Rapicetta M: Mutagenesis of hepatitis C virus E1 protein affects its membrane-permeabilizing activity. J Gen Virol 2001, 82: 2243-2250.
- Burlone ME, Budkowska A: Hepatitis C virus cell entry: role of lipoproteins and cellular receptors. J Gen Virol 2009, 90: 1055-1070. 10.1099/vir.0.008300-0
- Ashfaq UA, Masoud MS, Khaliq S, Nawaz Z, Riazuddin S: Inhibition of hepatitis C virus 3a genotype entry through Glanthus Nivalis Agglutinin. Virol J 2011, 8: 248. 10.1186/1743-422X-8-248
- Munir S, Saleem S, Idrees M, Tariq A, Butt S, Rauff B, Hussain A, Badar S, Naudhani M, Fatima Z: Hepatitis C treatment: current and future perspectives. Virol J 2010, 7: 296. 10.1186/1743-422X-7-296
- Ashfaq UA, Khan SN, Nawaz Z, Riazuddin S: In-vitro model systems to study Hepatitis C Virus. Genet Vaccines Ther 2011, 9: 7. 10.1186/1479-0556-9-7
- Fournillier A, Wychowski C, Boucreux D, Baumert TF, Meunier JC, Jacobs D, Muguet S, Depla E, Inchauspe G: Induction of hepatitis C virus E1 envelope protein-specific immune response can be enhanced by mutation of N-glycosylation sites. J Virol 2001, 75: 12088-12097. 10.1128/JVI.75.24.12088-12097.2001
- Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, Hochstrasser DF: Protein identification and analysis tools in the ExPASy server. Methods Mol Biol 1999, 112: 531-552.
- Chen CC, Hwang JK, Yang JM: (PS)2: protein structure prediction server. Nucleic Acids Res 2006, 34: W152-W157. 10.1093/nar/gkl187
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK - a program to check the stereochemical quality of protein structures. J App Cryst 1993, 26: 283-291. 10.1107/S0021889892009944
- Wiederstein M, Sippl MJ: ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 2007, 35: W407-W410. 10.1093/nar/gkm290
- Colovos C, Yeates TO: Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci 1993, 2: 1511-1519. 10.1002/pro.5560020916
- Doytchinova IA, Flower DR: VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 2007, 8: 4. 10.1186/1471-2105-8-4
- Krogh A, Larsson B, Von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305: 567-580. 10.1006/jmbi.2000.4315
- Kringelum JV, Lundegaard C, Lund O, Nielsen M: Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol 2012, 8: e1002829. 10.1371/journal.pcbi.1002829
- Somvanshi P, Singh V, Seth PK: In Silico Prediction of Epitopes in Virulence Proteins of Mycobacterium Tuberculosis H37Rv for Diagnostic and Subunit Vaccine Design. J Proteomics Bioinform 2008, 1: 143-153. 10.4172/jpb.1000020
- Idrees S, Ashfaq UA: A brief review on dengue molecular virology, diagnosis, treatment and prevalence in Pakistan. Genet Vaccines Ther 2012, 10: 6. 10.1186/1479-0556-10-6
- EL-Manzalawy Y, Dobbs D, Honavar V: Prediction of linear B-cell epitopes using string kernels. J Mol Recognit 2008, 21: 243-255. 10.1002/jmr.893
- Bui HH, Sidney J, Li W, Fusseder N, Sette A: Development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines. BMC Bioinformatics 2007, 8: 361. 10.1186/1471-2105-8-361
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.