Phylogenetic comparison of exonic US4, US7 and UL44 regions of clinical herpes simplex virus type 1 isolates showed lack of association between their anatomic sites of infection and genotypic/sub genotypic classification

Background HSV-1 genome is a mosaic of recombinants. Clinical Herpes simplex virus -1 (HSV1) isolates were already genotyped as A, B and C types based on nucleotide variations at Unique Short (US) 4 (gG) and US 7 (gI) regions through phylogeny. Analysis of Glycoprotein C (gC) exon present on the Unique Long (UL) region had also revealed the existence of different genotypes. Glycoprotein C is mainly involved in initial viral attachment to heparan sulphate on host cell surface facilitating the virus's binding and penetration into cell. As the amount of heparan sulphate on the host cell surface varies according to the cell type, it is plausible that different genotypes bind differentially to cell types. Hence, this study was framed to determine the existence of novel genotypes/sub genotypes in the US or UL regions which could associate with clinical entities. Results All the twenty five isolates analyzed in this study were of genotype A as per their gG gene sequences. In case of gI gene, 16 out of 25 were found to be type A and the remaining nine were type B putative intergenic recombinants. Intragenic recombinations were also encountered in both the US genes, with gG possessing novel subgenotypes, arbitrarily designated A1 and A2. The 9 type B isolates of gI genes also branched out into 2 clades due to genetic variations. Glycoprotein C of UL region had two distinct genotypic clades α and β, whose topological distribution was significantly different from that of the US region. Neither the US nor UL regions, however, showed any preference among the genotypes to a specific anatomic site of infection. Even the non synonymous variations identified in the functional domain of gC, were not confined to a particular genotype/clinical entity. Conclusion The analyses of the US and UL regions of the HSV-1 genome showed the existence of variegated genotypes in these two regions. In contrary to the documented literature, in which Asian strains were concluded as more conserved than European ones, our study showed the existence of a higher degree of variability among Indian strains. However, the identified novel genotypes and subgenotypes were not found associated with clinical entities.


Background
Herpes simplex virus -1 commonly causes superficial watery blisters in humans in the oral mucosa or genitalia. Apart from infecting the dermis and muco-cutaneous regions, the virus is also capable of infecting a wider range of host tissues, especially of neuronal and corneal origin, leading to encephalitis and keratitis with high rates of morbidity and mortality [1][2][3][4][5].
It is imperative that nuances of replication of HSV are well understood in order to discern the reason behind the wide spectrum of tissues infected. Among the many glycoproteins and glycosaminoglycans adorning a host cell surface, it has been conclusively proved that heparan sulphate (HS), ubiquitously expressed on various cell surfaces, plays an important role in the viral attachment and penetration. HSV-1 penetration and membrane fusion with host cell surface HS takes place via viral glycoproteins C, B, H, L and D [6][7][8].
HSV-1 contains several glycoproteins, each with varied functions, concerning the overall pathogenesis and immune evasion by the virus. Glycoprotein C (gC) plays a significant role in the efficient attachment to the cell surface [9][10][11], and Glycoprotein G (gG)interacts with host immune system effecting successful evasion by the virus [12][13][14]. Glycoprotein I (gI) forms a hetero-dimeric complex with glycoprotein E and is responsible for cell to cell viral spread in epithelial and neuronal cells [15].
A detailed study of the molecular evolution of glycoprotein genes G and I, in European strains, threw up existence of genotypes arbitrarily labeled A, B, C and intragenic recombinants in a hitherto considered stable genetic make up [16]. Subsequent genomic studies carried out by Norberg et al (2011) [17] also showed HSV -1 to be a mosaic of recombinants. As gC region is essential in the initial binding to the HS moiety, any variations detected in this region would lead to classification of a separate genotype, which may differentially influence the binding to variegated tissues.
Hence the current study was undertaken to chart and compare the phylogenetic pattern of 2 genes (gG and gI) from the Unique Short (US) region and 1 gene (gC) from the Unique Long (UL) region of HSV-1 genome and determine the possibility of veritable association between the clinical sites of infection and genotypes of any or all of the three genes.

Results and discussion
The entire coding regions of gG (US 4), gI (US7) and gC (UL 44) genes were sequenced for twenty five clinical isolates. The standard strain HSV-1 ATCC 733VR was sequenced in parallel. These sequences were subjected to phylogenetic analyses by Maximum likelihood method using PHYLIP software. Genes targeting glycoproteins G and I were also subjected to RFLP analyses to genotype them as A, B, and C, based on the protocol elucidated by Norberg et al (2006), covering smaller intervals within the genes [5].

RFLP Analysis of gG and gI
The RFLP analyses of both genes showed a bias towards genotype A. All 25 isolates in their gG gene, and 16 out of 25 isolates in their gI gene and in the case of standard strain, both the genes, conformed to type A cleavage pattern. However, the remaining 9 out of 25 isolates in their gI gene were of type B. The varying genotypic pattern observed in these nine isolates indicated the presence of recombination crossovers between the genes and they were designated as putative intergenic recombinants. These recombinants were of variegated specimen origin with 7 being ocular; whilst, the other two were isolated from oral and genital mucosa.

Sequence and phylogenetic analyses of glycoprotein G
Phylogenetic analysis using Maximum likelihood method was performed for 25 isolates along with standard strain and 37 sequences covering all the three previously classified genotypes from Genbank.
The phylogenetic tree of gG clearly showed the presence of two distinct genogroups within the isolates. Both the genogroups had considerable evolutionary divergence with respect to Genotype B and C strains, but were very closely related to Genotype A strains obtained from Genbank, in spite of one genogroup being located in the same bifurcating branch, but in a different clade, as B and C strains. RFLP analyses had shown all strains to be Type A. Sequence analyses showed the existence of tandem repeats consisting of 3 nucleotides-GGA repeated nine times, from position 234 to 262 in 6 isolates (with a codon addition seen in 6 isolates), which were subsequently removed prior to phylogenetic analyses. The no. of tandem repeats (n = 9) were characteristic of Type A strains [16]. Hence, divergence into two genogroups indicated the existence of subgenotypes/ intragenic recombinants. The groups were arbitrarily labeled as sub genotypes A1 and A2, respectively. An unrooted phylogram of gG gene depicting classification of isolates into sub genotypes A1 and A2 is shown in Figure 1.
An earlier report by Norberg et al [16], stated nucleotides T and A at positions 267 & 280, respectively were specific for Genotype A. However, in this study we encountered TG, CG, CC in 22 out of 25 isolates. The other three isolates conformed to the T-A specific genotype A model. These variations were not specific to a particular sub genotype, but as the positions between 280 and 324 were considered to harbour putative recombination crossovers, [16], we provide the proof of variability within the gene, leading to potential recombination occurrence. These variations were seen only among our isolates.
Other variations were seen in the isolates along with a few Genbank isolates at 610, 645 nucleotide positions. Variations in the 267,398, 655 positions were exclusively seen among our strains (genotype A1). These 5 variations were mainly responsible for the subgenotypic classification.
Two ocular isolates did not fall into either of the sub genotype categories.

Sequence and phylogenetic analyses of glycoprotein I
Phylogenetic analysis was carried out using Maximum likelihood method for 25 isolates, standard strain along with 31 Genbank strains. A highly divergent tree was obtained. But interestingly, nine strains which were type B as determined by RFLP, and later confirmed by sequencing, did not group themselves together. Seven of them, along with one type A isolate branched out, and exhibited closer relationship to Genbank type C strains. The other two (isolate nos. AEH02 and SN02) were closer to Genbank type B strains. The nine type B isolates flanked the 15 type A clinical isolates and Genbank strains, which were seen grouped together, on either sides. An unrooted tree of gI showing the topological distribution of the isolates is shown in Figure 2.
Variations were seen between positions 270 and 363 in AEH02 and SN02 isolates only, which led them to separate out from other type B isolates. Furthermore, variations at positions 695,698,736,768 in these two isolates were similar to the Genbank Type B and A strains, leading them to be grouped together. The type B isolates which are already intergenic recombinants (comparing gG) showed variations, which could account for presence of putative recombination crossover points within this gene between positions 270 -363.Though type B isolates had specific variations among them which led their separated cladic topology, no sub genotypic variations were seen among type A isolates.

Sequence and phylogenetic analyses of glycoprotein C
Phylogenetic tree was constructed using Maximum likelihood method by PHYLIP along with 17 Genbank strains. Molecular phylogeny of exonic gC nucleotide sequences threw up existence of only 2 distinct geno groups, though previously 3 separate genogroups were deduced in this region [18]. The genotypes were arbitrarily labeled α andβ, in this region. an unrooted tree of gC gene depicting the existence of geno groups α and β is shown in Figure 3.
The nucleotide variations seen in positions -381 C-> G,669 G-> A,724 and 725 TA-> AT among typeβ isolates and 781 G-> T among type α isolates were attributed to this cladic separation. Several synonymous and non synonymous substitutions were encountered, but not affecting the formation of α andβ genotypes. Addition of a codon was encountered in 2 corneal isolates (416-ACG and AEH06-ACC) after position 1235.

Comparison between genotypes and clinical entities
Two novel subgenotypes (A1 and A2) were detected in gG gene, of which type A1 contained all the isolates that turned out to be putative intergenic recombinants on based analyses of gI. These intergenic recombinants, in turn had variations within their gI region, which classified them as intragenic gI type B recombinants. However, the gC gene presented a different picture in terms of topological distribution of isolates, with little correlation between evolutionary pattern seen in the US region.
Though there were genotypic variations in all the three genes, none of these classifications were specific to a particular site of infection. The 97 -367 nucleotide region of glycoprotein C (Amino acid region: 33 -123) is responsible for its binding to HS moieties on host cell surface, and any mutation in this region will adversely affect its binding capacity [19,20]. In 8 out of 25 isolates studied, we encountered a non synonymous variation at position 187 (A -> C) which resulted in an amino acid change from threonine (polar) to proline (non -polar). Moreover, another variation at position 283(A -> C) in four isolates, resulted in a change from glutamine to lysine. However, none of these changes were specific to a particular genotype or clinical specimen.
The analyses comprised of three genes -gG, gI and gC, of which gG and gI are located in the US region of the genome, close enough to remain linked during replication, whereas, gC present in the UL region is at a considerable genetic distance from the other two genes. The evolutionary patterns of US and UL regions were found to be varied, which can be attributed to the genetic distance between these two regions, accounting for the existence of large number of recombination crossover points. However, surprisingly, despite the proximity of gG and gI genes, even these two genes contained intergenic recombinants and variations in previously characterized potential recombination crossovers [16], thereby making them susceptible to intragenic recombination, which we encountered in both the genes, accounting for the existence of different genotypes as well as sub genotypes. Quantitative evaluation of the genomic polymorphisms of HSV 1 strains from six countries -3 Asian and 3 non Asian origin (Japan, Korea, China, Sweden, U.S.A. and Kenya) concluded that the evolutionary pattern was similar among same ethnic groups, and the variability encountered among the Asian strains was lesser compared to the Non-Asian ones [21]. Nevertheless, we have seen more variations within our strains (all Indian) with respect to the European strains from Genbank. This in spite of the virus having a stable genome and a low mutation rate of 3.5*10 -8 /site/year [21].
Phylogenetic analysis of 28 European clinical HSV-1 non ocular isolates in their glycoproteins G (gG), I (gI) and E (gE), comprising 2.3% of the unique short (US) region revealed that the sequences could be separated into three genetic groups A, B and C [16].
Despite Duan et al [22], mentioning the presence of a type B bias among ocular strains, such a phenomenon was not encountered in the isolates used in our study, comprising of gG, gI and gC genes. All our strains were grouped to either Genotype A or intergenic recombinants. Though novel sub genotypes exist in gG gene, there was no predilection to specific anatomic site. Even the tandem repeats (seen among subgenotype A2 of gG), whose number and nature are more conserved than the other parts of the gene, showed no association with clinical entities. Phylogeny of gC threw up novel genotypes arbitrarily designated as α and β. The gC region is involved in the initial attachment of virus to the host cell. Since different host cells have varying amounts of heparan sulphate on their cell surface, it is possible that a genotypic variation could be anatomic site specific, which was not encountered, making the affinity purely quantitative. Hence, no association was seen between clinical specimen and genotypes on studying three different parts of the HSV genome.
Glycoproteins D, B, H, L apart from gC are involved in the host -virus interactions. Detailed analyses covering all these genes could add on to the input provided by us regarding strain specific site association. However, the high frequency of recombination associated with HSV-1 genome, makes it impossible to assign a particular genotype for a strain [17], sometimes variations are present even in the functional domain, as seen in our study in case of gC. Hence, an increased number of novel genotypes/sub genoytpes due to intergenic/intragenic recombination are likely to be encountered in these glycoprotein genes as well, which will make the segregation of strains to anatomic site difficult.

Conclusion
Phylogenetic and sequence analyses of clinical HSV-1 isolates comparing three genes from two different segments of the genome revealed the existence of novel genotypes and sub genotypes, adding on the data that widespread variations are present across the genome which is a mosaic of genotypes. As variations seen among our isolates out numbered the variations present in the Genbank strains used in our study, we also differ from the previous study by Sakoaka et al [21] that Asian strains are more conserved than their European counterparts by stating that Indian strains have a high degree of variability. However, neither of the two segments of the genome could associate a genotype with a particular clinical entity.

Characteristics of isolates used
A total of twenty five isolates from variegated clinical specimens, 15 ocular, 3 genital, 3 oral, 2 skin and 2 throat swab specimen isolates, were used for this purpose ( Table 1) Amplification of gG, gI genes DNA from isolates and clinical standard strain were extracted using Qiamp DNA mini kit (Germany) according to manufacturer's protocol. The PCR based DNA amplification for gG and gI were done using primers previously documented [16,23].

Amplification of gC gene
Primers were designed using Primer 3 software to split the exonic gC gene into 5 overlapping sub regions labeled, for convenience, from C1 through C5. The primers are listed in Table 2.
The PCR was performed in a 50 μL reaction volume containing 1X PCR buffer with additional 2.5 mM MgCl2 (10 mM Tris with 15 mM MgCl2), 200 mM of each dNTPs, 2.5 units of Taq DNA polymerase, 1 mM each primer. The profile consisted of 40 cycles of denaturation at 95°C/30 sec, annealing at 56°C/60 sec for C1, C3-C5 and 62°C/60 sec for C2 and extension at 72°C/60 sec.
The amplicons were gel extracted using Qiagen Gel elution kit and cycle sequenced using Ready reaction mix and analyzed using ABI genetic analyzer AVANT 3130.

Construction of phylogeny and analyses of tandem repeats
Contig assembly was performed using DNA baser software. Multiple sequence alignment was done using ClustalW2 and Multalin Interface softwares [24]. The nucleotide sequences of gG [Genbank: JN181118-JN181143] and gC [Genbank: JN712694-JN712718] were deposited in Genbank. Unrooted phylogenetic trees were constructed with maximum likelihood tool with bootstrap validation (500 replicates) using PHYLIP software [25,26]. Tandem repeats were detected using Etandem tool of Emboss package.
Various methods were characterized to serotype and genotype HSV using RFLP [27][28][29], however, in this paper PCR based RFLP was done using enzymes, targeting 269 bp region within gG and 410 bp region within gI to genotype the isolates [5]. This method was used as it was seen to be applicable to European strains on a larger geographic distribution [22,30].