Virology Journal
The complete genome sequence of a Crimean-Congo hemorrhagic fever virus isolated from an endemic region in Kosovo

The Balkan region, and Kosovo in particular, is a well-known Crimean-Congo hemorrhagic fever (CCHF) endemic region, with frequent epidemic outbreaks and sporadic cases and a case fatality among hospitalized patients of approximately 30%. Recent analysis of complete genome sequences of diverse CCHF virus strains showed that the genome plasticity of the virus is surprisingly high for an arthropod-borne virus. High levels of nucleotide and amino acid differences, frequent RNA segment reassortment and even RNA recombination have recently been described. This diversity illustrates the need to determine the complete genome sequence of CCHF virus representatives from all geographically distinct endemic areas, particularly in light of the high pathogenicity of the virus and its listing as a potential bioterrorism threat. Here we describe the first complete CCHF virus genome sequence of a virus (strain Kosova Hoti) isolated from a hemorrhagic fever case in the Balkans. This virus strain was isolated from a fatal CCHF case and passaged only twice on Vero E6 cells prior to sequence analysis. The total genome of the virus was found to be 19.2 kb in length, consisting of a 1672 nucleotide (nt) S segment, a 5364 nt M segment and a 12150 nt L segment. Phylogenetic analysis of CCHF virus complete genomes placed the Kosova Hoti strain in the Europe/Turkey group, with the highest similarity to Russian isolates. The virus M segments are the most diverse, with differences of up to 31% and 27% at the nt and amino acid levels, respectively, and even a 1.9% amino acid difference between Kosova Hoti and another strain from Kosovo (9553-01). This suggests that distinct virus strains can coexist in highly endemic areas.
Findings
Bioinformatics analysis of complete microbial genomes has led to advances in the development of novel diagnostic techniques, in research on microbial pathogenesis, and in the control and prevention of infectious diseases.
Until 2006, only 2 complete genomes of Crimean-Congo hemorrhagic fever virus (CCHFV) had been sequenced [1]. CCHFV is a tick-borne virus with a tripartite RNA genome (S, M and L segments) and is the causative agent of a lethal zoonosis named Crimean-Congo hemorrhagic fever (CCHF). The virus is distributed over much of Asia, extending from China to the Middle East and southern Russia, and in focal endemic areas in Africa and southern Europe, including Kosovo and Turkey [2]. Yearly epidemics, as well as sporadic cases of CCHF, are seen in some of these areas, often with high case fatality (approx. 30%) [3]. CCHFV can be transmitted to humans by the bites of ixodid ticks and by contact with blood or tissue from viremic livestock and human patients [2]. Development of diagnostic approaches and potential vaccines depends on knowledge of the broad geographic distribution of diverse virus variants and on understanding the extent of virus genetic reassortment and recombination [3,4]. Analysis of the 16 complete CCHFV genomes available to date indicated considerable evolution and high diversity of CCHFV [1,5]. Presumably this reflects the high polymerase error rates typical of negative-stranded RNA viruses. In addition, previous reports have found evidence of RNA segment reassortment events involving CCHFV M segments and of recombination in CCHFV S segments [1,3,4]. The genetic diversity of CCHFV, its virulence, and its potential as a bioterrorism agent make it important to obtain the complete genome of CCHFV from all geographically distinct endemic areas.
The Balkan peninsula, and Kosovo in particular, is a well-known endemic region for CCHF, and epidemic outbreaks and sporadic cases have frequently been recorded [6-8]. Five nucleotide sequences of CCHFV from Kosovo have been published [9-12]. Three of them are partial sequences of the S segment; the remaining 2 are complete sequences of the S and M segments of different CCHFV strains, Kosova Hoti and Kosovo 9553-01, respectively. We describe the first complete CCHFV genome sequence of a virus (strain Kosova Hoti) isolated from a hemorrhagic fever case in the Balkans.
The CCHFV Kosova Hoti strain was isolated from the blood of a female patient who died of CCHF during the epidemic in Kosovo in 2001 [6]. The blood was taken on the 5th day after onset of symptoms. Laboratory analysis showed the presence of IgM antibodies (titer 1:400) and of viral RNA at a concentration of 1.08 × 10^10 copies per mL of serum. Virus was grown on Vero E6 cells in a BSL-3 laboratory. Viral RNA was extracted with Trizol reagent from the second passage of the virus in Vero E6 cells and used for direct sequencing of the complete genome. Amplicons of the full-length S, M and L segments were obtained by following previously described protocols [1,9,13]. Briefly, a total of 16 S, 40 M and 84 L sequencing primers were used to generate the complete sequences of the S, M and L segments, which are deposited in GenBank under the accession numbers DQ133507, EU037902 and EU044832, respectively. The complete genome of the CCHFV Kosova Hoti strain was aligned with preexisting CCHFV genomes using the CLUSTAL W algorithm of the MegAlign module (Lasergene 1999, DNASTAR, USA). Phylogenetic relationships of the different CCHFV strains were established with the software package TREECON [14]. The phylogenetic tree was constructed by the neighbor-joining method. The topology of the tree was obtained with the Kimura 80 model, and support for the tree nodes was calculated with 500 bootstrap replicates. SignalP was used to predict the signal sequence cleavage site, and TMHMM 2.0 was used to predict transmembrane helices of the M segment polyprotein [15,16]. The amino acid (aa) sequence of the L segment was submitted to PSI-BLAST and the PredictProtein server to search for conserved aa motifs [17,18].
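The distance correction behind the neighbor-joining tree can be illustrated with a minimal sketch. It assumes that "Kimura 80" refers to Kimura's 1980 two-parameter model, in which the distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), with P and Q the proportions of transitional and transversional differences. The function name and the toy sequences are hypothetical, and this is not the TREECON implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Illustrative sketch only; sites with gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA and DNA alphabets alike by mapping U to T.
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a, seq_b):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Hypothetical toy alignment: one transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input to neighbor-joining; bootstrap support is then obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.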
The genome size of CCHFV strain Kosova Hoti was found to be approximately 19.2 kb, comprising a 1672 nt S segment, a 5364 nt M segment and a 12150 nt L segment. Three phylogenetic trees were constructed based on the ORF sequences of the S, M and L segments of CCHFV (Fig. 1). The general topologies of the trees were consistent with those described previously [1,13]. Seven distinct groups were formed, representing the approximate geographic distribution of CCHFV. Based on the analysis of its S, M and L segments, the Kosova Hoti strain clustered in group V, which represents the Europe/Turkey geographic lineage [1]. The position of the Kosova Hoti strain within group V was similar in the S and L segment trees (Fig. 1, panels A and C), where it formed a separate lineage ancestral to the three Russian isolates, while CCHFV strain 200310849 from Turkey was the most ancestral member of group V. The group V topology based on the M segment was slightly different (Fig. 1, panel B). Two Russian strains (VLV-100 and Kashmanov) clustered together with the Turkish CCHFV, whereas the Russian Drosdov strain clustered together with both CCHFV strains from Kosovo.
Figure 1. The phylogenetic analysis of the complete genome of CCHFV Kosova Hoti strain.
The sequence differences between the CCHFV strains in group V are shown in Tables 1, 2 and 3. A marked difference between nt (ORF) and aa sequence divergence was noted for the S and L segments in comparison to the M segment. The majority of nt changes in the S and L segments were synonymous (not amino acid changing) (Tables 1, 3), whereas over 80% of M segment nt changes were non-synonymous (amino acid changing) (Table 2). As seen in earlier studies [12,19,20], considerable glycoprotein amino acid variation was observed, particularly in the mucin-like variable region (Fig. 2), which presumably reflects the biological function of the glycoproteins encoded by the M segment. It is somewhat surprising that the glycoproteins of Kosova Hoti and another strain from Kosovo, 9553-01, differed by 1.9% in complete aa sequence and by up to 4.5% in the mucin-like domain (Table 2). This suggests that different genetic strains of CCHFV co-exist in this highly endemic region.
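The distinction between synonymous and non-synonymous changes can be made concrete with a short sketch. It assumes two gap-free ORFs aligned in the same reading frame and the standard genetic code, and it classifies differences per codon, which is a simplification of formal dN/dS methods. The function name and the toy sequences are hypothetical and are not taken from the CCHFV alignments.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_orfs(orf_a: str, orf_b: str) -> dict:
    """Summarise nt and aa differences between two aligned, gap-free ORFs.

    Differing codons are counted as synonymous if they encode the same
    amino acid and as non-synonymous otherwise (a per-codon simplification).
    """
    if len(orf_a) != len(orf_b) or len(orf_a) % 3 != 0:
        raise ValueError("ORFs must have equal length divisible by 3")
    orf_a, orf_b = orf_a.upper(), orf_b.upper()
    nt_diffs = sum(a != b for a, b in zip(orf_a, orf_b))
    synonymous = nonsynonymous = 0
    for i in range(0, len(orf_a), 3):
        codon_a, codon_b = orf_a[i:i + 3], orf_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    n_codons = len(orf_a) // 3
    return {
        "nt_diff_pct": 100.0 * nt_diffs / len(orf_a),
        "aa_diff_pct": 100.0 * nonsynonymous / n_codons,
        "synonymous_codons": synonymous,
        "nonsynonymous_codons": nonsynonymous,
    }


if __name__ == "__main__":
    # Hypothetical toy ORFs: CTG -> CTA is silent (Leu), AAA -> AGA changes Lys to Arg.
    print(compare_orfs("ATGCTGAAA", "ATGCTAAGA"))
```

With a pattern like that reported for the S and L segments, most differing codons would fall in the synonymous count, so the aa difference stays well below the nt difference.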
Analysis of the polyprotein encoded by the M segment of the Kosova Hoti strain predicted that the signal peptide is cleaved between aa 27 and 28 (AHG-QS). This site is identical to those described for Kosovo 9553-01 and Kashmanov but differs from those of the other strains in group V (Fig. 2). The mucin-like variable region of the Kosova Hoti polyprotein stretches from aa 28 to 251 and differs by up to 20.5% from the Turkish 200310849 strain (Table 2). The tetrapeptides RSKR251, RKLL523 and RKPL1043 were identified in Kosova Hoti and are identical among all strains in group V. They represent the cleavage sites for the GP38, Gn and Gc proteins, respectively [21,22]. The RKLL523 tetrapeptide of Kosova Hoti is typical of all strains in group V (Europe/Turkey) but differs from the RRLL tetrapeptide found in all other CCHFV strains sequenced. However, both tetrapeptides constitute a cleavage recognition site for the subtilase SKI-1 [12,22,23]. Five transmembrane helices were predicted for the Kosova Hoti polyprotein, as shown in Figure 2.
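A search for these tetrapeptides in a polyprotein sequence can be sketched as below. The motif labels, the function name and the toy sequence are hypothetical. The sketch also assumes that the number attached to each tetrapeptide (for example RSKR251) denotes the 1-based position of its last residue, which the text does not state explicitly.

```python
import re

# Tetrapeptide motifs named in the text; RRLL is the variant reported
# for CCHFV strains outside group V.
CLEAVAGE_MOTIFS = {
    "GP38 site (RSKR)": "RSKR",
    "Gn site (RKLL or RRLL)": "R[KR]LL",
    "Gc site (RKPL)": "RKPL",
}


def find_cleavage_motifs(polyprotein: str) -> dict:
    """Return, for each motif, the 1-based positions of its last residue."""
    polyprotein = polyprotein.upper()
    hits = {}
    for label, pattern in CLEAVAGE_MOTIFS.items():
        # A lookahead reports every occurrence, including overlapping ones.
        hits[label] = [
            match.start() + 4
            for match in re.finditer(f"(?={pattern})", polyprotein)
        ]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not the Kosova Hoti polyprotein.
    toy = "MAAARSKRSLTTRKLLGGGRKPLQ"
    for label, positions in find_cleavage_motifs(toy).items():
        print(label, positions)
```

A sequence match alone only flags candidate sites; whether a given tetrapeptide is actually processed depends on its context in the polyprotein.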
Analysis of the L protein encoded by the L segment of the Kosova Hoti strain revealed the conserved OTU-like protease domain from aa 35 to 152 (Fig. 2). The identified sequence GD37GNC40FYHSIAE.....H151FD (numbers give the position of the preceding residue), which contains the catalytic triad, was identical among all CCHFV strains used in the L segment alignment (Fig. 1, panel C). Amino acids 2043-2714 corresponded to the RNA-dependent RNA polymerase catalytic domain, as in the Nigerian IbAr10200 strain [24]. In addition, a zinc finger C2H2-type domain (aa 609-632) was found in the L protein of Kosova Hoti. A leucine zipper motif (composed of three heptads) previously identified at aa 1386-1407 in the L sequence of a Nigerian strain [24,25] could not be predicted in the Kosova Hoti L sequence. However, in this region the L sequence of Kosova Hoti (and of the other strains from group V) differs from the Nigerian strain only by the substitution of isoleucine for leucine at position 1386.
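Why a single substitution at position 1386 abolishes the motif prediction can be shown with a minimal sketch. It assumes the prediction relies on the classical leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues (consistent with aa 1386-1407) and requires leucine at the first position. The function name and the toy windows are hypothetical placeholders, not the actual L protein sequences.

```python
# Offsets of the four leucines in an L-x(6)-L-x(6)-L-x(6)-L window.
LEUCINE_OFFSETS = (0, 7, 14, 21)


def matches_leucine_zipper(window: str) -> bool:
    """Return True if a 22-residue window fits the leucine zipper pattern."""
    window = window.upper()
    return len(window) == 22 and all(window[i] == "L" for i in LEUCINE_OFFSETS)


if __name__ == "__main__":
    # Hypothetical toy windows; 'A' stands in for arbitrary residues.
    reference_like = "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L"
    # Same window with isoleucine at the first position, mimicking the
    # substitution described at position 1386.
    variant_like = "I" + reference_like[1:]
    print(matches_leucine_zipper(reference_like))
    print(matches_leucine_zipper(variant_like))
```

Because the pattern demands leucine at every seventh position, the conservative isoleucine replacement is enough for a strict pattern search to miss the region, even though the rest of the window is unchanged.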
Arthropod-borne viruses of vertebrates frequently exhibit low genetic diversity, which is thought to result from what is essentially a double filter: the evolution of these viruses is tightly constrained by the need to maintain high fitness in both vertebrate and arthropod host environments [26]. The very high genetic diversity seen in CCHFV is a striking exception.
Presumably less constraint or greater positive selection is molding the evolutionary pattern of this virus. The complete genome of this representative CCHFV isolate (Kosova Hoti) from a highly endemic region of the Balkans is clearly divergent from strains present in other endemic regions of the world, and considerable sequence difference is observed even among virus strains found within Kosovo. These findings are important for the design of molecular diagnostic tools and for vaccine development, as they clearly illustrate the need to consider the high viral diversity and the complex geographic distribution of CCHFV variants in these efforts.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
DD performed RNA extraction, qualitative and quantitative RT-PCR, analyzed the data and prepared the draft manuscript. MK and STN provided the complete M and L segment sequences and revised the draft manuscript. AS sequenced the complete S segment. IHB performed the protein analysis. MP, ID and SA collected the samples and clinical data. TAZ isolated the virus, supervised the study and revised the final draft. All authors read and approved the final manuscript.
Figure 2. The protein analysis of the complete genome of CCHFV Kosova Hoti strain.