Sequencing and phylogenetic analysis of the gp51 gene from Korean bovine leukemia virus isolates

Background Bovine Leukemia virus (BLV) infection of cattle has been reported in Korea for more than three decades. However, to date, there have been few studies regarding Korean BLV since 1980s. Thus, the purpose of this study is to perform a diagnosis and molecular characterization of BLV strains circulating in Korea and to estimate genetic diversity of different genotypes of BLV. Method To investigate the distribution of BLV variants in the world and assess the evolutionary history of Korean BLV isolates, a comprehensive molecular analysis of the BLV env gp51 gene was conducted using recent worldwide BLV isolates. The isolates included 50 samples obtained from two cattle farms in southeastern Korea in 2014. Results Sequence and phylogenetic analyses of partial 444-nt fragment sequences and complete gp51 sequences of BLV revealed eight distinct genotypes of BLV showing geographic distribution of the world. Most Korean BLV isolates were found to belong to genotype 1 which is a major genotype prevailed throughout the world, and only four isolates from one farm were classified as genotype 3 related to the US and Japan isolates. Analysis of amino acids of Korean BLV isolates showed several sequence substitutions in the leader peptide, conformational epitope, and neutralizing domain regions. The observations suggest the possibility of affecting on viral infectivity and formation. Conclusion Korean BLV isolates showed the close relationship to genotype 1 and 3. Further study to identify the diversity of BLV circulating in Korea is necessary with samples collected nationwide because this study is the first report of BLV genotype 3 being in circulation in Korea. Electronic supplementary material The online version of this article (doi:10.1186/s12985-015-0286-4) contains supplementary material, which is available to authorized users.

BLV has an envelope (Env) protein complex that is composed of surface glycoprotein (SU) subunits, which are anchored to virions by their association with transmembrane (TM) protein subunits [15]. The envelope glycoproteins of BVL play a crucial role in the virus life cycle; envelope proteins are responsible for cellular tropism because they contain the recognition site for the cell surface receptor required for virus entry and are the natural target for neutralizing antibodies [16].
Recent phylogenetic analysis of BLV env genes showed that this virus could be classified into 8 or more genotypes [17,18]. EBL was first reported by the serological method in Korea in the 1980s [19], and recently, its prevalence in dairy cattle was found to be over 50% [20][21][22][23]. To date, there has been no reports of the BLV genotype circulating in Korea since partial env gene sequences that belonged to genotype 1 were deposited in 2009 [24]. It is known that some of the genotype groupings correlate with the geographical origin of the strain [16,[25][26][27][28][29][30][31][32][33][34], but with the increased movement of cattle and people around the world, a variety of genotypes have been reported in places where only a specific genotype was dominant in the past [17,33]. In Japanese studies, it was reported that many genotypes of BLV (genotypes 1~5) co-circulated nationwide [35] with genotype 1 being predominant and genotypes 3 and 5 being widely distributed [36].
This study was performed to identify the genetic diversity of BLV in Korean cattle. The gp51 sequence in 50 BLV isolates was analyzed in our study.

Samples information
A total of 185 whole blood samples were collected from dairy cattle on two farms (119 from farm A and 66 from farm B) in southeast Korea. Farm A is located in Gyeongsangbuk-do and farm B in Gyeongsangnam-do. The samples were collected between April and July 2014.

Viral DNA extraction
Peripheral blood lymphocytes (PBLs) were isolated from whole blood samples and stored at −70°C until further use. PBLs isolation was performed using Luecosep tube (Greiner Bio-One Gmbh, Frickenhausen, Germany) according to the manufacturer's protocol. Isolated PBLs were resuspended in 1 ml of PBS. Viral DNA was extracted from 200 μL of PBL solution with QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) following the manufacturer's recommendations. DNA samples were stored at −20°C until further use.

Sequencing of the gp51
The gp51 PCR products were purified using a PCR purification kit (QIAGEN, Hilgen, Germany). The purified amplicons were cloned into the pDrive vector (QIAGEN, Hilgen, Germany) and sequenced by a local company (Macrogen Inc. Seoul, Korea). The sequences were analyzed using the DNAstar program (DNASTAR Inc., Ver. 5.0).

Phylogenetic analysis
To elucidate the genetic relationships between Korean and worldwide BLV isolates, currently available BLV env sequences were downloaded from the GenBank database and combined with 17 unique Korean BLV env fragment sequences. After online alignment using MAFFT v. 7 (http://mafft.cbrc.jp/alignment/server/) [38], sequences with too many missing characters were manually excluded. To consider all of the possible genotypes of the worldwide BLV isolates, two different datasets were generated for phylogenetic analyses: partial gp51 (444 bp), ranging from 93 to 240 amino acids, and complete gp51 (903 bp), ranging from 1 to 301 amino acids [39]. To estimate the genetic divergence of the gp51 gene, the Kimura-2-parameter (K2P) [40] of the genetic distance was calculated using MEGA v.5 [41] for both intra-and inter-genotype distances. To compare the gp51 amino acid sequences from Korean to other isolates from other countries, protein sequences based on a BLV reference sequence (i.e., K02120 and AY995174) were obtained by MEGA.
For accurate and robust phylogenetic analyses of the BLV gp51 sequence, the Maximum likelihood (ML) and Bayesian Inference (BI) approaches were performed. To determine the most appropriate nucleotide substitution models, jModelTest v.2.1.4 [42] was used among 88 candidates on a BioNJ tree, under the Akaike Information Criterion (AIC), corrected AIC(c), and Bayesian Information Criterion (BIC). The closest model available in the corresponding ML and Bayesian analyses were chosen for each dataset.
For the ML analyses, GARLI 2.01 [43], based on genetic algorithms for the ML search, and RAxML 7.4.2 [44], implemented in raxmlGUI v.1.3 [45], were used to compare the phylogenetic results. Because RAxML only allows the most complex GTR (General Time Reversible) substitution model but GARLI is available for the standard nested submodels of the GTR, the following models were selected after model estimation for each dataset: GTR + I (proportion of the invariable sites) (903 bp) and GTR + G (Gamma distribution of the rates among sites) (444 bp) for RAxML and TPM3uf + I (903 bp) and TPM2 + G (444 bp) for GARLI. Bootstrap supports were calculated using 1,000 bootstrap replicates for the four datasets, and the bootstrap trees were summarized with a 50% majority rule consensus tree by the SumTrees script (v.3.3.1) implemented in DendroPy v.3.10.1 [46].
For the Bayesian analyses, two programs were used. The first was MrBayes v.3.2.2 [47], but such analyses have been reported to produce biased posterior probabilities in certain cases (e.g., the "star-tree paradox" [48]). To address this issue, Phycas v.1.2.0 [49] was used to as a second approach for the polytomy prior [50]. As the best-fit models, HKY (Hasegawa, Kishino and Yano) + I (903 bp) and HKY + G (444 bp) were specified for each dataset in MrBayes. Two runs with four chains were carried out simultaneously for 10 million generations, and the trees were sampled every 100 generations. The convergence diagnostics were evaluated when the average standard deviation of the split frequencies of the two runs was less than 0.01. In Phycas, the same substitution models of MrBayes were used for each dataset, but with a 1 million generation. Tree samples prior to reaching a stationary posterior distribution were discarded, and the remaining samples were summarized with SumTrees.
The phylogenetic trees are rooted on genotype 5, which includes isolates from Central and South America for a comparison to previous studies [17,18,33,51].  (Table 1). Another PCR reaction to amplify the full gp51 gene was also performed. Nineteen out of 37 positive samples from farm A and 31 out of 41 positive samples from farm B were successfully amplified. As a result of the phylogenic analysis based on the Korean isolates, the genotypes from farm A was divided into two groups, genotype 1 and genotype 3 (Figures 1 and 2). Out of the 19 samples analyzed from farm A, 15 were genotype 1 (78.9%) and 4 were genotype 3 (21.1%). We found that all 31 isolates from farm B belong to genotype 1. Of all of the cloned gp51 sequences, 46 samples were genotype 1 (92%) and 4 samples were genotype 3 (8%). All of the sequences of BLV gp51 obtained in this study can be found in Additional file 1: Table S1 and are deposited in the GenBank databases with accession numbers KP20146-KP201482.

Diagnosis and identification of Korean BLV isolates
Genotyping of the BLV gp51 gene based on global phylogenetic analyses ML and BI trees based on the complete gp51 gene showed congruent topologies. The existence of eight genotypes of BLV was highly supported by high bootstrap and posterior probability values of more than 95 and 0.99, respectively ( Figure 2). Phylogenetic analyses based on the partial 444 bp of gp51 also supported the distinct eight genotypes of BLV, which was previously described [17,18]. However, there were some discrepancies in the genotypic phylogenetic relationships between the ML and Bayesian analyses ( Figure 1, Additional file 1: Figure S1). The use of different substitution models had little effect on the overall analyses of the genotype separation, regardless of the datasets. Supporting values for each node were relatively lower in ML than BI (Figures 1 and 2, Additional file 1: Figure S1).
Evolutionary relationships among the genotypes showed some correlation between the genotypes and geographic affinities. For instance, genotypes 7 and 4, mostly from European countries, have a close relationship with a node receiving less than 80% support, except for MrBayes (74/67/0.9/68: data not shown on the tree). Compared to the G4-G7 relationship, genotypes 2 and 3 are more closely related to each other, as has been previously described [17,18,33,36,51,52], with 88/85/ 0.92/67. However, the phylogenetic relationships  between these groups were not fully resolved in the partial gp51 analyses. Worldwide distribution of BLV genotypes based on partial gp51 sequences is summarized in Table 2. Genotype 1 is found in 10 countries (i.e., Korea, Japan, USA, Costa Rica, Argentina, Uruguay, Brazil, Iran, Australia, and Germany) and is the most prevalent genotype worldwide, whereas genotypes 4, 7, and 8 are mainly distributed in Europe and Russia. However, the second most widely distributed genotype, G4, is found more frequently in Europe and America. The prevalent BLV genotypes in Central and South America appear to have more variability, including G1, G2, G4, G5, G6, and G7. Among these, genotypes 5 and 6 were exclusively found in Argentinean, Brazilian, and Costa Rican isolates. Interestingly, genotype 3 isolates from the USA are closely related to those from Korea and Japan.

Genetic variation of the complete gp51 gene
The complete gp51 sequences in this study include 87 isolates with 903 nucleotides, of which, 142 nucleotides were parsimoniously informative. The average evolutionary divergence of the sequence pairs is 2.3%; similarly, 2.5% divergence was observed in the partial 444 bp from 109 sequences. Specifically, the minimum to maximum divergence range was 0.11% to 1.91% for genotype 1 and 0.33% to 1.58% for genotype 3. The estimated average divergence over sequence pairs within and between genotypes is described in Table 3.
More divergent isolate sequences were observed in genotype 2 (from Argentina, Brazil, and Japan), with an average at 1.42% for the complete gp51 sequence, whereas genotype 4 included diverse isolates at 1.38% for the partial 444 bp of gp51 (Table 3, diagonal line). Genotype 4 was found in 9 out of 20 countries, including Argentina, Chile, Belarus, Russia, Ukraine, Poland, Belgium, France, and Germany, and the nucleotide variation within the genotype varied from 0% to 2.79% (Figure 1, Tables 2 and 3). The average genetic distance between genotypes did not exceed 5% in either datasets, except for G5 to G3 (5.13%) and G5 to G2 (5.31%) for the complete gp51 sequence (Table 3). In general, genetic variation within genotype 1 appeared to be minimal compared to that of other genotypes.

Amino acid changes in the complete gp51 sequence
Eighty-seven deduced full-length amino acid sequences of BLV gp51 were aligned with the annotation of epitopes and functional domains as previously described [39,[53][54][55][56][57] (Table 4). Specifically, amino acid sequences from the Korean BLV isolates were compared to 87 BLV isolates found throughout the world. Various single amino acid changes caused by a point mutation were discovered over the full length of gp51. A total of 85 amino acid variations were observed among the 301 amino acid sites, and 40 of these (47%) were observed in multiple isolates with at least two types of amino acids. Table 4 represents the distribution of the amino acid substitutions based on the parsimony-informative sites, according to genotype. Significant amino acid changes were not observed in the majority of Korean BLV isolates from genotype 1 compared to other isolates from different countries. However, it is interesting that only two isolates, GNCN-10 and 11, have 3 amino acid substitutions at 12 (Q-> P), 23 (T-> A), and 28 (C-> S), located in the leader peptide region, and 2 changes at 54 (K-> Q) and 69 (S-> L), corresponding to the conformational epitope region. In particular, one substitution at 140 (V-> A) in GNCN-1, GNCN-2, and GNCN-3 from Gyeongsangnam-do, occurs in the region of the second neutralizing domain (ND2), which is a binding zinc peptide that interacts with zinc ions and viral fusion proteins [58]. In addition, two isolates, GBGS-6 and 7, in Gyeongsangbuk-do had another amino acid change at 134 (D-> N) in the same region. Additionally, the Korean genotype 3 isolates were identical with those from Japan and the USA, which are based on amino acid substitutions. Interestingly, amino acid substitutions were more highly variable in the regions containing the conformational epitope region (Table 4). These changes were manly found in genotypes 4, 5, 6, and 7, encompassing European, Russian, and South American isolates. Additionally, some of the amino acid changes had a trend towards geographic clustering; for instance, substitutions found at 73 (A-> P) and 121 (R-> H) were only found in Russian and European isolates and substitutions at 15 (I-> V) and 37 (S-> T) were only found in Costa Rican isolates.

Discussion
The phylogenetic analysis presented in this study supports previously established data indicating that BLV has eight genotype groups. We also discovered unknown phylogenetic relationships between newly (See figure on previous page.) Figure 1 Maximum-likelihood phylogenetic tree of the partial gp51 gene (444 bp) sequences from different geographic regions. Korean isolates are shown in blue bold-italic names. The remaining isolates in the tree are denoted by country of origin, author with published date (or directly submitted date), and accession number. Genotypes 1 through 8 are indicated by vertical lines with the symbol 'G'. Two numbers at the branches indicate ML bootstrap support values (RaxML/Garli). An asterisk indicates unpublished or direct submission to GenBank sequences. + denotes that the selected sequences were used for the complete gp51 analyses. *Means that the sequences have not yet been investigated by a published phylogenetic analysis. The tree is rooted on genotype 5. analyzed sequences (i.e., more recently unpublished sequences from 2014), including isolates from Korea. Due to the different number of isolates for each genotype and two different gp51 regions sequenced for the two datasets (i.e., 903 bp and 444 bp), the phylogenetic results and genetic divergence values for the isolates are not the same in all analyses. Nevertheless, the eight genotypes of BLV were completely identified through comprehensive phylogenetic analyses. Our results are consistent with previous studies [17,18].
Maximum likelihood and Bayesian approaches showed that a complex of several BLV genotypes has worldwide distribution, and phylogenetic separation of the eight major genotypes was well resolved by all of the analyses. Additionally, we found that the phylogenetic relationships among the genotypes are correlated to the (See figure on previous page.) Figure 2 A phylogenetic tree based on the full-length gp51 gene sequences of BLV isolates. Korean isolates are shown in blue bold-italic names. The remaining isolates in the tree are denoted by country of origin, author with published date (or directly submitted date), and accession number. Numbers at nodes indicate bootstrap support values for Maximum likelihood (RaxM:1, Garli:2) and posterior probabilities for Bayesian inference (MrBayes:3 (0-1), Phycas:4 (0-100)). *Means that the sequences have not yet been investigated by a published phylogenetic analysis. The tree is rooted on genotype 5. The number in each column corresponds to each country's prevalent BLV genotype, and the genotype in bold italic numbers denote that their sequences were excluded for accurate phylogenetic analyses because they were too short (France: M35241), very divergent (Brazil: DQ059417), or of unknown origin (USA: AF033818). However, they are closely related to their corresponding genotypes with a high degree of sequence similarity. 2 A reference list of sequences used for phylogenetic analyses and genotyping in this study. Unpublished and direct submission to GenBank sequences are marked with an asterisk in the phylogenetic trees (Figures 1 and 2, Additional file 1: Figure S1).
geographical origin of the isolates (e.g., G4-G7 in Europe/Russia, G5-G6 in South America) based on the complete gp51 sequence. As shown in Figures 1 and 2, genotype 1 is the most dominant BLV type in the world and is found in Australia, Iran, USA, Argentina, Brazil, Uruguay, Costa Rica, Japan, and Germany. Genotype 3 included only isolates from three geographically distinct countries, Japan, the USA, and Korea. Similarly, genotypes 5 and 6 were exclusively found in three countries, Costa Rica, Argentina, and Brazil in Central and South America. In addition, isolates of genotypes 8 and 7 mostly originated from Eurasian areas, excluding Korea and Japan. Noticeably, genotypes 1 and 4 covered large geographic areas from Europe to America, suggesting the possibility of extensive cattle trading between countries (Table 2). Phylogenetic analyses of the complete gp51 gene revealed that 17Korean BLV isolates belonged to genotypes 1 and 3 (Additional file 1: Figure S1 and S2, Additional file 1: Figure S1). Seven genotype 1 isolates were found in Gyeongsangbuk-do (GBGS) and 8 isolates were found in Gyeongsangnam-do (GNCN). Indeed, genotype 1 was reported in Korea in 2009 [24] and was expected to be the dominant form found in our study. Two isolates from the Gyeongsangbuk-do region were assigned to genotype 3, a result that was also seen in a recent Japanese study [36]. Considering the geographic affinity of the two countries, these results are not unusual.
As shown in Table 3, the average distance between genotype 1 isolates (0.74% for the complete gp51 sequence and 0.83% for the partial gp51 sequence) was relatively lower than those observed within G2-G8. The low level of genetic diversity did not reflect geographic barriers, but perhaps indicates a single origin of BLV infection or international importation of cattle across the border from several countries.
Analyses of the BLV gp51 amino acid sequences in isolates collected from multiple geographic locations showed specific sequence conservation depending on the genotype (Table 4).
Considering this result, a few nucleotide mutations or amino acid substitutions may affect the ability of BLV to survive and may infect hosts. The gp51 region characterized by amino acid changes of several functional sites remained unchanged over a long period in this study (e.g., 216/301 conserved region (72%)) ( Table 4). Based on Korean genotype 1 BLV isolates, several amino acid changes were observed at positions 12, 23, and 28 in the leader peptide region and at positions 54 and 69 in the conformational epitope region in only two isolates from the GNCN locality of Gyeongsangnamdo. In addition, unique antigenic variations were identified in the second neutralizing domain, position 140 of GNCN-1, 2, and 3 along with position 134 of GBGS-6 and 7. The biological significance of these changes in the Korean isolates needs to be investigated by functional studies in vivo, but the changes in the conformational epitope overlapping with the second neutralizing domain region indicate a possibility of them affecting viral infectivity and formation.

Conclusion
This study shows that two genotypes of BLV are circulating throughout Korea. It is notable that most of these isolates belong to genotype 1 (46 out of 50 isolates), which is referred to as the dominant BLV genotype around the world, whereas only 4 isolates belong to genotype 3. Further study to identify the diversity of BLV circulating in Korea is necessary with samples collected nationwide because we identified genotype 3 in Korean isolates. Two distinct genotypes imply that there were two independent introduction events in Korea. At this point, the exact geographic origin of the Korean isolates remains uncertain, even though they are highly similar (99%) to Costa Rican (EF065640) and Japanese (LC007985) genotype 1 isolates and highly similar (99%) to Japanese (EF065650) and USA (EF065648) genotype 3 isolates. The values in bold along the diagonal are the average percentage within-genotype divergence (left, partial 444 bp of gp51; right, complete gp51 gene). 2 NA denotes that calculation is not available due to a single data. 3 Maximum and minimum divergences are represented by bold italic numbers for each dataset.   Japan (Sagata,1985) [56] K02120 C V Japan (Zhao, 2007)

Additional file
Additional file 1: Figure S1. A Bayesian inference phylogenetic tree based on the partial gp51 (444 bp) sequences from different geographic regions. Korean isolates are shown in blue bold-italic names. The remaining isolates in the tree are denoted by country of origin, author with published date (or direct submitted date), and accession number. Genotypes 1 through 8 are indicated by vertical lines with the symbol 'G'. Two numbers at the branches indicate BI posterior probabilities values (MrBayes(0-1)/Phycas (0-100)). An asterisk indicates unpublished or direct submission to GenBank sequences. + denotes that the selected sequences were used for the complete gp51 analyses. *Means that the sequences have not yet been investigated by a published phylogenetic analysis. The tree is rooted on genotype 5. Table S1. Accession number and geographic information regarding the fifty Korean BLV isolate sequences used in this study.