Detecting transmission and reassortment events for influenza A viruses with genotype profile method

Evolutionary events of transmission and reassortment for influenza A viruses have traditionally been detected by phylogenetic analysis of the viruses' eight gene segments. Because phylogenetic analysis can be complex, we developed the genotype profile method, which packages the phylogenetic algorithms to analyze combination patterns of gene segments and integrates epidemiological knowledge. With this method, the analysis of reassortment and transmission becomes a simple and reliable process of combining genotypes, which mirrors the biological process of the virus. An application called IVEE that implements the method is available to all academic users at http://snptransformer.sourceforge.net. Furthermore, we found that a previous summary of the reassortment events in swine influenza A viruses may be inaccurate.


Background
Influenza A viruses annually cause seasonal epidemics and occasional global pandemics in humans. Three pandemics occurred in the 20th century, in 1918, 1957 and 1968, which were the result of the transmission of avian viruses or a reassortment between human and avian viruses that greatly changed virus antigenicity [1,2]. At the end of 2008 or the beginning of 2009, a novel swine reassortant was transmitted to humans [3], and a global pandemic broke out in Mexico and the USA in April 2009 [4]. Researchers confirmed that the reassorted virus consisted of six gene segments that emerged from triple-reassortant viruses circulating in North American swine and two gene segments from Eurasian avian-like swine H1N1 viruses [3,4]. Because the common ancestor of the new swine-origin influenza A (H1N1) virus (S-OIV) and its most closely related swine viruses existed approximately 10 years ago, the reassortant viruses may have been circulating in pigs for several years before their transmission to humans [3]. Due to the lack of swine surveillance, the details regarding the reassortment event are unclear.
Phylogenetic analysis has been an essential method for research into the molecular evolution of influenza A viruses, especially for cross-host transmission and reassortment. Holmes et al. sequenced 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York, USA, and phylogenetic analysis revealed that multiple reassortment events had occurred among the co-circulating clades [5]. Nelson et al. showed that segmental reassortment has played an important role in the genomic evolution of H1N1 since 1918 and that intra-subtype reassortment appeared to be an important process in the evolution and epidemiology of H1N1 influenza A virus [6]. Nelson et al. found that multiple clades of both H1N1 and H3N2 entered and co-circulated in the United States during the 2006-2007 influenza season, even in localities that were distant from major metropolitan areas [7]. These data were concordant with other research by the same group concluding that the stochastic processes of viral migration and clade reassortment played a vital role in shaping short-term evolutionary dynamics [8]. Vijaykrishna et al. discovered a novel swine reassortant in Hong Kong containing genes from both 2009 S-OIV and triple-reassortant virus, which implied that swine might be a reservoir of reassortment for 2009 S-OIV [9]. Li et al. revealed avian reassortment patterns of highly pathogenic avian influenza (HPAI) H5N1 virus in eastern Asia [10], and the HPAI H5N1 virus has been transmitted across hosts to humans and caused fatal respiratory illness [11].
Traditionally, transmission and reassortment events have mostly been revealed by separate phylogenetic analyses of the eight gene segments [5-11]. This approach is not straightforward, and the key step lies in identifying the lineage to which each gene segment belongs, which requires professional knowledge of numerous virus lineages. Rabadan et al. proposed an interesting method for revealing potential reassortment that calculates the pairwise nucleotide differences at the third codon positions between the same segments of any two virus strains [12,13]. If the two viruses have a common origin, the differences between all eight segment pairs should be proportional. In contrast, a violation of this rule probably indicates potential reassortment events. The method sounds reasonable; however, several factors may interfere with the calculation of the differences, such as time since divergence, number of generations and geographical isolation. Most importantly, it is difficult to identify the exact parents of a potential reassortant. Lu et al. introduced the concept of genotype to define gene segment combinations [14] and developed an online tool called FluGenome [15] to determine genotypes for influenza A viruses and, in theory, to detect virus reassortment. Lineages for each segment are assigned by a cutoff of 10% nucleotide difference by p-distance in the phylogenetic tree of all nearly complete sequences of influenza A viruses (see Figure 1). The genotypes can be determined by comparing the genomic sequences of new viruses with the genome database using the BLAST algorithm. The best BLAST results are used to assign lineages to the viruses and create genotypes by the sequential combination of the lineages for each segment in the gene order. Thus, the nomenclature of influenza A viruses consists of all eight gene segments, rather than the serotype of hemagglutinin and neuraminidase alone.
Reassortment can be detected, in theory, by combining the known genotypes in the database. Unfortunately, FluGenome only provided the process for determining the genotypes, and the analysis process for the reassortment was not implemented directly. Furthermore, the hundreds of genotypes that are collected in the database complicate the analysis even in theory.
In this study, we introduce the concept of "genotype profile", based on "genotype", to describe classic or dominant virus strains. With genotype profiles, the genotypes of the viruses are divided into several basic genotypes and various rare genotypes that may belong to transmitting viruses or reassortants. With the genotype profile method, analyzing reassortment and transmission events becomes a simple and reliable process of combining basic genotypes, similar to the biological process of the virus, while the complex phylogenetic analysis is packaged inside the method. An application called IVEE that implements the method is available to all academic users.

Methods
Genotypes
"Genotype" was previously defined by Lu et al. as a sequential combination of the lineages for each of the eight segments in an influenza virus genome [14]. A letter was assigned to each lineage of PB2, PB1, PA, NP and M, and a number followed by a letter was assigned to each lineage of HA, NA and NS, with the number representing the serological subtype or allele. We downloaded the genotype information for all influenza A viruses of human, swine and avian species in the FluGenome database [15] on 10 September 2010. In total, there were 3161, 324 and 2572 genomes for human, swine and avian viruses, respectively, and the corresponding numbers of distinct genotypes were 26, 40 and 397. Two virus strains (A/Texas/09/2009 and A/Canada-ON/RV1527/2009) were used to determine the genotypes of 2009 S-OIV because 2009 S-OIVs and their genotypes had not yet been included in the FluGenome database. We downloaded their complete genomic sequences from the NCBI Influenza Virus Resource [16] and determined their genotypes via FluGenome. Most of the lineages of the gene segments could be confirmed directly; however, the lineage of NA was assigned using looser parameters (80% coverage and 80% identity).
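The genotype notation described above can be illustrated with a minimal Python sketch. The segment order (PB2, PB1, PA, HA, NP, NA, M, NS), the single uppercase letter per lineage, and the helper names `parse_genotype` and `serotype` are assumptions made for illustration; they are consistent with the genotype strings quoted in this article but are not taken from FluGenome itself.

```python
import re

# Assumed segment order; it matches genotypes such as [A, D, B, 3A, A, 2A, B, 1A],
# where the 4th, 6th and 8th codes carry a number (HA, NA and NS).
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")
NUMBERED = {"HA", "NA", "NS"}  # lineage code = number followed by a letter


def parse_genotype(text):
    """Parse a string like '[A, D, B, 3A, A, 2A, B, 1A]' into a dict (hypothetical helper)."""
    codes = [code.strip() for code in text.strip().strip("[]").split(",")]
    if len(codes) != len(SEGMENTS):
        raise ValueError(f"expected {len(SEGMENTS)} lineage codes, got {len(codes)}")
    genotype = {}
    for segment, code in zip(SEGMENTS, codes):
        # Assumption: one uppercase letter per lineage, prefixed by digits for HA/NA/NS.
        pattern = r"\d+[A-Z]" if segment in NUMBERED else r"[A-Z]"
        if not re.fullmatch(pattern, code):
            raise ValueError(f"unexpected lineage code {code!r} for {segment}")
        genotype[segment] = code
    return genotype


def serotype(genotype):
    """Derive the HxNy serotype from the numeric parts of the HA and NA codes."""
    ha = re.match(r"\d+", genotype["HA"]).group()
    na = re.match(r"\d+", genotype["NA"]).group()
    return f"H{ha}N{na}"


human_h3n2 = parse_genotype("[A, D, B, 3A, A, 2A, B, 1A]")
print(serotype(human_h3n2))
```

Storing the genotype per segment, rather than as one opaque string, makes it easy to compare lineages segment by segment in later steps.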

Genotype profiles
We defined genotype profiles as lists of the genotypes of classic or predominant virus strains for human, swine or avian species. The profiles filter out hundreds of less important genotypes and incorporate epidemiological knowledge about the viruses (see Table 1). We name the genotypes included in a genotype profile basic genotypes. By frequency, the genotypes of the viruses can be divided into common genotypes and rare genotypes. Common genotypes occur frequently and mainly correspond to classic or predominant virus strains, whereas rare genotypes have low frequencies. However, not every common genotype is a basic genotype. For example, the genotypes of highly pathogenic avian H5N1 viruses that infect humans should not be included in the human genotype profile because these viruses have not yet adapted to human hosts, even though their genotypes were observed in human hosts with high frequencies. Similarly, some reassortants that have not adapted to their hosts may be isolated with high frequency due to frequent sampling. Thus, the difference between common genotypes and basic genotypes depends on whether the host is a natural host or a new host to which the virus has adapted, and it is judged based on epidemiological knowledge of influenza A viruses. We further assumed that most of the viruses that have rare genotypes are transmitting viruses or reassortants that emerged from combinations of basic genotypes, and we test this hypothesis later. Occasionally, some rare genotypes, such as [A, A, A, 1A, A, 1A, B, 1A] for A/Brevig Mission/1/1918, had low frequencies due only to the lack of sampling. These genotypes were all excluded from the genotype profiles to decrease the complexity of the genotype profiles.
Genotype profiles for human, swine and avian species were established by the following steps: (i) the genotypes of the viruses were divided into common genotypes and rare genotypes with a cutoff of 5 for genotype frequencies; (ii) common genotypes were further checked based on knowledge of the evolutionary history of influenza A virus to distinguish basic and non-basic genotypes; (iii) basic genotypes in the genotype profiles were further classified into groups and subtypes; (iv) rare genotypes and non-basic genotypes were analyzed following the process for detecting transmission and reassortment events described below.
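Steps (i) and (ii) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the observation counts are invented toy values, a genotype is treated as common when it is seen at least 5 times (the text does not say whether the cutoff is inclusive), and the list of non-basic genotypes stands in for the manual, knowledge-based check of step (ii).

```python
from collections import Counter


def split_by_frequency(observed, cutoff=5):
    """Step (i): split observed genotypes into common and rare sets.

    Assumption: 'common' means observed at least `cutoff` times.
    """
    counts = Counter(observed)
    common = {g for g, n in counts.items() if n >= cutoff}
    rare = set(counts) - common
    return common, rare


def basic_genotypes(common, non_basic):
    """Step (ii): drop common genotypes judged non-basic on epidemiological grounds."""
    return common - set(non_basic)


# Genotype strings quoted in this article; the counts below are toy values.
HUMAN_H3N2 = ("A", "D", "B", "3A", "A", "2A", "B", "1A")
AVIAN_H5N1 = ("K", "G", "D", "5J", "F", "1J", "F", "1E")
BREVIG_1918 = ("A", "A", "A", "1A", "A", "1A", "B", "1A")

observed_in_humans = [HUMAN_H3N2] * 40 + [AVIAN_H5N1] * 12 + [BREVIG_1918]

common, rare = split_by_frequency(observed_in_humans)
# The avian H5N1 genotype is frequent in humans but not adapted to them,
# so it is excluded by hand from the human profile.
human_profile = basic_genotypes(common, non_basic=[AVIAN_H5N1])
print(sorted(human_profile))
```

The manual exclusion list is the point where epidemiological knowledge enters the profile; the frequency cutoff alone would wrongly keep the avian H5N1 genotype in the human profile.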

Detecting transmission and reassortment events
After the genotypes of the candidate viruses are determined, the genotypes are compared to genotype profiles to detect transmission and reassortment events (see Figure 1). The detection process includes two steps. First, ascertain whether the genotypes exist in the genotype profiles. If the genotype of a candidate virus exists in the genotype profiles, it is a basic genotype when the sampling host is the same as the host of the basic genotype, and a transmission genotype when the hosts differ. Thus, the virus is a transmission virus if it has switched hosts from the natural host to the host from which it was sampled. For example, the frequent observation of avian genotypes [K, G, D, 5J, F, 1J, F, 1E] and [G, G, E, 5J, F, 1G, F, 1E] in humans provided evidence that humans were infected with avian H5N1 viruses (see Table 2). Genotypes that do not exist in the genotype profiles are considered potential reassortants. Second, observe the characteristics of the lineages of the genotypes and infer the possible reassortment parents by combining the basic genotypes of the genotype profiles. In most cases, the reassortment parents and how they were combined can be easily concluded, except for some novel genotypes that have emerged from unknown origins. We can then test the hypothesis that most of the viruses that have rare genotypes are transmitting viruses or reassortants.

Figure 1 Workflow of detecting transmission and reassortment events using genotype profile method. Left part: FluGenome [14] determines genotypes for influenza A viruses. Right part: Candidate genotypes are compared to genotype profiles of human, swine and avian (Table 1 and Additional file 1, Table S1) to test whether they are of the following genotypes: basic genotypes, transmission genotypes, reassortment genotypes or novel genotypes. An application called IVEE is freely available to facilitate the detecting process.

[...] 1A] (see Table 2). They proposed that the virus was originally transmitted from an avian host and adapted to humans before the 1918 pandemic [1,2,17,18], while some researchers argued that the evidence for the avian-origin hypothesis was insufficient and that the virus might be the result of a reassortment or a recombination between human and swine viruses [19-22]. It is hard to resolve the debate because few of the viruses that dominated in that era were sampled and because factors such as constraining selection and reassortment/recombination complicate phylogenetic analysis [19-24] [...] Table 2) [35].
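The two-step detection process described in this section (profile lookup, then inference of reassortment parents from basic genotypes) can be sketched in Python. This is a simplified illustration, not the IVEE implementation: the function name, the return format, the `max_parents` limit and the tiny toy profiles are all assumptions, and the real profiles of Table 1 are not reproduced here.

```python
from itertools import combinations


def classify(genotype, host, profiles, max_parents=3):
    """Classify one candidate genotype (an 8-tuple of lineage codes).

    `profiles` maps a host name to a set of basic genotypes.
    Returns (label, parents), where parents is a list of (host, genotype).
    """
    # Step 1: does the genotype exist in the genotype profiles?
    if genotype in profiles.get(host, set()):
        return "basic", [(host, genotype)]
    sources = [(h, genotype) for h, basics in profiles.items() if genotype in basics]
    if sources:
        # Present in another host's profile: the virus switched hosts.
        return "transmission", sources

    # Step 2: look for the smallest set of basic genotypes whose lineages,
    # segment by segment, can be combined into the candidate genotype.
    basics = [(h, g) for h, gs in profiles.items() for g in sorted(gs)]
    for k in range(2, max_parents + 1):
        for parents in combinations(basics, k):
            if all(any(p[i] == lineage for _, p in parents)
                   for i, lineage in enumerate(genotype)):
                return "reassortment", list(parents)
    # At least one lineage is not explained by the profiles.
    return "novel", []


# Toy profiles built from genotype strings quoted in this article (assumption:
# the two H5N1 genotypes are treated as avian basic genotypes).
HUMAN_H3N2 = ("A", "D", "B", "3A", "A", "2A", "B", "1A")
AVIAN_H5N1_1 = ("K", "G", "D", "5J", "F", "1J", "F", "1E")
AVIAN_H5N1_2 = ("G", "G", "E", "5J", "F", "1G", "F", "1E")
TOY_PROFILES = {"human": {HUMAN_H3N2}, "avian": {AVIAN_H5N1_1, AVIAN_H5N1_2}}

# Hypothetical candidate: human internal genes with avian HA and NA lineages.
TOY_CANDIDATE = ("A", "D", "B", "5J", "A", "1J", "B", "1A")

print(classify(AVIAN_H5N1_1, "human", TOY_PROFILES)[0])
print(classify(TOY_CANDIDATE, "human", TOY_PROFILES)[0])
```

Trying pairs of parents before larger sets keeps the reported parent set as small as possible; a triple reassortant is only reported when no pair of basic genotypes explains all eight segments.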
The latter was also reassorted from triple-reassor- [...]

[...] Table 1 and Additional file 1, Table S1). Additional file 1, Table S1 shows genotypes and typical hosts for multiple subtypes of waterfowl and domestic poultry. For example, waterfowl such as mallard ducks and green-winged teals mainly have the waterfowl genotype pattern regardless of their serotypes. Interestingly, the domestic poultry genotype was present in the viruses of subtype H5N1 isolated from mallards. This may have been the result of bi-directional virus exchange [10,36]. Similarly, most genotypes of chicken and duck viruses maintain the domestic poultry genotype pattern, whereas some of the viruses have waterfowl genotypes.

Conserved lineages and genotype codes
From genotype profiles, various conserved lineages of genotypes could be observed (see Table 1). For example, lineage A for PB2 is conserved in human H1N1 viruses, whereas lineage B is conserved in classic swine, lineage C is conserved in waterfowl and lineage K/G is conserved in domestic poultry.

[...] ), most of which were transmitted from swine to turkeys (data not shown). The latter was also confirmed by Olsen et al. in Canada [38]. Because avian viruses have complex genotypes and abundant genetic diversity, it is difficult to trace transmission and reassortment events among avian species. However, it is still possible to infer potential cases of transmission and reassortment between waterfowl and domestic poultry because their genotype patterns are different.

Detecting transmission and reassortment events
Comparing swine reassortment patterns with previous work
Table 3 lists the reassortment events in swine influenza A viruses detected by Rabadan's method [13] in the left part of the table. Most gene segments have one of the two swine-origin lineages (S1 and S2), except for some lineages that are avian (A) or human (H) derived. Due to the mixed lineages of S1 and S2, the gene segments may be the result of reassortment. To our surprise, when the same strains were analyzed with the genotype profile method, we found that most of the virus strains belonged to the classic swine, triple-reassortant swine and Eurasian avian-like swine virus groups. Rabadan's distance-based method rests on the following rule: the nucleotide differences at the third codon position between two segments of two strains should be proportional if the two segments have a common origin. A violation of this rule indicates that the co-occurrence of two segments may be the result of reassortment events. The rule runs into problems when the reference strains and segments for swine viruses are not selected carefully. For example, if a classic swine strain is set as the reference strain, then all triple-reassortant viruses will be classified as reassortants because some segments originate from classic swine viruses, whereas others do not.

[...] [41]. However, Rabadan assumed that all gene segments except HA had the same lineages as classic swine. In fact, the H1N2 reassortant viruses were derived from traditional H3N2 triple reassortants and H1N1 classic swine viruses. Phylogenetic analysis confirmed the results of the genotype profile method (see Additional file 1, Table S2 and Additional file 1, Figure S1). These examples show that our method is easy to use and accurate for analyzing the reassortment process, whereas the distance-based method has difficulty dealing with complex situations, although it does work in certain cases.

[...] 1A] are true reassortants that can be determined by genotype profiles.
However, for H1N1 strains belonging to the classic swine viruses, the genotype profile method could not identify whether they are reassortants because our method has difficulty inferring intra-subtype reassortment events within the same host, as will be discussed below.
The greatest advantage of the genotype profile method is that it is straightforward and follows the real viral reassortment process. In fact, the phylogenetic analysis process is hidden under the definition and construction of genotypes [14]. As shown in Figure 1, lineages were defined as significant clusters (about 10% nucleotide difference by p-distance) in phylogenetic trees constructed from all viruses with full genomic sequences. There is no need to reconstruct the phylogenetic trees as long as the shape of the trees has not changed due to epidemiology and no novel lineage needs to be assigned. The determination of genotypes is then performed by finding a position for the virus in the established trees using the BLAST algorithm. The genotype profile method takes advantage of the inferences that phylogenetic algorithms draw from sequences, integrates epidemiological knowledge into genotype profiles and keeps the epidemiological interfaces for users to assemble.
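The p-distance behind the 10% lineage cutoff is simply the proportion of aligned nucleotide sites at which two sequences differ. The sketch below illustrates that quantity only; it compares two already aligned sequences pairwise and is not the tree-based clustering used by FluGenome. The function names and the handling of gaps and ambiguous bases are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Assumption: sites where either sequence has a gap or an ambiguous base
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def within_cutoff(seq_a, seq_b, cutoff=0.10):
    """True if two sequences differ by no more than the given p-distance cutoff."""
    return p_distance(seq_a, seq_b) <= cutoff


# Toy 10-site alignment with a single substitution (p-distance 0.1).
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAA"
print(p_distance(toy_a, toy_b))
print(within_cutoff(toy_a, toy_b, cutoff=0.10), within_cutoff(toy_a, toy_b, cutoff=0.01))
```

Passing a smaller `cutoff` shows how the same pair of sequences can fall inside one lineage at a coarse threshold yet be separated at a finer one.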
An application (see Figure 2) called "Analysis Tool for Influenza A Virus Evolutionary Events (IVEE)" that implements the genotype profile method is available to all academic users from the website http://snptransformer.sourceforge.net. In the current version, the genotype profiles shown in Table 1 are embedded and fixed in the program by the authors. A custom genotype profile function will be implemented in the next version, which will allow users to edit the genotype profiles to satisfy various research demands, such as adding newly discovered genotypes and studying historic reassortment events. Furthermore, each genotype will be associated with a representative virus strain to help users read the analysis results.
The genotype profile method still has limitations when inferring intra-subtype reassortment within the same host. For example, nearly all human H3N2 viruses have the genotype [A, D, B, 3A, A, 2A, B, 1A], which prevents the method from distinguishing intra-subtype reassortment events as Holmes et al. [5] and Nelson et al. [6] have done. The reason is that a cutoff of a 10% nucleotide difference by p-distance for defining lineages in the phylogenetic tree is too coarse to distinguish some virus clades within a subtype, although it is sufficient for inter-subtype reassortment. One possible solution is to define clades under the FluGenome lineages with lower cutoffs, such as 1%. In fact, the WHO/OIE/FAO H5N1 Evolution Working Group has developed a web tool for HPAI H5N1 HA clade prediction (http://h5n1.flugenome.org) based on FluGenome, with average distances of ≥1.5% between clades. The tool and its unified nomenclature system for the HA clade designation of HPAI H5N1 viral strains were used later to facilitate resolution of the nomenclature problem and to make comparisons among virus clades easier across publications. However, the clade designation was designed only for HA without taking into account other gene segments. We suggest that a similar international committee should be established to assign unified clades for all segments of influenza A viruses. This will greatly help research and increase our knowledge of the evolutionary tendency of influenza A viruses.

Conclusions
In conclusion, we extended the concept of "genotype" to "genotype profile" to describe classic or dominant virus strains and constructed the genotype profiles for influenza A viruses of human, swine and avian species. Genotype profiles not only decrease the complexity of combinations of hundreds of genotypes but also provide epidemiological information on influenza A viruses for the analysis. With the genotype profile method, the analysis of reassortment and transmission events is a simple and reliable process of combining genotypes. We detected various transmission and reassortment events from rare genotypes stored in FluGenome and found that one previous summary of the reassortment events in swine influenza A viruses may be inaccurate. Using the genotype profile method, surveillance for virus transmission and reassortment becomes straightforward, and it is possible to set up an automatic surveillance system for detecting such evolutionary events.

Additional material
Additional file 1: Table S1. Genotype profiles for avian influenza A virus. Table S2. The results compared by phylogenetic analysis and genotype profile method. Figure S1. Phylogenetic trees for influenza A virus strains.