Diversity and distribution of type A influenza viruses: an updated panorama analysis based on protein sequences

Background Type A influenza viruses (IAVs) cause significant infections in humans and multiple species of animals including pigs, horses, birds, dogs and some marine animals. Their phylogenetic diversity and distribution are complicated, and a panorama analysis of this diversity and distribution has not been updated for several years. Methods A total of 139,872 protein sequences of IAVs from GenBank were selected, aligned and phylogenetically analyzed using the software tool MEGA 7.0. Lineages and subordinate lineages were classified according to the topology of the phylogenetic trees and the host, temporal and spatial distribution of the viruses, and designated using a novel universal nomenclature system. Results Large phylogenetic trees of the two external viral genes (HA and NA) and six internal genes (PB2, PB1, PA, NP, MP and NS) were constructed, and the diversity and the host, temporal and spatial distribution of these genes were calculated and statistically analyzed. Various features regarding the diversity and distribution of IAVs were confirmed, revised or added through this study, as compared with previous reports. Lineages and subordinate lineages were classified and designated for each of the genes based on the updated panorama views. Conclusions The panorama views of the phylogenetic diversity and distribution of IAVs and their nomenclature system were updated, and they are expected to be of significance for studies of and communication about IAVs. Electronic supplementary material The online version of this article (10.1186/s12985-019-1188-7) contains supplementary material, which is available to authorized users.

The phylogenetic diversity and distribution of IAVs were analyzed in a panorama view 9 years ago [1][2][3][4][5]. The analysis now needs to be updated because some sequences of the viruses (e.g. A/swine/Quebec/4001/2005, A/mink/Nova Scotia/1055488/2007) in GenBank have been revised, many novel sequences have been uploaded to GenBank, and some novel subtypes, lineages and subordinate lineages of IAVs have emerged. Meanwhile, the analysis capacity of computers has increased so greatly that many more sequences can be covered in a phylogenetic analysis than several years ago, which better reveals the panorama views.
It remains challenging to devise a rational, concise, universal nomenclature system for all lineages and subordinate lineages of IAVs, although a couple of numerical nomenclature systems have been proposed or used for all or some lineages and subordinate lineages of IAVs. For example, clades 2.3.4.4, 2.3.2.1 and 7.2 have been used to designate some subordinate lineages of the H5 subtype highly pathogenic avian influenza viruses (HPAIVs) circulating in the Eastern Hemisphere in recent years [18,25,26]. This nomenclature can be simplified if some of the numbers are replaced by letters, e.g. clade 2.3.4.4 can be simplified as clade 2c4d, which is more convenient for communication.
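The simplification described above can be illustrated with a short sketch. It assumes, based only on the single example given (2.3.4.4 becomes 2c4d), that every second level of the dotted designation is replaced by the lowercase letter at that position in the alphabet (1 = a, 2 = b, and so on). The function name `simplify_clade` is hypothetical and is not part of any published tool.

```python
import string


def simplify_clade(clade: str) -> str:
    """Convert a dotted numeric clade name into an alternating
    number/letter form, e.g. '2.3.4.4' -> '2c4d'.

    Assumption: levels 1, 3, 5, ... stay numeric and levels 2, 4, 6, ...
    are mapped to letters (1 -> a, 2 -> b, ...), as inferred from the
    single example in the text.
    """
    simplified = []
    for level, part in enumerate(clade.split(".")):
        number = int(part)
        if level % 2 == 0:
            # Odd-numbered levels (1st, 3rd, ...) remain numbers.
            simplified.append(str(number))
        else:
            # Even-numbered levels (2nd, 4th, ...) become letters.
            if not 1 <= number <= 26:
                raise ValueError(f"cannot map level value {number} to a letter")
            simplified.append(string.ascii_lowercase[number - 1])
    return "".join(simplified)


if __name__ == "__main__":
    for name in ("2.3.4.4", "2.3.2.1", "7.2"):
        print(name, "->", simplify_clade(name))
```

Because letters and numbers alternate, the dots can be dropped without ambiguity, which is what makes the shortened form convenient.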
In this report, thousands of protein sequences of each of the genes of IAVs were aligned, and their phylogenetic trees were constructed. Then lineages and subordinate lineages of the viruses were designated according to the topology of the phylogenetic trees and the host, temporal and spatial distribution of the viruses using a novel universal and concise nomenclature system with the aim to update the panorama views of IAVs.

Download and selection of sequences
Viral protein sequences were downloaded from the Influenza Virus Resource at https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi [27]. For those internal genes and those subtypes of HA or NA genes with ≥2000 downloaded sequences, only the sequences with all amino acid residues revealed were selected for further analysis. For those internal genes and those subtypes of HA or NA genes with <2000 downloaded sequences, only the sequences with ≥80% of amino acid residues revealed were selected for further analysis.
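The two selection rules can be sketched as follows. This is only an illustration under stated assumptions: an unrevealed residue is taken to be an `X`, `?` or gap character, the fraction revealed is measured against a caller-supplied full protein length, and the names `revealed_fraction` and `select_sequences` are hypothetical. The source does not state how the selection was actually implemented.

```python
def revealed_fraction(seq: str, full_length: int) -> float:
    """Fraction of the full-length protein whose residues are revealed.

    Assumption: 'X', '?' and '-' mark unrevealed residues.
    """
    known = sum(1 for aa in seq.upper() if aa not in "X?-")
    return known / full_length


def select_sequences(seqs, full_length, large_group=2000):
    """Apply the size-dependent rule described in the text.

    Groups with >= large_group downloaded sequences keep only fully
    revealed sequences; smaller groups keep sequences with >= 80% of
    residues revealed.
    """
    min_fraction = 1.0 if len(seqs) >= large_group else 0.8
    return [s for s in seqs if revealed_fraction(s, full_length) >= min_fraction]


if __name__ == "__main__":
    # Toy 10-residue "proteins" for a small group (< 2000 sequences).
    small_group = ["ACDEFGHIKL", "ACDEFGHIXX", "ACDEFXXXXX"]
    print(select_sequences(small_group, full_length=10))
```

The stricter rule for large groups keeps the trees manageable, while the looser rule avoids discarding most of the data for sparsely sampled subtypes.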

Phylogenetic analysis of selected sequences
The selected sequences were aligned by the MUSCLE method using the software package MEGA 7.0 (https://www.megasoftware.net/) [28,29]. The phylogenetic relationships among the aligned sequences were calculated with the neighbor-joining method under the Poisson model [28,30]. Substitution rates among sites were modeled with a gamma distribution (α = 1.0), and gaps in the sequences were handled by pairwise deletion.
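As a rough illustration of the distance underlying such a tree, the sketch below computes a Poisson-corrected amino acid distance with gamma-distributed rates, using the commonly cited textbook form d = α[(1 − p)^(−1/α) − 1], where p is the proportion of differing sites. Pairwise deletion is implemented by skipping any site where either sequence has a gap or missing-data symbol. This is not MEGA's code: the function name and the choice of `-`, `?` and `X` as missing-data symbols are assumptions made for the example.

```python
def gamma_poisson_distance(seq_a: str, seq_b: str, alpha: float = 1.0,
                           missing: str = "-?X") -> float:
    """Poisson-corrected distance with gamma rate variation between two
    aligned protein sequences, using pairwise deletion of gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        # Pairwise deletion: ignore a site only for this pair of sequences.
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    # One gapped site is skipped; 1 of the remaining 9 sites differs.
    print(gamma_poisson_distance("ACDE-GHIKL", "ACDEFGHIKV"))
```

With α = 1.0, as used here, the expression reduces to p / (1 − p). A matrix of such distances over all sequence pairs is the input that a neighbor-joining algorithm would cluster.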

Host distribution of the HA sequences
The hosts of 32,623 of the 33,066 HA sequences were known. Of the 32,623 HA sequences, 13,581 were from birds, 12,460 from humans, 6276 from pigs, 155 from horses, 87 from dogs, 16 from cats, 9 from seals, 7 from tigers, 6 from ferrets, 5 from bats, 5 from minks, 4 from pika, 2 from whales, 1 H1 from cheetah, 1 H3 from donkey, 1 H1 from giant anteater, 1 H5 from leopard, 1 H5 from lion, 1 H2 from muskrat, and 1 H1 from sloth bear (Table 1). These data suggested that birds, humans and pigs are the major hosts of IAVs. As also shown in Table 1, most AIVs belong to the H9, H5, H6, H7, H3 or H4 subtypes, most HuIVs and SIVs belong to the H1 or H3 subtypes, most EIVs belong to the H3 or H7 subtypes, and most CIVs belong to the H3 subtype.

Host distribution of the NA sequences
The hosts of 26,792 of the 27,026 NA sequences were known. Of the 26,792 sequences, 10,599 were from birds, 10,157 from humans, 5717 from pigs, 160 from horses, 109 from dogs, 14 from cats, 9 from minks, 6 from seals, 5 from bats, 4 from ferrets, 1 N8 from camel, 1 N1 from cheetah, 1 N8 from donkey, 1 N1 from lion, 1 N6 from muskrat, 6 N1 from tigers, and 1 N9 from whale (Table 2). These data suggested that birds, humans and pigs are the major hosts of IAVs. As shown in Table 2, most AIVs, HuIVs and SIVs belong to the N1 or N2 subtypes, most EIVs belong to the N7 or N8 subtypes, and most CIVs belong to the N2 or N8 subtypes.
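The NA host counts quoted above can be tallied in a few lines to check that they add up to the stated total and to express the dominance of the three major hosts as a share. The counts are copied from the text; the variable names are arbitrary.

```python
# NA sequence counts per host, as listed in the text above.
na_host_counts = {
    "birds": 10599, "humans": 10157, "pigs": 5717, "horses": 160,
    "dogs": 109, "cats": 14, "minks": 9, "seals": 6, "bats": 5,
    "ferrets": 4, "camel": 1, "cheetah": 1, "donkey": 1, "lion": 1,
    "muskrat": 1, "tigers": 6, "whale": 1,
}

total = sum(na_host_counts.values())
major_hosts = ("birds", "humans", "pigs")
major_share = sum(na_host_counts[h] for h in major_hosts) / total

if __name__ == "__main__":
    print("total NA sequences with known host:", total)
    for host in major_hosts:
        print(f"{host}: {na_host_counts[host] / total:.1%}")
    print(f"birds + humans + pigs: {major_share:.1%}")
```

The per-host counts sum to the 26,792 sequences with known hosts, and birds, humans and pigs together account for nearly all of them.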

Temporal distribution of the HA sequences
The temporal distribution of 33,018 of the 33,066 HA sequences was known and is given in Table 3, which suggested that the number of sequences increased more and more rapidly after 1970. Sequences of H1, H3, H5, and H9 were considerably more numerous than those of other subtypes after 2000.

HA subtype distribution of the NA sequences
Of the 27,026 NA sequences, 26,133 were of known HA subtypes, and the HA-subtype distribution of the 26,133 NA sequences was given in Table 4, which suggested that H1N1, H1N2, H3N2, H5N1, and H9N2 were the predominant subtypes circulating worldwide.
The above phylogenetic relationships were largely consistent with our previous study in 2009, which classified H1 subtype IAVs into three primary lineages, h1.1-h1.3 [31]. Of them, h1.1 (AIVs) corresponded to H1.1 in this report, and h1.2 and h1.3 corresponded to H1.2 in this report. Moreover, h1.2 corresponded to H1.2a and H1.2d in this report, and h1.3 corresponded to H1.2b and H1.2c in this report. We put h1.2 and h1.3 into the same primary lineage (i.e., H1.2) in this report because they likely shared the same origin (i.e., H1.2a) and the same host range (humans and pigs), and they were located relatively close together in the phylogenetic tree. Many more sequences were used for the analysis of phylogenetic relationships in this study than in our previous study in 2009, and consequently this report exhibited more clades of H1 subtype SIVs than our previous study.
Consistent with our previous study [31], here we did not classify H1 subtype IAVs into H1 subtype AIVs, HuIVs and SIVs. This is because the avian lineage H1.1 also comprised a major group of SIVs (i.e., H1.1c), and the swine lineage H1.2c comprised a major group of HuIVs (i.e., H1.2c1). Additionally, although some groups of IAVs within a lineage of H1 subtype were of significant genetic distances between each other, we did not designate them as subordinate lineages in this report, since these AIVs were of no significant difference with respect to the host, temporal or spatial distribution.

Phylogenetic diversity and distribution of H2 subtype IAVs based on HA sequences
Figure 2 (the thumbnail of the phylogenetic tree) and Additional file 2: Figure SI_2 (the original phylogenetic tree) both suggested that, from a panorama view, H2 subtype IAVs could be classified into two primary lineages, H2.1 and H2.2, based on their HA sequences, largely corresponding to the AIVs isolated from the Western and Eastern Hemispheres, respectively.
H2.2 was more complicated than H2.1, including some HuIVs and AIVs from multiple continents. Two secondary lineages within H2.2, H2.2a and H2.2b, were designated. H2.2a corresponded to the H2N2 subtype HuIVs which circulated worldwide during the period 1957-1968, indicating that the HA gene of these HuIVs originated from AIVs. H2.2b corresponded to some AIVs from North America. Unlike H1 subtype IAVs, H2 subtype IAVs have not been isolated from pigs, with only two exceptions from the USA in 2006.
The above phylogenetic relationships were largely consistent with our previous study in 2009 which classified H2 subtype IAVs into two primary lineages, h2.1 and h2.2, which corresponded to H2.1 and H2.2 in this report [31]. Similar to H1 subtype, here we did not designate subordinate lineages in this report for some groups of IAVs which were of significant genetic distances between each other, since these AIVs were of no significant difference with respect to the host, temporal or spatial distribution.

Phylogenetic diversity and distribution of H3 subtype IAVs based on HA sequences
Figure 3 (the thumbnail of the phylogenetic tree) and Additional file 3: Figure SI_3 (the original phylogenetic tree) both suggested that, from a panorama view, H3 subtype IAVs could be classified into three primary lineages, H3.1, H3.2 and H3.3, based on their HA sequences.
H3.1 corresponded to H3 subtype AIVs, and several secondary lineages within H3.1, H3.1a-H3.1c, were designated. H3.1a corresponded to some AIVs isolated from the Western Hemisphere. H3.1b corresponded to some AIVs isolated from the Pacific. H3.1c corresponded to many AIVs distributed worldwide. Two tertiary lineages within H3.1c were designated: H3.1c1 corresponded to many AIVs isolated from the Western Hemisphere, and H3.1c2 corresponded to some canine or feline IAVs which were similar to some AIVs from South Korea, indicating that these viruses likely originated from the AIVs circulating in South Korea. We did not allocate a subordinate lineage for the groups of AIVs isolated from the Eastern Hemisphere in this study, because these groups were similar to the viruses from the Western Hemisphere (H3.1c1) as compared to H3.1a and H3.1b (Additional file 3: Figure SI_3).
Similar to H1.2d, H3.2 corresponded to many HuIVs and SIVs circulating worldwide. H3.2 covered multiple groups of HuIVs which have circulated in humans from 1968 to the present, and multiple groups of SIVs which have circulated worldwide for decades. Although some groups of HuIVs in H3.2 were interspersed with swine groups in the phylogenetic tree (Additional file 3: Figure SI_3), this is probably mainly because some SIVs were similar to some HuIVs in protein sequence by chance rather than through cross-species transmission. That is to say, cross-species transmission of the viruses in H3.2 did occur, but it has not caused frequent replacement of HuIV lineages by SIV lineages in H3.2, because most HuIVs spread by human-to-human transmission and most SIVs by swine-to-swine transmission, according to their epidemiology.
On the other hand, the average protein sequence identity of the HuIVs in H3.2 of the same decade after 1980 was around 97%, and that of the SIVs in H3.2 was around 91%, suggesting that the SIVs in H3.2 were more diverse than the HuIVs in H3.2 (P < 0.01, by t-test), and therefore it was possible that some SIVs were similar to some HuIVs by chance. With these considerations, we designated HuIV groups in H3.2 as H3.2a, and SIV groups in H3.2 as H3.2b, according to their hosts, although these two subordinate lineages were inserted into each other in the phylogenetic tree. Nevertheless, the insertion suggested that H3.2a and H3.2b shared the same origin and that cross-species transmission was possible.
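The diversity comparison above rests on the average pairwise protein sequence identity within a group. The sketch below shows one plausible way to compute such an average for a set of aligned sequences. The source does not describe the exact procedure, so the treatment of gaps (sites with a gap in either sequence are skipped) and the function names are assumptions, and the toy sequences are invented for illustration.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of identical residues between two aligned sequences,
    ignoring sites where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a == b for a, b in sites) / len(sites)


def mean_pairwise_identity(seqs) -> float:
    """Average identity over all unordered pairs of sequences."""
    values = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    if not values:
        raise ValueError("need at least two sequences")
    return sum(values) / len(values)


if __name__ == "__main__":
    toy_group = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV"]
    print(f"{mean_pairwise_identity(toy_group):.3f}")
```

A lower average for one host group than for another, computed within the same decade, is the kind of evidence that the text uses to argue that the SIVs were more diverse than the HuIVs.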
Two subordinate lineages of H3.2b were designated: H3.2b1 corresponded to a group of SIVs mainly circulating in the Eastern Hemisphere, with few exceptions, and H3.2b2 corresponded to a group of SIVs mainly circulating in the Western Hemisphere, with some exceptions.
Our previous study in 2009 classified H3 subtype IAVs into three primary lineages, h3.1-h3.3 [31]. Of them, h3.1 corresponded to H3.1 and H3.2 in this report, and h3.2 corresponded to H3.3 in this report. Lineage h3.3, which covered two viruses, A/equine/Argentina/1/2001(H3N8) and A/swine/Quebec/4001/2005(H3N2), disappeared in this report. This is because the sequence of A/swine/Quebec/4001/2005(H3N2) has been changed greatly, and consequently this virus was located in the lineage H3.2b2 in this report, and the sequence of A/equine/Argentina/1/2001(H3N8) might be wrong, since its nucleotide sequence is quite different from all others (sequence identity < 85%) whereas its amino acid sequence is quite similar to some others (sequence identity > 99%). Actually, the panorama view of this study was largely consistent with our previous study in 2009 after removing lineage h3.3, which was likely based on false sequences. In addition, two canine lineages (H3.1c2 and H3.3a) and two avian lineages (H3.1a and H3.1b) were added in this report because the related viruses were isolated after our previous study in 2009. One swine lineage (H3.2b2) was also added in this report because this lineage was neglected in our previous study in 2009.

Phylogenetic diversity and distribution of H4 subtype IAVs based on HA sequences
Figure 4 (the thumbnail of the phylogenetic tree) and Additional file 4: Figure SI_4 (the original phylogenetic tree) both suggested that, from a panorama view, H4 subtype IAVs could be classified into three primary lineages, H4.1-H4.3.
H4.1 and H4.2 mainly corresponded to AIVs circulating in North America, and H4.3 corresponded to AIVs from both Hemispheres. Two secondary lineages (H4.3a and H4.3b) within H4.3 were designated, which mainly corresponded to AIVs circulating in the Western and Eastern Hemispheres, respectively.
The results for the H4 subtype differed from those of our previous study in 2009 [31], which classified H4 subtype IAVs into two primary lineages, h4.1 and h4.2; h4.1 corresponded to H4.3a in this report, and h4.2 corresponded to H4.3b in this report.

Phylogenetic diversity and distribution of H7 and H15 subtypes of IAVs based on HA sequences
Figure 5 (the thumbnail of the phylogenetic tree) and Additional file 5: Figure SI_5 (the original phylogenetic tree) both suggested that, from a panorama view, H7 subtype IAVs could be classified into three primary lineages, H7.1, H7.2 and H7.3, based on their HA sequences.
H7.1 and H7.2 mainly corresponded to H7 subtype AIVs circulating in the Western and the Eastern Hemispheres, respectively. H7.3 corresponded to H7 subtype EIVs, which disappeared after the 1970s. Three secondary lineages within H7.2 were designated. H7.2a mainly corresponded to H7 subtype AIVs which circulated in the period from the 1900s to the 1940s, with two surprising exceptions (A/duck/Taiwan/33/1993(H7N7) and A/duck/Taiwan/Ya103/1993(H7N7)). H7.2b corresponded to some H7 subtype AIVs isolated in the Pacific. H7.2c corresponded to H7 subtype AIVs isolated in the Eastern Hemisphere. A tertiary lineage (H7.2c1) within H7.2c was designated, which largely corresponded to the H7N9 subtype AIVs isolated in China which have caused many human infections and fatalities since 2013. These H7N9 viruses are designated as A(H7N9)CN2013 below.
H15 subtype IAVs were analyzed along with H7 subtype IAVs because few H15 sequences were available and H15 subtype IAVs are close to H7 subtype IAVs [31]. Figure 5 suggested that H15 subtype IAVs could be classified into two lineages, H15.1 and H15.2. H15.1 corresponded to some AIVs isolated in Australia in the 1970s and the 1980s. H15.2 corresponded to a few AIVs isolated from Russia around the year 2010.
The above phylogenetic relationships were largely consistent with our previous study in 2009 [31], except for the addition of lineage H15.2. Some H7 subtype AIVs without a distinct distribution have not been designated as subordinate lineages. On the other hand, we did designate A(H7N9)CN2013 as a subordinate lineage, because this designation is needed to facilitate communication about viruses of great biomedical significance.

Phylogenetic diversity and distribution of H9 subtype IAVs based on HA sequences
Figure 6 (the thumbnail of the phylogenetic tree) and Additional file 6: Figure SI_6 (the original phylogenetic tree) both suggested that, from a panorama view, H9 subtype IAVs could be classified into two primary lineages, H9.1 and H9.2, based on their HA sequences.
Of the primary lineages designated in our previous study in 2009 [31], h9.1, h9.2 and h9.3 constituted H9.1 in this report, and h9.4 was equal to H9.2 in this report. The lineages h9.1 and h9.2 were located together with h9.3 in this study, possibly because more of the nucleotide substitutions in the viruses within h9.1 and h9.2 were synonymous mutations, as compared with other lineages. Three secondary lineages were designated within the primary lineage h9.3 in our previous study, while no secondary lineages were designated for these viruses in this study, as in this report we paid less attention to genetic distance in the classification. Secondary lineages in the lineage h9.4 were designated in the same way as in H9.2 in this study.
Fig. 9 Phylogenetic diversity and distribution of H8 subtypes of IAVs based on HA sequences. H8 subtype IAVs were classified into two primary lineages, which mainly corresponded to AIVs circulating in the Western and Eastern Hemispheres
A secondary lineage (H5.1a) within H5.1 was designated because it corresponded to some strains mainly from Taiwan after 2012, and a secondary lineage (H5.2a) within H5.2 was designated because it was of distinct epidemiological significance, as it corresponded to the H5 subtype HPAIVs widely circulating in the world in recent years. Multiple hierarchies of clades within H5.2a, such as clades 2, 2.3, 2.3.2, 2.3.2.1, and 2.3.2.1c, have been classified by an ad hoc expert group [18], and these clades have not been classified in this report because we paid more attention to major lineages of IAVs.
We designated two secondary lineages (H11.2a and H11.2b) within H11.2, which corresponded to some AIVs isolated from North America and the Antarctic, and one secondary lineage (H6.1a) within H6.1, which corresponded to many AIVs from both Hemispheres.
The results for the subtypes H5-H6 and H10-H12 were similar to those of our previous study in 2009, with minor revisions [31]. Our previous study in 2009 classified H5 subtype IAVs into three primary lineages, h5.1-h5.3 [31]; h5.1 corresponded to H5.1 in this study, and h5.2 and h5.3 corresponded to H5.2 in this study. Additionally, only one group of H5 subtype AIVs was designated as a subordinate lineage because of its distinct epidemiological significance rather than genetic distances, while multiple groups of AIVs were designated as subordinate lineages in our previous study in 2009. H8, H14 and H16 could not be classified into any primary lineages in our previous panorama study in 2009, and they were all classified into primary lineages in this study, mainly because more sequences have become available in GenBank since 2009.

Phylogenetic diversity and distribution of H13 subtypes of IAVs based on HA sequences
Figure 15 (the thumbnail of the phylogenetic tree) and Additional file 15: Figure SI_15 (the original phylogenetic tree) both suggested that, from a panorama view, H13 subtype IAVs could be classified into two global primary lineages. Each of them was classified into two secondary lineages, which mainly corresponded to the AIVs circulating in the Western
The results for the subtype H13 were similar to those of our previous study in 2009, with minor revisions [31]. Our previous study in 2009 classified H13 subtype IAVs into three primary lineages, h13.1-h13.3. The previous primary lineages h13.1 and h13.3 constituted H13.1 in this report, and h13.2 was equal to H13.2 in this report, which was further classified into two secondary lineages mainly because novel sequences have become available in GenBank since 2009.

Phylogenetic diversity and distribution of H17 and H18 subtypes of IAVs based on HA sequences
We constructed a phylogenetic tree covering some sequences of the H1-H16 subtypes and the strains of the H17 and H18 subtypes based on their HA gene sequences, as shown in Fig. 16 (the original phylogenetic tree). The phylogenetic tree suggested that the two bat subtypes (H17 and H18) were quite distinct in their sequences from the H1-H16 subtypes, and thus we propose, or support, the hypothesis that there should be more subtypes of IAVs in bats in the world.

Phylogenetic diversity and distribution of N1 subtype IAVs based on NA sequences
Figure 17 (the thumbnail of the phylogenetic tree) and Additional file 16: Figure SI_16 (the original phylogenetic tree) both suggested that, from a panorama view, N1 subtype IAVs could be classified into three lineages, N1.1, N1.2 and N1.3, largely corresponding to avian, human and classical swine N1 subtype IAVs, respectively.
Three secondary lineages within N1.1 were designated: N1.1a corresponded to a few AIVs circulating in the 1930s, N1.1b corresponded to some AIVs circulating in the Western Hemisphere, and N1.1c corresponded to many AIVs circulating in the Western Hemisphere (N1.1c1) and many AIVs circulating in the Eastern Hemisphere (N1.1c2).
N1.2 mainly corresponded to seasonal H1N1 subtype HuIVs circulating worldwide in humans before 2010. Four secondary lineages within N1.2 were further designated according to their circulation years. N1.2a corresponded to an H1N1 subtype HuIV
Fig. 13 Phylogenetic diversity and distribution of H14 subtypes of IAVs based on HA sequences. H14 subtype IAVs were classified into two primary lineages, which mainly corresponded to AIVs circulating in the Western and Eastern Hemispheres
The above phylogenetic relationships were largely consistent with our previous study in 2009, which classified N1 subtype IAVs into three primary lineages, n1.1-n1.3 [31], which were equal to N1.1-N1.3 in this report. N1.1 (AIVs) had fewer secondary lineages than n1.1 but more levels of subordinate lineages, and a new lineage (N1.1c2b1) was added, owing to different criteria for the classification of subordinate lineages.

Phylogenetic diversity and distribution of N2 subtype IAVs based on NA sequences
Figure 18 (the thumbnail of the phylogenetic tree) and Additional file 17: Figure SI_17 (the original phylogenetic tree) both suggested that, from a panorama view, N2 subtype IAVs could be classified into two primary lineages, N2.1 and N2.2, largely corresponding to avian and human/swine IAVs, respectively.
N2.2 mainly corresponded to HuIVs and SIVs circulating worldwide. The groups of HuIVs were interspersed with multiple groups of highly diversified SIVs within the phylogenetic tree (Additional file 17: Figure SI_17).
The above phylogenetic relationships were largely consistent with our previous study in 2009, which classified N2 subtype IAVs into two primary lineages, n2.1 and n2.2 [31], which were equal to N2.1 and N2.2 in this report. N2.1 (AIVs) was given fewer secondary lineages in this report than n2.1 in the previous report, but more tertiary lineages, in order to better exhibit the phylogenetic relationships of the viruses. N2.2 (HuIVs and SIVs) was given more secondary lineages in this report than n2.2 in the previous study.
Our previous study in 2009 identified that the nucleotide sequence of the NA gene of A/swine/Quebec/4001/2005(H3N2) was quite distinct from its
Figure SI_18, Additional file 19: Figure SI_19 and Additional file 20: Figure SI_20 (the original phylogenetic trees) suggested that, from a panorama view, the N4, N5 and N6 subtypes of IAVs could each be classified into two primary lineages, which mainly corresponded to the AIVs circulating in the Western and Eastern Hemispheres. Additionally, one secondary lineage was designated within a primary lineage of each subtype (N4.2a, N5.2a and N6.2a), all of which corresponded to AIVs isolated from the Pacific. Two other secondary lineages, N6.2b and N6.2c, were designated within N6.2, and both corresponded to H5N6 subtype HPAIVs.
Fig. 18 Phylogenetic diversity and distribution of N2 subtype IAVs based on NA sequences. N2 subtype IAVs were classified into two primary lineages, N2.1 and N2.2, largely corresponding to avian and human/swine IAVs
The above phylogenetic relationships of the N4-N6 subtypes were largely consistent with our previous study in 2009, which also classified these subtypes into two similar primary lineages [31]. However, our previous study in 2009 did not designate any secondary lineages for the primary lineages, and this report designated one to three secondary lineages for them due to their spatial distribution or biomedical significance.

Phylogenetic diversity and distribution of N3 and N9 subtypes of IAVs based on NA sequences
Figures SI_21-22 (the original phylogenetic trees) suggested that, from a panorama view, the N3 and N9 subtypes of IAVs could each be classified into two primary lineages. Three secondary lineages within N3.1 were designated, and they largely corresponded to the AIVs circulating in North America, South America and the Eastern Hemisphere, respectively. Like N3.1, three secondary lineages within N3.2 were designated; N3.2a and N3.2b largely corresponded to the AIVs circulating in North America, and N3.2c largely corresponded to the AIVs circulating in the Eastern Hemisphere. Two secondary lineages were designated within each of N9.1 and N9.2, corresponding to the AIVs from the Western and Eastern Hemispheres, respectively. A tertiary lineage within N9.1b, designated as N9.1b1, corresponded to A(H7N9)CN2013.
The above phylogenetic relationships were largely consistent with our previous study in 2009, which classified the two NA subtypes into two similar primary lineages [31]. However, our previous study in 2009 did not designate any secondary lineages for the primary lineages n3.2 and n9.2, and this report designated two or three secondary lineages for them according to their spatial distribution. Moreover, one tertiary lineage emerging after 2009 (N9.1b1) was designated in this study due to its great significance in public health.
Figures SI_23-24 (the original phylogenetic trees) all suggested that, from a panorama view, the N7 and N8 subtypes of IAVs could each be classified into three primary lineages. N7.1 and N8.1 corresponded to AIVs circulating in the Western Hemisphere; N7.2 and N8.2 corresponded to AIVs circulating in the Eastern Hemisphere; N7.3 and N8.3 corresponded to EIVs. Additionally, a secondary lineage (N8.2a) within N8.2 corresponded to H5N8 subtype HPAIVs, and two secondary lineages (N8.3a and N8.3b) within N8.3 corresponded to H3N8 subtype CIVs.
The above phylogenetic relationships of N7 and N8 were largely consistent with our previous study in 2009, which classified N7 and N8 subtype IAVs into three primary lineages that were equal to the corresponding primary lineages in this study [31]. Unlike the previous study in 2009, no secondary lineages were designated within N7.2 in this study. Meanwhile, more secondary lineages of N8.2 (AIVs) and N8.3 (EIVs) were designated than those of n8.2 and n8.3 because two secondary lineages (N8.3a and N8.3b) of H3N8 subtype CIVs emerged after 2009.

Phylogenetic diversity and distribution of N10 and N11 subtypes of IAVs based on NA sequences
We constructed a phylogenetic tree covering some sequences of the N1-N9 subtypes and the strains of the N10 and N11 subtypes based on their NA gene sequences, as shown in Fig. 26 (the original phylogenetic tree). The phylogenetic relationships among the strains of the N1-N9 subtypes revealed by this tree were consistent with our previous report in 2009 [31]. Similar to the H17 and H18 subtypes, the two bat subtypes (N10 and N11) were quite distinct in their sequences from the N1-N9 subtypes, and thus we propose, or support, the hypothesis that there should be more subtypes of IAVs in bats in the world.

Phylogenetic diversity and distribution of IAVs based on PB2 sequences
Figure 27 (the thumbnail of the phylogenetic tree) and Additional file 25: Figure SI_25
Fig. 21 Phylogenetic diversity and distribution of N6 subtype IAVs based on NA sequences. N6 subtype IAVs were classified into two primary lineages, which mainly corresponded to the AIVs circulating in the Western and Eastern Hemispheres
Our previous study in 2009 classified the viral PB2 gene into eight primary lineages, S1.1-S1.8 [32]. Of them, the avian lineages S1.1, S1.2 and S1.7 corresponded to PB2.1 in this report, the human lineage S1.3 corresponded to PB2.2 in this report, and the swine lineage S1.4 corresponded to PB2.3 in this report. The equine lineage S1.6 corresponded to PB2.4 in this report, and the other equine lineage S1.5 corresponded to the secondary lineage PB2.1c in this report. The viruses within S1.7 were located in PB2.1 in this study, possibly because multiple nucleotide substitutions of the viruses within S1.7 were synonymous mutations. S1.8 included two viruses, A/mink/Nova-Scotia/1055488/2007(H3N2) and A/swine/Quebec/4001/2005(H3N2), and this lineage disappeared in this report because the nucleic acid sequences of the two viruses have been revised greatly, and these two viruses are now assigned to PB2.1b. Actually, the panorama view of this study was largely consistent with our previous study in 2009 after removing the lineages S1.7 and S1.8. In addition, three canine lineages (PB2.1d, PB2.1b1a and PB2.1c1) were added in this report, because the related viruses were isolated after our previous study in 2009 or neglected by it. Similarly, some other subordinate lineages, such as PB2.1b1, were added in this report because the related viruses were isolated after our previous study in 2009. We did not classify the human lineage PB2.2 into any secondary lineages because it is difficult to assign some intermediate strains to a secondary lineage. Instead, we selected some strains within PB2.2 to represent some secondary lineages within PB2.2.
Some other subordinate lineages of PB2 were also represented by strains in the same way and for the same reason.

Phylogenetic diversity and distribution of IAVs based on PB1 sequences
Figure 28 (the thumbnail of the phylogenetic tree) and Additional file 26: Figure SI_26 (the original phylogenetic tree) suggested that, based on 16,009 protein
Our previous study in 2009 classified the viral PB1 gene into eight primary lineages, S2.1-S2.8 [32]. Of them, S2.1, S2.2 and most of the isolates of S2.6 (AIVs) corresponded to PB1.1 in this report, S2.3 corresponded to PB1.2, S2.4 corresponded to PB1.3, S2.5 corresponded to the secondary lineage PB1.1h, and S2.8 corresponded to PB1.4. S2.7, which included two viruses, A/mink/Nova Scotia/1055488/2007(H3N2) and A/swine/Quebec/4001/2005(H3N2), disappeared in this report because their nucleic acid sequences have been revised greatly, and the two viruses are now assigned to PB1.1g. Actually, the panorama view of this study was largely consistent with our previous study in 2009 after removing the lineage S2.7. In addition, two canine lineages (PB1.1h1 and PB1.1i) were added in this report, because the related viruses were isolated after our previous study in 2009 or neglected by it. Similarly, some other subordinate lineages, such as PB1.1d, were added in this report because the related viruses were isolated after our previous study in 2009.

Phylogenetic diversity and distribution of IAVs based on PA sequences
Figure 29 (the thumbnail of the phylogenetic tree) and Additional file 27: Figure SI_27
PA.4 corresponded to H7N7 subtype EIVs isolated in the 1950s and the 1960s. The equine lineage PA.1c was subordinate to PA.1 because the viruses within PA.1c were close to some AIVs within PA.1. This is different from the equine lineage PA.4, as all the viruses within PA.4 were distant from all the AIVs within PA.1 (Additional file 27: Figure SI_27).
Our previous study in 2009 classified the viral PA gene into nine primary lineages, S3.1-S3.9 [32]. Of them, the avian lineages S3.1, S3.2, S3.6, and S3.7 corresponded to PA.1 in this report, the human lineage S3.3 corresponded to PA.2, and the swine lineage S3.4 corresponded to PA.3. The equine lineages S3.9 and S3.5 corresponded to PA.4 and the secondary lineage PA.1c in this report, respectively. The lineage S3.8, which included two viruses, A/mink/Nova-Scotia/1055488/2007(H3N2) and A/swine/Quebec/4001/2005(H3N2), disappeared in this report because the nucleic acid sequences of the two viruses have been revised greatly; these two viruses are now assigned to PA.1b. The panorama view of this study was thus largely consistent with our previous study in 2009 after removing the lineage S3.8. In addition, three canine lineages (PA.1b1a, PA.1c1 and PA.1d) were added in this report, and the related viruses were isolated after our previous study in 2009

Phylogenetic diversity and distribution of IAVs based on NP sequences
NP.1 corresponded to AIVs circulating worldwide. Four cross-species lineages within NP.1 were designated as secondary lineages NP.1a-NP.1d. NP.1a corresponded to H1 subtype Eurasian avian-like SIVs from the 1980s. NP.1b corresponded to H7N7 subtype EIVs isolated in the 1950s-1960s. NP.1c corresponded to H3N8 and H7N7 subtypes of EIVs circulating worldwide. Two tertiary lineages, NP.1c1 and NP.1c2, within NP.1c corresponded to H3N8 subtype CIVs. NP.1d corresponded to H3N2 subtype CIVs, which were similar to some AIVs from Korea and China and emerged in recent years. NP.3 corresponded to SIVs circulating from the 1930s, and three secondary lineages (NP.3a, NP.3b and NP.3c) were designated within NP.3. NP.3a mainly corresponded to H1N1 subtype SIVs circulating in 1930s-1949; NP.3b corresponded to H1N1 subtype SIVs circulating in the 1930s and 1954; NP.3c corresponded to H1N1, H1N2 and H3N2 subtype SIVs circulating worldwide after 1957.
Two tertiary lineages
It is interesting that NP.2f, which included A/duck/Zhejiang/LS02/2014(H7N9) and A/swine/Jilin/19/2007(H3N2), was distinct from the other viruses in NP.2. A BLAST search at NCBI showed that these two viruses shared high homology with HuIVs.
Our previous study in 2009 classified the viral NP gene into ten primary lineages, S5.1-S5.10 [32]. Of them, S5.1, S5.2, S5.5.1, S5.6, S5.7, and S5.9 (AIVs) corresponded to NP.1 in this report, S5.3 (HuIVs) corresponded to NP.2, and S5.4 (SIVs) corresponded to NP.3. S5.5.2 (EIVs) corresponded to the secondary lineage NP.1c in this report, and S5.8 (EIVs) corresponded to NP.1b. S5.9 disappeared in this report because the strain A/northern pintail/California/44291-259/2007(H10N3) in S5.9 was absent from GenBank, and the other strain within S5.9 (A/duck/LA/17G/1987(H3N8)) was now assigned to NP.1, possibly because more of the nucleotide substitutions of this virus were synonymous as compared with other strains. The lineage S5.10, which covered two viruses, A/mink/Nova-Scotia/1055488/2007(H3N2) and A/swine/Quebec/4001/2005(H3N2), disappeared in this report because the nucleic acid sequences of these two viruses have been revised greatly; the two viruses are now assigned to NP.3c. The panorama view of this study was thus largely consistent with our previous study in 2009 after removing the lineages S5.9 and S5.10. In addition, three canine lineages (NP.1c1, NP.1c2 and NP.1d) were added in this report because the related viruses were isolated after, or neglected by, our previous study in 2009. Similarly, some other subordinate lineages, such as NP.3c1 and NP.3c2, were added in this report because the related viruses were isolated after our previous study in 2009.

Phylogenetic diversity and distribution of IAVs based on MP sequences
Figure 31 (the thumbnail of the phylogenetic tree) and Additional file 29: Fig SI_29 ( Fig SI_29).
Our previous study in 2009 classified the viral MP gene into six primary lineages, S7.1-S7.8 [32]. Of them, S7.1 and S7.2 (AIVs) corresponded to MP.1 in this report, S7.3 (HuIVs) corresponded to MP.2, and S7.6 (EIVs) corresponded to MP.3. S7.4 (SIVs) corresponded to the secondary lineage MP.1a, and S7.5 corresponded to the secondary lineages MP.1c and MP.1d. In general, the panorama view of this study was largely consistent with our previous study in 2009. Additionally, three canine lineages (MP.1d1, MP.1e and MP.1f) were added in this report because the related viruses were isolated after, or neglected by, our previous study in 2009. Similarly, some other subordinate lineages, such as MP.1b1-MP.1b3, were added in this report because the related viruses were isolated after our previous study in 2009.

Phylogenetic diversity and distribution of IAVs based on NS sequences
(Additional file 30: Fig SI_30).
Our previous study in 2009 classified the viral NS gene into ten primary lineages, S8.1-S8.10 [32]. Of them, S8.1, S8.2, S8.7, S8.8, and S8.9 (AIVs) corresponded to NS.1 in this report, S8.6 (AIVs) corresponded to the primary lineage NS.2, and S8.10 (EIVs) corresponded to NS.3. S8.3 (HuIVs) corresponded to the secondary lineage NS.1a in this report, S8.4 (SIVs) corresponded to the secondary lineage NS.1b, and S8.5 corresponded to the secondary lineage NS.1d. In general, the panorama view of this study was largely consistent with our previous study in 2009. Four canine lineages (NS.1b2, NS.1d1, NS.1d2, and NS.1e) were added in this report because the related viruses were isolated after, or neglected by, our previous study in 2009. Similarly, some other subordinate lineages, such as NS.1b1, were added in this report because the related viruses were isolated after our previous study in 2009.

Phylogenetic analysis of representative sequences
Phylogenetic relationships of representative sequences were calculated using the same software tool and the same parameters as for the above sequences, and bootstrap values were calculated with 1000 replicates. The results suggested that most of the lineages and subordinate lineages designated in this report were supported by the bootstrap values calculated based on the representative sequences (Fig. 33 and Additional files 31-48: Figures S2_1-S2_18). For example, as shown in Fig. 33, H1 subtype IAVs could be classified into two primary lineages, H1.1 and H1.2, each of which was further classified into four secondary lineages, and six of the eight secondary lineages were supported by bootstrap values > 70.
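The idea behind a bootstrap value can be illustrated with a minimal sketch. It does not reproduce the MEGA procedure used in this study: instead of rebuilding a tree for every replicate, it resamples alignment columns with replacement and counts how often one sequence remains the nearest neighbor of another, a simplified stand-in for recovering a clade. The function names, the toy alignment and the use of the p-distance are assumptions made for illustration only.

```python
import random

def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def pair_support(alignment, a, b, replicates=1000, seed=0):
    """Percentage of replicates in which b is the closest sequence to a.

    This is a simplified proxy for clade support, not a tree-based bootstrap.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        others = [name for name in rep if name != a]
        nearest = min(others, key=lambda name: p_distance(rep[a], rep[name]))
        if nearest == b:
            hits += 1
    return 100.0 * hits / replicates

# Hypothetical toy alignment: s1 and s2 are identical, s3 differs everywhere.
toy = {"s1": "MKTLLV", "s2": "MKTLLV", "s3": "GGGGGG"}
print(pair_support(toy, "s1", "s2", replicates=50))  # 100.0
```

In a real analysis each replicate would be passed to the tree-building step, and the bootstrap value of a lineage would be the percentage of replicate trees that contain it; a threshold such as > 70 is then read against that percentage.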

Discussion
In order to calculate the diversity and distribution of IAVs more accurately, many more sequences were analyzed in this study than in previous studies [17, 31-33]. This is also useful for demonstrating the panorama diversity and distribution of IAVs in a direct way. For example, more groups, clades or lineages of SIVs in the H1 and H3 subtypes were demonstrated directly in this study than in our previous studies, which were based on selected representative sequences [31].
Besides our two panorama studies published in 2009, Shi et al. also reported a similar panorama analysis of the diversity and distribution of IAVs in 2010 based on the HA and NA genes, although Shi et al. did not analyze the diversity and distribution of IAVs based on the viral internal genes [33]. All these reports should be updated because multiple subtypes and lineages have emerged since 2010, and some sequences have been revised.
The diversity and distribution of IAVs were calculated and updated from a panorama view in this study. The results further confirmed some previous conclusions, for example that none of the HA genes of H3 and H13 subtype AIVs, the NA gene of N2 subtype AIVs, and the NS gene of AIVs could be simply classified into two lineages corresponding to the Eastern and Western Hemispheres. The results also revised some previous conclusions: some lineages designated previously (e.g. S1.7, S1.8, S2.7, S3.8, S5.9, S5.10, h3.3, and h5.3 [31,32]) were deleted in this report because they had been designated on the basis of wrong sequences that have since been revised. Meanwhile, some novel lineages, such as PB2.1b1, PB1.1f and H11.2b, were added in this report mainly because these lineages emerged in recent years.
This report also provided some novel suggestions on the diversity and distribution of SIVs. First, the SIVs in H1.2d should pose a considerable threat to humans in the future even though H1.2d disappeared from humans years ago. Second, SIVs were more diverse than HuIVs of the same HA primary lineage during the same year, and consequently we could not differentiate cross-species transmissions from accidental similarity through phylogenetic analysis. Third, it was rare for SIVs circulating in the Eastern Hemisphere to circulate in the Western Hemisphere, but not that rare for SIVs circulating in the Western Hemisphere to circulate in the Eastern Hemisphere.
This report also provided some novel suggestions on other aspects of IAVs. First, it is not rare for IAVs from some avian lineages to be isolated from humans, pigs or other mammalian hosts, but it is very rare for IAVs from any mammalian lineage to be isolated from birds. Second, isolated regions are favorable for the formation of some clades of IAVs, because multiple clades or lineages of IAVs corresponded to islands including Taiwan, Japan or Australia. Third, since multiple canine clades were classified through this report, and close contacts between humans and dogs are frequent, it is possible that dogs play a more important role in the ecology and evolution of IAVs than we have imagined, and more surveillance of CIVs should be conducted to monitor the relevant risk [34,35]. Fourth, considering the diversity of known bat IAVs and the fact that bats usually do not move far away, there are likely some other subtypes of IAVs in bats which have not been identified. Fifth, most AIVs from Oceania belong to the lineages of the Eastern Hemisphere and most AIVs from South America belong to the lineages of the Western Hemisphere, but most AIVs from Oceania have formed specific subordinate lineages within the lineages of the Eastern Hemisphere, and most AIVs from South America have formed specific subordinate lineages within the lineages of the Western Hemisphere [33, 36-41]. Sixth, a few lineages of AIVs (H6.1a, H9.1, H13.1 and N2.1a) are distributed worldwide rather than restricted to the Western or Eastern Hemisphere, and this is consistent with a global pattern of AIVs in wild birds proposed by Olsen et al. [42].
The panorama views provided by this report could act as a map for rapidly identifying whether a strain or a group of IAVs is of special significance. For example, the MP gene of two H3N8 subtype equine strains isolated in Japan in the 1970s (A/equine/Sachiyama/1/1971(H3N8) and A/equine/Tokyo/2/1971(H3N8)) was located in a clade different from that of other H3N8 subtype equine strains isolated in the same decade, while the other genes of the two Japanese strains were located in the same clades as those of other H3N8 subtype equine strains isolated in the same decade (Additional file 3: Fig SI_3, Additional file 24: Fig S1_24 and Additional files 25-30: Figs S1_25-30), suggesting that the two Japanese strains arose from gene reassortment with IAVs from other hosts.
This report identified some unexpected strains of IAVs. For example, A/Iran/187/2016, A/Iran/279/2016, A/Iran/1417/2016, and A/Shiraz/106/2015 were similar to the HuIVs isolated in the 1930s in lineage H1.2, and their sequences were indeed of high identity in nucleotide and protein BLAST analysis. This report also identified that seven genes, excluding the PB2 gene, of the swine strain A/swine/Yantai/16/2012(H9N2) were located in the same clades as those of the avian strain A/turkey/Wisconsin/1/1966(H9N2), and their sequences were indeed of high identity in nucleotide and protein BLAST analysis. In the NP gene, one human secondary lineage (NP.2f) harbored one H7N9 subtype AIV (A/duck/Zhejiang/LS02/2014(H7N9)) and one H3N2 subtype SIV (A/swine/Jilin/19/2007(H3N2)), and their sequences were indeed of high identity in nucleotide and protein BLAST analysis. None of these special sequences was accompanied by a reference, and their origin and reliability need further confirmation.
Fig. 33 Phylogenetic tree of the viral HA gene representative sequences of H1 subtype. H1 subtype IAVs were classified into two primary lineages, H1.1 and H1.2, and each of them was further classified into four secondary lineages. Bootstrap values were given at relevant nodes
Three influenza virus sequence databases were available for this study. Among them, the NCBI FLU Database, which we used for the analysis, covers more sequences of the viral internal genes than the Influenza Research Database, whereas the Influenza Research Database covers more sequences of the viral HA and NA genes than the NCBI FLU Database, after collapsing identical protein sequences. The third database, GISAID, covers nearly 30% more sequences than the NCBI FLU Database and the Influenza Research Database, but GISAID does not support online collapsing of identical protein sequences, which we assume to be important for random selection of the viral sequences for the analysis. Because we ran the analysis on a randomly selected part of the sequences rather than on all the sequences covered by the NCBI FLU Database, we may not have revealed the whole diversity and distribution of the viruses. We downloaded 2722 H5 subtype HA gene sequences and 1697 H7 subtype HA gene sequences that are available only in GISAID; phylogenetic analysis of these sequences fully supported our results in this study, and no new groups or lineages were found after adding them (Additional files 49-50: Figs S1_31-32). This can be explained by two facts: first, the groups or lineages we classified are only the trunks rather than the twigs of the relevant phylogenetic trees; second, we used many randomly selected sequences for the analysis.
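The two preprocessing steps mentioned above, collapsing identical protein sequences and then selecting sequences at random, can be sketched as follows. This is an illustrative sketch only: the study used the database's online collapsing option, and the function names, the record layout (name, sequence) and the strain names below are hypothetical.

```python
import random

def collapse_identical(records):
    """Keep the first record seen for each distinct protein sequence."""
    first_name = {}
    for name, seq in records:
        first_name.setdefault(seq, name)  # later duplicates are ignored
    return [(name, seq) for seq, name in first_name.items()]

def random_subset(records, k, seed=0):
    """Randomly select k records without replacement (all if k is too large)."""
    if k >= len(records):
        return list(records)
    return random.Random(seed).sample(records, k)

# Hypothetical records: the first two share the same protein sequence.
records = [
    ("A/example/1/2001", "MKTIIALSY"),
    ("A/example/2/2001", "MKTIIALSY"),
    ("A/example/3/2005", "MKAIIALSY"),
]
unique = collapse_identical(records)
print(len(unique))  # 2
print(random_subset(unique, 1))
```

Collapsing before sampling matters because otherwise heavily sequenced outbreaks, which contribute many identical proteins, would dominate a random draw.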
The analysis in this report was based on protein sequences rather than nucleotide sequences in order to simplify the calculation; otherwise, so many nucleotide sequences would be beyond the capability of laboratory computers. For the same reason, the phylogenetic relationships were calculated using the neighbor-joining method rather than the maximum likelihood method. We have previously tested that the phylogenetic relationships calculated using the neighbor-joining method were quite similar to those calculated using the maximum likelihood method for IAVs [31]. The neighbor-joining method used in this report also has the advantage that the calculated phylogenetic tree is unique.
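A compact sketch of the neighbor-joining algorithm shows why it is cheap and deterministic: at each step it joins the pair of nodes with the smallest Q value and replaces them with a new internal node, so the same distance matrix always yields the same tree. The sketch below returns only the topology as nested tuples and ignores branch lengths; the function name, the input layout and the toy distances are assumptions, and it is not the MEGA implementation used in this study.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Build a tree topology (nested tuples) by neighbor joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[frozenset((a, b))] - r[a] - r[b]
            if best is None or q < best[0]:  # ties keep the first pair found
                best = (q, a, b)
        _, a, b = best
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d[frozenset((a, b))]
                ) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return (nodes[0], nodes[1])

# Hypothetical additive distances for the tree ((A,B),(C,D)).
pairs = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 5}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))  # (('A', 'B'), ('C', 'D'))
```

With n sequences the loop runs n - 2 times and each pass scans all pairs, which is far less work than a maximum likelihood search over tree space; this is the practical reason the method scales to tens of thousands of sequences.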
Analysis of the viral diversity and distribution based on protein sequences was reliable mainly because frame shift mutations are quite rare in IAVs [1]. Otherwise, the analysis would be questionable, as the insertion or deletion of a single nucleotide can change multiple amino acid residues. We tested 18,589 randomly selected sequences and found no frame shift mutations (unpublished data).
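A small worked example makes the contrast concrete. The sketch below translates a short, made-up nucleotide sequence with the standard genetic code and compares a synonymous substitution, which leaves the protein unchanged, with a single-nucleotide deletion, which shifts the reading frame. The sequence and the function name are hypothetical and are not taken from the dataset of this study.

```python
# Standard genetic code, codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(nt):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))

original = "ATGGCTAAAGGTCTGACC"
synonymous = "ATGGCCAAAGGTCTGACC"   # GCT -> GCC, both encode alanine
frameshift = "ATGCTAAAGGTCTGACC"    # one G deleted after the start codon

print(translate(original))    # MAKGLT
print(translate(synonymous))  # MAKGLT
print(translate(frameshift))  # MLKV*
```

The synonymous change is invisible at the protein level, whereas the one-base deletion alters most downstream residues and introduces a premature stop codon. An analysis based on protein sequences is therefore only trustworthy when such frame shifts are rare.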
The reliability of this paper was supported by the fact that its conclusions are largely consistent with many studies based on limited sequences [15-17, 21, 24, 33, 43, 44], and largely consistent with our previous panorama studies [31,32]. Reasonable explanations were given for all the results that differed from our previous panorama studies (Additional file 51: Table S1). The reliability of this paper was further supported by the phylogenetic trees calculated using randomly selected representative sequences (Fig. 33 and Additional files 31-48: Figs S2_1-S2_18).
On the other hand, the many synonymous mutations occurring in IAVs could not be considered in our analysis based on protein sequences. Consequently, the internal genes of AIVs could not be classified into lineages corresponding to the Western or Eastern Hemisphere, as was possible using nucleotide sequences in previous studies [32]. Nevertheless, this report did successfully reveal, in a sketchy way, the panorama views of AIVs, HuIVs, SIVs, EIVs, and CIVs, which are difficult to reveal through other approaches.
Miscellaneous nomenclature systems targeting one specific gene, one subtype or a small number of IAVs have been reported in previous studies [15, 16, 18-21, 43, 45-47], and some of these nomenclature systems are misleading. For example, some so-called "North American lineages" of SIVs or AIVs actually circulated widely in South America and/or Asia as well. Great efforts have been made to unify the nomenclature systems for IAVs, including the ones for designations of some lineages of H5 subtype HPAIVs (e.g. clade 2.3.4.4) [18,19], for designations of HA and NA genes of IAVs (e.g. h9.4.2 or H2g2.3) [31,33], and for designations of internal genes of IAVs (e.g. S8.6.3) [32], to facilitate communication on IAVs. In this report we further simplified the universal nomenclature system by alternately using numbers and letters. This simplified nomenclature system, if widely accepted, should simplify communication on the ecology, evolution and epidemiology of IAVs among researchers.
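Because the designations alternate numbers and letters, the hierarchy of a lineage can be read directly from its name. The sketch below is one possible way to parse such names; it assumes the pattern seen in the designations of this report (a gene or subtype label, a dot, then a primary number followed by alternating lowercase letters and numbers, as in PB2.1b1a), and the function names are hypothetical.

```python
import re

def parse_lineage(name):
    """Split a designation such as 'PB2.1b1a' into ('PB2', ['1', 'b', '1', 'a'])."""
    gene, dot, path = name.partition(".")
    tokens = re.findall(r"\d+|[a-z]+", path)
    # Reject names with no gene, no dot, or characters outside digits/lowercase letters.
    if not gene or not dot or not tokens or "".join(tokens) != path:
        raise ValueError(f"not a lineage designation: {name!r}")
    # The primary level must be a number; later levels alternate automatically.
    if not tokens[0].isdigit():
        raise ValueError(f"primary lineage must be a number: {name!r}")
    return gene, tokens

def ancestors(name):
    """List the designation at every level, from primary lineage down to the name itself."""
    gene, tokens = parse_lineage(name)
    return [gene + "." + "".join(tokens[:i]) for i in range(1, len(tokens) + 1)]

def is_subordinate(child, parent):
    """True if child is a strictly lower-level lineage within parent."""
    return child != parent and parent in ancestors(child)

print(ancestors("PB2.1b1a"))            # ['PB2.1', 'PB2.1b', 'PB2.1b1', 'PB2.1b1a']
print(is_subordinate("PA.1c", "PA.1"))  # True
print(is_subordinate("PA.4", "PA.1"))   # False
```

The alternation removes the need for separators between levels, so a name such as PB2.1b1a stays short while still encoding four levels of hierarchy.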
An important issue regarding the nomenclature of IAVs is that it is sometimes necessary to differentiate HuIVs that belong to a lineage of HuIVs from HuIVs that belong to a lineage of AIVs or SIVs and result from accidental cross-species transmission. We propose here that, if needed, the former be designated as HuIVs or adapted HuIVs, and the latter be designated as unadapted HuIVs.

Conclusions
In this study, 139,872 protein sequences available in GenBank with clear background were analyzed phylogenetically, and lineages and subordinate lineages were designated for each of the genes based on the updated panorama views using a novel universal nomenclature system. Phylogenetic trees of the two external viral genes (HA and NA) and six internal genes (PB2, PB1, PA, NP, MP, and NS) were constructed, and the diversity and the
influenza viruses; HPAIVs: Highly pathogenic avian influenza viruses; HuIVs: Human influenza viruses; IAVs: Type A influenza viruses; SIVs: Swine influenza viruses