Identifying novel biomarkers of the pediatric influenza infection by weighted co-expression network analysis

Background Despite the high yearly prevalence of Influenza, the pathogenesis mechanism and involved genes have not been fully known. Finding the patterns and mapping the complex interactions between different genes help us to find the possible biomarkers and treatment targets. Methods Herein, weighted gene co-expression network analysis (WGCNA) was employed to construct a co-expression network among genes identified by microarray analysis of the pediatric influenza-infected samples. Results Three of the 38 modules were found as the most related modules to influenza infection. At a functional level, we found that the genes in these modules regulate the immune responses, protein targeting, and defense to virus. Moreover, the analysis of differentially expressed genes disclosed 719 DEGs between the normal and infected subjects. The comprehensive investigation of genes in the module involved in immune system and viral defense (yellow module) revealed that SP110, HERC5, SAMD9L, RTP4, C19orf66, HELZ2, EPSTI1, and PHF11 which were also identified as DEGs (except C19orf66) have the potential to be as the biomarkers and also drug targeting for the treatment of pediatric influenza. Conclusions The WGCN analysis revealed co-expressed genes which were involved in the innate immune system and defense to virus. The differentially expressed genes in the identified modules can be considered for designing drug targets. Moreover, modules can help to find pathogenesis routes in the future.


Background
Influenza virus is one of the most incident infectious agent, which is mainly classified into three types of A, B, and C. The prevalence amount of type A is more than other influenza types in the world. Also, the burden of seasonal influenza virus caused the infection of 3-5 million cases with severe illness symptoms [1]. Influenza viruses affect the human life more than other respiratory illnesses. The pathogenesis of Influenza has not been yet well understood since it depends on the immune system and viral determinants. Moreover, the previous infection or vaccination causes the cellular immunity which affected the efficacy of infection with various seasonal, zoonotic, and pandemic influenza viruses [2]. Therefore, different hosts can have distinct effects on the incidence level of the disease.
Microarray is a high-throughput technique has the ability of simultaneous measuring of thousands of gene expressions and so generating tremendous data. In order to general and then detailed evaluation of the biological phenomena in each study, the special and sometimes complicated statistical analysis is required [3,4].
Many genes are involved in the pathogenesis routes of influenza infection, which constitute the complicated networks. Among various genes, the change in expression levels of some genes regulate the expression of others, so they can put in a group called module. These modules have different biological responsibility. Also, each group may activate various pathways and control specific functions [5]. Finding the modules can help researchers to design the proper treatment route by targeting key proteins with specific functions. Moreover, finding novel biomarkers including important functions in a pathogenic-related disease is of importance for prognosis, risk assessment, and progression monitoring of disease [6,7]. Biomarkers can be detected by calculation of differentially expressed genes between the normal and infected subjects, however, discovery the proteins that each biomarker is in connection with them, help to find the key pathways functioning in a disease.
Exploration of the involved genes which have close correlation patterns could be achieved by weighted gene co-expression network analysis (WGCNA) [8]. Through this algorithm, the highly co-expressed genes are placed in one module and different modules may contain key genes implicated in the pathogenesis process [9].
In this study, we aimed to find modules containing highly co-expressed genes which involve in the pediatric influenza pathogenesis. Also, the key genes with the prognosis potential were identified. Moreover, the genes in each module were analyzed using Gene Ontology (GO) function and pathway enrichment methods.

Gene expression data and preprocessing
The microarray gene expression dataset GSE29366 was downloaded from the NCBI Gene Expression Omnibus (GEO). It contains 12 healthy subjects and 19 Pediatric influenza patients which their whole blood samples were analyzed. All of the involved children in this study were in the age range from 1.5 months to 17 months. All individuals developed primary infection and showed signs of the disease. The Illumina HumanWG-6 v3.0 expression beadchip (GPL6884) was used to produce the sequencing data. The goodSamplesGenes function in Weighted Gene Co-expression Network Analysis (WGCNA) was used to filter the samples and genes with too many missing entries and zero-variance genes [10].

Weighted gene co-expression network
The weighted gene co-expression network based on Pediatric influenza samples was constructed by R package "WGCNA" [11]. To this purpose, the adjacency matrix containing Pearson's correlations between each gene pair was firstly generated with the optimized soft power and then was transformed into a Topological Overlap Matrix (TOM). The highly co-expressed genes were then grouped by hierarchical clustering. In the next step, the dynamic tree cut algorithm was utilized to cut clustering dendrogram branches and generation of the modules. The gene expression profiles of each module were summarized by the first principal component named as module eigengene (ME). Moreover, each module denotes as discrete colors. Finally, the similar modules (module eigengenes) were merged into a single module and was then used to further interpretation [12].

Finding differentially expressed genes
To detect the differentially expressed genes (DEGs) among normal and infected subjects, GEO2R tool in the GEO database was employed. Benjamini-Hochberg FDRadjusted p-values < 0.05 was selected as a criterion for finding DEGs.

Finding candidate genes and their functional annotations
The top highly interconnected genes in each module were identified as hub genes. Afterward, they were submitted in the Search Tool for the Retrieval of Interacting Genes (STRING) database to build the protein-protein interaction network (PPIN) [13]. Then, the nodes with zero betweenness and degree one were set aside from the constituted network. The PPIN was visualized using Gephi 0.9.2 [14]. The gene ontology and pathway enrichment analysis were carried out in g:profiler website (https://biit.cs.ut.ee/gprofiler/) [15].

Data preprocessing and sample selection
Gene expression dataset GSE29366 was firstly quantile normalized and log-transformed. Then, the probe sets with unknown gene name were removed. The samples were evaluated in terms of missing entries and zerovariance genes. The number of 15,436 probes remained from 28,742 probes. As shown in the cluster tree ( Fig. 1), two samples (GSM725944 and GSM725948) are clustered in different branches and excluded for further analysis.

Weighted gene co-expression network construction and modules identification
To exploration the required criterion for WGCNA, the scale-free topology fit index was calculated for various softthresholding power. As Fig. 2 shows, it reaches 0.84 for a power of 7 while a relatively high mean connectivity remained. This value was then applied to measure the adjacency and topological overlap matrixes. To specify the groups of co-regulated genes identified as modules, the dynamic cut-tree algorithm was utilized (Fig. 3). After merging the similar modules by applying a threshold of 0.25, 38 modules were identified to further analysis (Fig. 4a). Each row and column of the heatmap plot is in accordance with one module, in which red denotes positive correlation and blue reveals a negative correlation. Figure 4b shows the dendrogram and dynamic cut tree before and after merging modules. To determine the biologically meaningful modules, all modules were submitted into the STRING and ones which their proteins were highly connected, were selected. Figure 5 shows the protein-protein interaction  networks (PPINs) of each module, in which mediumpur-ple2 has 18 nodes and 46 edges, skyblue has 41 nodes and 107 edges, and yellow has 60 nodes and 689 edges. The size and color of each node illustrate the degree value in which red and blue colors indicate the higher and lower degree, respectively. The chosen modules were under scrutiny in terms of gene ontology and pathway enrichment. The most related modules to the infection with influenza were identified as mediumpurple2, skyblue, and yellow. Table 1 demonstrates the genes in each module.

Gene ontology enrichment
To explore the biological relationship of the genes in each module, the gene ontology (GO) enrichment analysis was performed. The genes of mediumpurple2 module were enriched in different immune systems, neutrophil and leukocyte degranulation, antimicrobial humoral response, and defense response. The most related terms to the viral infection and immune system were specified from genes of yellow module which contains defense response to virus, innate immune response, defense response to other organism, response to type I interferon, response to cytokine, regulation of viral life cycle, immune response, viral process, negative regulation of viral life cycle, regulation of viral process, negative regulation of viral process, regulation of innate immune response, cellular response to virus, regulation of defense response to virus, negative regulation of viral release from host cell, and positive regulation of immune system process. As mentioned above, the genes in the mediumpurple2 and yellow modules are activated due to viral infection and the immune system provides defense against pathogenic agent. In the skyblue module, various protein targeting and localization such as co-translational protein targeting to membrane, protein targeting to ER, protein localization to endoplasmic reticulum were highlighted. Viruses usually use the complex membrane network existing in the host cell such as endoplasmic reticulum (ER) membrane to entry, replication, and assembly. Therefore, the advent of this module was expected.

Pathway enrichment analysis
To evaluate the pathway enrichment analysis, the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome Pathway Database were explored. The immune pathways, Neutrophil degranulation, Antimicrobial peptides, and Defensins were enriched by genes in the mediumpur-ple4. This module contains genes which involved in immune response system and have antimicrobial activity. The genes pertaining the skyblue module were enriched in Axon guidance, signaling by NOTCH2, Infectious disease, Influenza Infection, Influenza Life Cycle, Influenza Viral RNA Transcription and Replication, Viral mRNA Translation, and rRNA processing. This module contains genes which their roles have been specified in the influenza infection pathways. Notch signaling involves several functions of cellular differentiation, cell fate, and cell survival. Viruses use several mechanisms to escape innate immune antiviral responses and cause cell survival. The downregulation of CREB1, RPS27A, and UBA52 in this pathway show the immune system domination.
Eventually, the genes grouped in yellow module were enriched in the viral and immune pathways including Influenza A, RIG-I-like receptor signaling pathway, NOD-like receptor signaling pathway, Herpes simplex infection, Cytosolic sensors of pathogen-associated DNA, Immune System, Cytokine Signaling in Immune system, Interferon Signaling, and Antiviral mechanism by IFN-stimulated genes. The RIG-I-like receptors activate innate immunity and inflammation after discernment viral RNA ligands [16]. RIG-I is required for type I IFN production in response to influenza. One of the cellular defense mechanism is performed by receptors that check uncommon RNA and DNA resulted from viral infection in the cytosol. It causes the limitation in the virus replication and activation of antiviral immunity process [17].

Discussion
In the present study, three PPI network were identified based on the WGCNA which are significantly related to the pediatric influenza infection. Functional analysis revealed that the genes in the mediumpurple2 module implicate in the immune responses through different routes, skyblue module in the protein targeting, and the yellow module in the defense and immune responses to virus. Moreover, the analysis of differentially expressed genes revealed 719 DEGs between the normal and infected subjects. Further analysis showed that most members of the yellow and some genes in the skyblue modules were among identified DEGs, so these modules were more discussed.
Children who did not experience previous exposures to influenza viruses are more vulnerable against infection and shed larger number of virus particles for a long time [18]. Moreover, the function of immune system may be different from adult-infected subjects. The initial defense and adaptive immune responses against viral infection are performed by type I IFNs (IFN-α/β) such as RTP4, GBP1, OAS1, IFI27, and IF44L [19]. They induce the IFN-stimulated genes (ISGs) through activation of the Janus kinase-signal transducer and activator of transcription pathway [20]. The influenza A viruses hinder type I IFN signaling via induction of the suppressor of cytokine signaling-3 (SOCS-3) protein [21]. As a result, the expression level of SP110, OAS1, and IRF-1, which are IFN-induced genes, increases.
HERC5 is another IFN-induced gene that mediates ISGylation of protein targets. It is induced by Influenza infection and elevates ISGylation as an ISG15 E3 ligase [20,24,25]. It also functions as a critical antiviral protein against Influenza A virus by restricting IAV-NS1.
Samd9l is a Sam domain containing protein which has important roles during virus infection and innate immunity [26]. The expression of Samd9l is increased by type I interferon and its role in the control of influenza virus infection and also pathogenesis has been proposed [27].
In particular, we also identified the C19orf66 gene which previously reported as an IFN-induced transcript that suppresses dengue virus [28]. C19orf66 named as RyDEN can constitute a complex with the indispensable cellular mRNA binding proteins, PABPC1 and LARP1, for the efficient replication of DENV. Moreover, it was proposed that C19orf66 is a new antiviral effector which has a prominent role in the inhibition of virus replication. The increase in the expression of C19orf66 was accompanied with up-regulation of IFIT genes [29,30] which can inhibit the translation initiation, bind and sequester uncapped viral RNA, and viral protein in the cytoplasm.
Another substantial gene found in this study is HELZ2 (Helicase With Zinc Finger 2) which is an IFN stimulated gene and immediate early gene (IEG). It was reported that the HELZ2 knockout in mice causes the enhancement of the dengue virus infectivity. Also, the mediatory of HELZ2 in the IFN antiviral response was disclosed, so that the up-regulation of the HELZ2 transcriptional and protein in the nucleus, and activation of a transcriptional program were involved. Moreover, IEGs were introduced as the biomarkers due to their influence on the host response to viral infections [31].
EPSTI1 is an IL-28A-induced ISGs which was identified as an anti-HCV. Also, the knockdown of EPSTI1 caused the enhancement of virus. It was stated that EPSTI1 may actuate PKR promoter and induce IFN-β, IFIT1, OAS1, and RNase L, which are PKR-dependent genes and responsible for the EPSTI1-mediated antiviral activity. Therefore, EPSTI1 was presented as a proper therapeutic target to treat HCV infection [32]. In this study, the expression level of EPSTI1 has increased in the influenza-infected subjects versus normal cases. The up-regulation of the IFIT proteins especially IFIT1 and also OAS1 may induce the overexpression of EPSTI1, so it can be considered as the biomarker or a target for designing a drug.
PHF11 is a transcriptional co-activator of IL2 and IFNG which its knock-down causes up-regulation of the pro-inflammatory chemokine IL-8, therefore the contribution of PHF11 in epidermal recovery was proposed [33]. The overexpression of PHF11 due to the infection with Japanese encephalitis virus [34], avian influenza A virus [35], and Epstein-Barr Virus [36] were reported previously. Similarly, the expression of PHF11 was increased after influenza infection which can confirm its role in the diminution of pro-inflammatory chemokine and increase the IFNG due to the infection.
IRF7 which is overexpressed due to the influenza infection regulates the antiviral response. The viral infection causes the phosphorylation and translocation of IRF7 to the nucleus and as a result the induction of expression of type-I interferons which in turn activates IRF7 transcription through STAT2 [37,38].
EIF2AK2 is the interferon-induced dsRNA-dependent protein kinase which has a prominent role in the innate immune response to viral infection, apoptosis, cell proliferation, and differentiation. MX1 as a member MxA protein belongs to the GTPase family and suppresses the influenza virus replication via targeting the viral nucleoprotein [39]. OAS proteins family are IFN-stimulated proteins which have prominent roles in the innate immune responses. They also catalyze the synthesis of 2 ′ -5 ′ -linked oligoadenylates, which in turn cause the activation of RNAse L and degradation of viral and cellular RNAs [40]. ZBP1 has been recently identified as an innate immune sensor of influenza virus. It regulates NLRP3 inflammasome activation and induces the apoptosis, necroptosis, and pyroptosis in the influenza-infected cells [41].
The genes involved in the mediumpurple4 were enriched in the immune pathways and neutrophil degranulation. Neutrophils are granulocytes that comprise innate phagocytic cells packed with granules containing proteins with antibacterial function [42,43]. In this study, we identified proteins belonging to three types of neutrophil granules: primary (azurophilic: ELANE, MPO, PRTN3, BPI), secondary (specific: LTF, DEFA4, DEFA1, DEFA1B), and tertiary (gelatinase: PGLYRP1) granules. These genes were upregulated in the influenza-infected subjects. The neutrophil granule proteins have antibacterial function and have key roles in the innate immune defense against bacteria. The upregulation of neutrophil granule proteins has been reported in viral infections and RSV-stimulated neutrophils [44][45][46].
In the skyblue module, UBA52 and some genes belonging to RP family were identified in consistent with the required interaction between UBA52 and RP family for virus replication. The previous study revealed that the knockdown of UBA52 in the chicken cells causes the diminution of the progeny viral titer denoting the substantial function of UBA52 in the H5N1 influenza A virus infection [47]. However, UBA52 and most of the RP genes were down-regulated in this study after the influenza infection with respect to the normal cases. It can be due to the overexpression of genes involved in the  defense response to virus and innate immune response  including EIF2AK2, C19ORF66, PML, TRIM22, IFI44L,  IFIT5, IFIT1, OAS2, PLSCR1, ADAR, RTP4, IFIT2, IFI16,  HERC5, STAT2, OAS3, RSAD2, OAS1, MX1, IRF7,  DDX60, DDX58, IFIH1, DHX58, PARP9, TRIM25, OASL,  TRIM5, ISG15, MX2, and IFIT3. The innate immune system responses to the entry Influenza virus by members of Toll-like receptors, and RIG-I and the NOD-like receptor family. Finally, the inter-individual variation in expression of the identified differentially expressed genes was low. It is found that inter-individual variation in gene expression profiles is related to sex, age, and the time of sample collection [48,49]. Moreover, differences in genotype, epigenetic or environmental factors can be cause of the intrinsic differences in expression patterns.

Conclusions
Overall, the systems virology approach based on the application of WGCNA helped us to find three modules of co-expressed genes related to pediatric influenza. Moreover, SP110, HERC5, SAMD9L, RTP4, C19orf66, HELZ2, EPSTI1, and PHF11 were remarkably co-expressed with other known genes which were involved in innate immune system and defense to virus. These genes and subsequently related proteins are proposed as the candidate biomarkers and also drug targeting. However, further studies are required to evaluate and confirm them.