Identification of stage-related and severity-related biomarkers and exploration of immune landscape for Dengue by comprehensive analyses

At present, there are still no specific therapeutic drugs and appropriate vaccines for Dengue. Therefore, it is important to explore distinct clinical diagnostic indicators. In this study, we combined differentially expressed genes (DEGs) analysis, weighted co-expression network analysis (WGCNA) and Receiver Operator Characteristic Curve (ROC) to screen a stable and robust biomarker with diagnosis value for Dengue patients. CIBERSORT was used to evaluate immune landscape of Dengue patients. Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and Gene set enrichment analysis (GSEA) were applied to explore potential functions of hub genes. CD38 and Plasma cells have excellent Area Under the Curve (AUC) in distinguishing clinical stages for Dengue patients, and activated memory CD4+ T cells and Monocytes have good AUC for this function. ZNF595 has acceptable AUC in discriminating dengue hemorrhagic fever (DHF) from dengue fever (DF) in whole acute stages. Analyzing any serotype, we can obtain consistent results. Negative inhibition of viral replication based on GO, KEGG and GSEA analysis results, up-regulated autophagy genes and the impairing immune system are potential reasons resulting in DHF. CD38, Plasma cells, activated memory CD4+ T cells and Monocytes can be used to distinguish clinical stages for dengue patients, and ZNF595 can be used to discriminate DHF from DF, regardless of serotypes.


Introduction
Dengue was listed by the World Health Organization (WHO) as one of the top ten global health threats announced at the beginning of 2019 [1]. In the past few decades, Dengue has become the fastest-growing mosquito borne disease in the world [2][3][4], seriously endangering human health [5,6]. Vaccine research and development continue to make progress [7][8][9][10][11][12], but the antibody-dependent enhancement (ADE) limits the effectiveness of vaccines [13][14][15][16][17]. Asymptomatic infections increase incidence of dengue [16,18] and effective treatments have not been identified. Therefore, it's urgent to explore the pathogenic mechanism of dengue fever and screen out molecular markers for better diagnosis and treatments.
Autophagy, a catabolic process that degrades damaged or abnormal intracellular components to recover nutrients, is essential for maintaining cell and body homeostasis [19,20], and benefits to proliferation and infection of the Dengue virus (DENV) [21][22][23][24]. In DENV-ADE infection, cross-reactive antibodies mediate infection by inducing autophagy related proteins, and then suppress the innate immunity mediated by the mitochondria antiviral protein (MAVS) [25]. Immune response affects directly or indirectly host response to DENV in varying degrees, including symptomatic infection, asymptomatic infection [26,27], dengue shock syndrome (DSS) and dengue hemorrhagic fever (DHF) [28][29][30]. Therefore, it is essential to explore autophagy and immune response during DENV infection.
Transcriptomics researches are beneficial to assist researchers in better understanding disease causes [31] and locating biomarkers [32][33][34]. Our precious transcriptomics researches contributed to understand viral evolution and its impact on pathogenicity and vaccine development of DENV [35][36][37][38]. However, studies [39][40][41] published focused on multi-gene researches, and single analytical method (differentially expressed genes (DEGs) analyses), and did not link genomics with immune landscapes. In this study, we used a combination of DEG analyses, weighted co-expression network analysis (WGCNA) and Receiver Operator Characteristic Curve (ROC) to identify, validate and test biomarkers with diagnostic value of stages and severity in independent datasets, and applied the CIBERSORT website to analyze immune landscape differences among three stages and between DHF and Dengue Fever (DF) and explore correlations between genes and immune cells.

Data source
Gene expression datasets of Dengue patients were selected from a public database named the Gene Expression Omnibus databases (GEO, http:// www. ncbi. nlm. nih. gov/ geo). We followed these reference points: 1. datasets analyzed whole blood samples of different stages from Dengue patients including DF and DHF, and normal samples; 2. Datasets should include at least 20 dengue patients. Based on these criteria, we obtained GSE43777 dataset analyzing Two chip types (affymetrix HG-U133 plus 2 in GLP570 platform and HG-Focus in GLP201 platform) [41], GSE28405 dataset [42] and GSE51808 dataset [43].
In GSE43777, more than 200 samples (one sample for each stage) were collected from 51 DF and 13 DHF subjects in Maracay, Venezuela, and demographics and clinical, immunological and hematological characteristics of participants were preciously summarized [41]. In GSE28405, 31 clinically undifferentiated DENV RT-PCR positive patient samples were selected in Singapore within 72 h (Early Acute stage, EA), 4-7 days (Late Acute stage, LA) and 3-4 weeks (Convalescence stage, C) after self-reported fever onset, and other clinical information had been recorded as preciously described [42].
In GSE51808, whole blood samples were obtained from 47 Dengue patients (DF n = 31, DHF = 16) hospitalized at the Siriraj Hospital in Bangkok within days 2 and at 4 weeks or later after discharge (the Convalescence stage) and 9 normal subjects were analyzed， and detail clinical information were also showed [44].
Regarding gene expression levels in C or EA stages as baselines, these patients were divided into 3 groups, C vs EA, C vs LA and EA vs LA, as precious studies [41]. The Principal Component Analysis (PCA) analysis [45] was used to explore whether different stages can be distinguished clearly.

DEGs analysis and functional annotation of DEGs
A website (http:// sange rbox. com/ Tool) based on a R package "Limma" was applied to analyze DEGs among three phases (we showed the analysis regardless of serotypes and also performed separate analysis for each serotype) and between DHF and DF for Dengue patients. A R package "clusterProfiler" in R version 4.1.0 and a gene set enrichment analysis (GSEA) software (4.1.0) [46] were used to performed Gene Ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and GSEA to explore potential biofunction and enrichment pathways for DEGs.

WGCNA analysis [47]
We obtained a total of 4843 genes in the C vs EA group, 4472 genes in the C vs the LA group and 4463 genes in the EA vs LA group by a "WGCNA" package. We converted the adjacency matrix into the topological overlap matrix (TOM) when the power of β were equal to 5, 4 and 12. According to a height cutoff of 0.25, we merged similar modules and the module displaying the highest conformity with the development disease of Dengue was used to intersect with DEGs to identify more stable DEGs.

Evaluation immune cells of the whole blood samples
we estimated fractions of immune cells in whole blood samples by the "CIBERSORT" website (https:// ciber sortx. stanf ord. edu/ index. php) which can be used to input gene expression data and then obtain a fraction estimate of 22 immune cell types in whole blood samples [48].

ROC analysis
ROC [49] was applied to identify, valid and test value of biomarkers and immune cells in distinguish stages and severity based on Area Under The Curve (AUC).

Statistical analyses
R software (version 4.1.0) and R. Studio (version 1.9.0) were used to analyze data and visualize results. We applied the t-test, Mann-Whitney U test and the chisquare test to identify differences between two groups. Correlation analyses were showed according to the Pearson and Spearman theories. P values < 0.05 was regarded as statistical significance. Figure 1 shows illustration for this study.

Acquisition of DEGs and mechanisms analyses for dengue
The Principal Component Analysis (PCA) shows three stage samples can be distinguished (Additional file 1: Fig. 1 Flow chart diagram. RNA expression levels in peripheral blood mononuclear cell (PBMC) of dengue fever patients will change during Dengue virus (DENV) infection. We downloaded these RNA-sequence data and clinical information from National Center of Biotechnology Information (NCBI) dataset (https:// www. ncbi. nlm. nih. gov/ geo/). Combining R package "Limma" with "WGCNA", stable differentially expressed genes (DEGs) among three phases and between DHF and DF for Dengue patients were screened and we also explored potential functions for these DEGs using Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Set Enrichment Analysis (GSEA) analysis. Based on gene expression data, fractions of immune cells in whole blood samples were estimated by the "CIBERSORT" website and correlations between immune cells and huh genes were also qualified. Finally, we used Area Under the Curve (AUC) to evaluate diagnosis value of huh genes and important immune cells in distinguishing clinical stages and severity Figure S1 A-C). After normalizing the expression data from GSE43777 dataset on GLP201 platform, we identify 147 DEGs (121 up-regulated genes and 26 down-regulated genes) between the C and the EA stages ( Fig. 2A, F and Additional file 7: Table S1), 124 DEGs (102 upregulated genes and 22 down-regulated genes) between the C and the LA stages (Fig. 2B, G and Additional file 8: Table S2) and 195 DEGs (74 up-regulated genes and 121 down-regulated genes) between the EA and LA stages (Fig. 2C, H and Additional file 9: Table S3) by a "Limma" package based on |Log2FC|≥ 1, and adjusted P value < 0.05. Likewise, in the EA stage we obtained 30 DEGs (16 up-regulated genes and 14 down-regulated genes) (Fig. 2D, I and Additional file 10: Table S4) between DF (n = 15) and DHF (n = 11) samples from GSE43777 dataset on GLP570 platform, and 58 DEGs (48 up-regulated genes and 10 down-regulated genes) (Fig. 2E, J and Additional file 11: Table S5) between DF (n = 25) and DHF (n = 26) samples in the LA stage, based on |Log2FC|≥ 1 and P value < 0.05.
GO enrichment analysis results show that 147 DEGs (between the C and the EA stages) are mainly enriched in defense response to viruses and type I interferon (Fig. 2K); DNA replication is a common bio-activity for 124 DEGs (between the C and the LA stages) (Fig. 2L); the Fig. 2M shows that neutrophil activation and defense response to virus are dominating enrichment biofunctions for 195 DEGs between the EA and LA stages; biofunction involved in negative regulation of viral genome replication are more active for 30 DEGs between DF and DHF samples in Fig. 2N; neutrophil and humoral immune response are activated for DHF samples in the LA stage (Fig. 2O).
GO enrichment analysis results (down-regulated genes in DHF samples enriched in negative regulation of viral replication) in Fig. 2I and N caught our attention, because it suggested it was difficult to control viral replication in DHF patients in the EA stage. Relative studies [21,24,50] demonstrated DENV inhibited cell apoptosis through autophagy, thereby promoting its replication. Therefore, we suspecte related-autophagy genes may be activated in EA stage and LA stage. As the Additional file 1: Figure S1 J shown, in the LA stage, we identify definitely 1 different expression autophagy-related gene (CCL2) by intersecting DEGs between DHF and DH samples from GSE43777 analyzed on GLP570 platform with 222 autophagyrelated genes downloaded from Human Autophagy Database (HADb) which provides a complete and an upto-date list of human genes and proteins involved directly or indirectly in autophagy. The boxplot diagram (Additional file 1: Figure S1 K) shows that the expression levels of CCL2 increases significantly in DHF samples in the LA stage, suggesting that DENV replicates more strongly in DHF than DF which may be one reason resulting in DHF in the LA stage. While we do not identify different expression of autophagy-related genes between DF and DHF in the EA stage.
KEGG analysis results reveal that pathogen infection and NOD − like receptor signaling pathway are obvious enrichment pathways for 147 DEGs (between the C and the EA stages) (Fig. 2P); Cell replication, Legionellosis and Malaria, are major enrichment pathways for 124 DEGs (between the C and the LA stages) (Fig. 2Q); 195 DEGs (between the EA and LA stages) are enriched in pathogen infection and Cell replication (Fig. 2R); NOD − like receptor signaling pathway and pathogen infection are more active for 30 DEGs between DF (n = 15) and DHF (n = 11) samples in Fig. 2S; inflammatory pathways are activated for DHF samples in LA stage in Fig. 2T.

Acquisition of key modules by WGCNA
Using the GSE43777 dataset analyzed on GLP201 platform, we performed a "WGCNA" package to obtain key modules associated with procession features of Dengue. We select 5, 4 and 12 ( Fig. 3A-C) as the soft-threshold power respectively and 0.25 (Additional file 1: Figure S1 D-F) as the cutoff height in the C vs EA group, in the C vs LA group and in the EA vs LA group. Modules consist of genes with similar expression properties (Additional file 1: Figure S1 G-I). In the cluster tree, we use each branch to represent a gene, and one color to show a coexpression module (Fig. 3D-F).
The association of module eigengene (ME) values and clinical stages is applied to assess the module-stage feature correlation. WGCNA results show 14, 12 and 8 modules, respectively ( Fig. 3G-I). Compared the C with the EA group, the turquoise module is the most matched strongly with the EA group (Pearson co-efficient = − 0.89, P = 1e−33) and the C group (Pearson co-efficient = 0.89, P = 1e−33) in Fig. 3G. Compared the C with the LA group, the most significant correlation can be observed between the LA group (Pearson co-efficient = 0.71, P = 5e−20) and the blue module, and between the C group (Pearson co-efficient = − 0.71, P = 5e−20) and the blue module in Fig. 3H. While comparing the LA group with the EA group, the black module and the green module show the same highest correlation (Pearson coefficient = 0.72) in Fig. 3I. We regard these modules mentioned above as key modules.

Acquisition of key genes
To filter reliable and strong hub genes (shared genes) accurately, Venn diagrams were used to visualize hub genes between DEGs and key modules by a website (http:// www. ehbio. com/ test/ venn/#/) [51]. The results illustrate 143, 99 and 187 hub genes by interacting the turquoise module with 147 DEGs (between the C and the EA stages) (Fig. 4A), interacting the blue module with 124 DEGs (between the C and the LA stages) (Fig. 4B), interacting the black and green modules with 195 DEGs (between the EA and LA stages) (Fig. 4C), respectively. CD38 and CDKN1C are shared by the three phases (Fig. 4D). When considering serotypes, CD38 and CDKN1C still express differentially in three comparison groups (C vs EA, C vs LA, EA vs LA) for each serotype and log|FC|> 1 or log|FC| is extremely closed to 1 (Additional file 12: Table S6). The Fig. 4G shows the expression levels of CD38 firstly increase and then decrease from day 0 to the C stage (***P < 0.001). While the expression of CDKN1C is firstly down-regulated and subsequently up-regulated from the day 0 to the C stage (***P < 0.001) (Fig. 4G). The GSEA shows CD38 is enriched in cellar proliferation and metabolic pathways (Fig. 4E) and CDKN1C is enriched in lipid metabolism and inflammation pathways (Fig. 4F). Correlation analyses show in Fig. 4H expression levels of CD38 are negatively correlated with CDKN1C (spearman correlation = − 0.5). We select CD38 and CDKN1C for follow-up analyses.
In the training group, CD38 has 0.960, 0.976, and 0.829 of AUC in three comparing groups (C vs EA, C vs LA, EA vs LA) respectively and CDKN1C has 0.852, 0.868 and 0.955 of AUC in three comparing groups (C vs EA, C vs LA, EA vs LA) respectively (Fig. 5D-F). In the verifying group, the staging diagnostic value of CD38 and CDKN1C in staging for Dengue is as follows ( show that CD38 can distinguish admirably different phases for Dengue patients and has a higher distinguishable value than CDKN1C. Although we consider serotypes (serotype I-IV), results are similar among three comparison groups (C vs EA, C vs LA, EA vs LA) (Additional file 2: Figure S2, Additional file 3: Figure S3).
In addition, as Fig. 5C shown, CD38 distinguishes perfectly Dengue samples from normal samples with an AUC of 100% in GSE51808 dataset (9 normal samples and 28 Dengue samples). Therefore, CD38 shows the high value in distinguishing stages for Dengue patients. We selected CD38 as a biomarker for staging diagnosis of Dengue.
There is no significant difference in expression levels of CD38 and CDKN1C between DH and DHF samples (Additional file 1: Figure S1

Immune landscape related to characteristics of dengue patients
GO enrichment analyses for 58 DEGs between DHF samples and DF samples show neutrophil and humoral immune response are activated and KEGG pathway enrichment analyses show rich inflammatory pathways. Therefore, we analyze fraction changes for 22 types of immune cells during DENV infection by CIBERSORT. The GSE43777 dataset analyze on the GLP201 platform is used to explore fractions of 22 types of immune cells in whole blood samples from Dengue patients in three phases and the GSE43777 dataset analyzed on the GLP570 platform is applied to explore differently infiltrating-immune characteristic between DF and DHF samples.
As Fig. 7A-C and 7F shown, in the EA phase, fractions of activated dendritic cells, Neutrophils, Monocytes, and M1 Macrophages (P < 0.05) increase significantly compared to the LA and C phases. Compared the LA phase with the EA phase and the C phase, fractions of Plasma cells and activated memory CD4+ T cells (P < 0.05) increase clearly (Fig. 7A-C and F). While Compared with the EA phase and the LA phase, the increments in fractions of memory B cells, resting memory CD4+ T cells, resting dendritic cells, Eosinophils, and naïve B cells (P < 0.05) are more obvious in the C phase (Fig. 7A-C and F). Analytical results above mean that immune cells with antigen presentation, phagocytosis and chemotaxis, immune cells involved in humoral immunity and memory cells are more obviously active in the EA, LA and C phase, respectively. When analyzing in a single serotype, we can still get above results (Additional file 4: Figure S4).
In the LA stage, fractions of activated NK cells (P < 0.05) increase in DHF samples ( Fig. 7E and H), but fractions of CD8+ T cells, gamma delta T cells, resting NK cells, M2 Macrophages and resting mast cells decrease (Fig. 7E and H) (P < 0.05), which imply immune response are damaged in DHF samples. In the EA stage, the immune response of DF patients is similar to that of DHF patients (Fig. 7D and G). The immune response of DHF patients in the LA stage is significantly impaired which explain rapid deterioration of DHF patients in the LA stage.

Correlations between gene and immune cells
We analyze correlations between CD38 and immune cells in GSE43777 dataset (Fig. 7I). The infiltration level of Plasma cells, activated memory CD4+ T cells, gamma delta T cells and CD8+ T cells are positively related to CD38 (Fig. 7I); the infiltration level of naïve B cells, Monocytes, resting dendritic cells, resting memory CD4+ T cells, naïve CD4+ T cells, memory B cells, Eosinophils and activated mast cells are negatively related to CD38 (Fig. 7I). Therefore, the analytical results above show there is strong co-expression relationship between CD38 and plasma cells and between CD38 and activated memory CD4+ T cells (Pearson's correlation > 0.5). In individual serotype, we can still draw this conclusion (Additional file 5: Figure S5). Correlation between immune cells and ZNF595 is not significant (Fig. 7J).

Analyzing the staging characteristic of immune cells
Because of high co-expression correlation between CD38 and plasma cells, we speculated that infiltratingimmune plasma cells could also show distinct differences in three stages. To explore this characteristic, we regarded GSE43777 analyzed by GPL201 as a training set and GSE43777 analyzed by GPL570 as a test set. As the  When considering different serotypes, above results were still obtained (Additional file 6: Figure S6). Therefore, we can discriminate three stages based on the fraction of Plasma cells, activated memory CD4 + T cells and Monocytes in Dengue patients.

Discussion
This study combined DEGs analysis, WGCNA and ROC to identify, valid and test potential biomarkers associated with the staging and severity of Dengue, and used GO enrichment analysis, KEGG analysis and GSEA analysis to explore potential reasons resulting in DHF. The CIB-ERSORT website was also applied to explore immune differences during Dengue infection. Our research was the first to show that CD38 and ZNF595 had clinical significance in the stage and severity of Dengue and that they could be used as biomarkers to distinguish clinical stages and severity for Dengue patients. It was worth noting that Plasma cells, activated memory CD4+ T cells, and Monocytes also showed distinguishing value.
CD38 which was identified, verified and tested in independent datasets could distinguish three clinical stages of Dengue fever and was significantly associated with plasma cells, but it could not use to predict severity which was similar to precious results [52] and distinguish different serotypes. Our results showed expression levels of CD38 and fractions of plasma cells were similar in four serotypes. We identified, verified and tested ZNF595 in independent datasets that could predict DHF. Mechanistically, EA-DEGs which inhibited viral replication were down-regulated in DHF, and a related-autophagy gene (CCL2) expressed differently and increased significantly in DHF, all of which suggested DENV regulated autophagy to lead to DHF.
In addition, we analyzed systematically immune differences among three stages and between DHF and DF in the acute stage, and correlation between immune cells and genes. Immune analysis results showed fractions of monocytes, activated mast cells, M1 macrophages and neutrophils referred to innate immune response increased obviously in the EA stage compared with the C stage; significant increments can be observed in fractions of plasma cells and activated memory CD4+ T cells comparing the LA stage with the C stage; increasing fractions of plasma cells, CD8+ T cells, activated memory CD4+ T cells, follicular helper T cells, regulatory T cells (Tregs) and gamma delta T cells can be discovered in the LA stage compared with the EA stage. Above results were suitable for any serotype, suggesting different serotypes did not have obvious immune differences. Increasing immune response eliminated DENV which avoided Dengue symptoms [27] and immune response kinetics dependent on initial lymphocyte numbers, were observed to correspond with illness severity [53]. Our studies showed neutrophil and humoral immune response are activated in DHF in the LA stage, but the whole immune system was damaged in DHF compared with DF, which was a possible reason leading to DHF.
Interestingly, KEGG enrichment showed DEGs (between the EA stage and the C stage, and between the EA stage and the LA stage) were enriched in Coronavirus disease -COVID-19, Influenza A, Hepatitis C and Measles pathways, implying Coronavirus [54], Alphainfluenzavirus influenzae [55], Hepatitis C virus [56] and Measles morbillivirus also up regulated these genes during infection and had shared pathogenic mechanism.
Gene expression profiles from public databases (GEO) have been applied to explore Dengue biomarkers by researchers. Several studies have used multiple genes to distinguish Dengue patients from healthy samples [39,40], DHF from DF [41] and different stages [57]. However, multi-gene lists can limit the sensitivity and specificity of DEGs as disease biomarkers [58,59].
Compared with previous studies [39][40][41], our study had several advantages. First of all, this study was based on single gene analyses which allowed us to identify a stable and robust biomarker and these biomarkers were identified and verified on independent datasets which increased the accuracy of our study. Secondly, we combined three methods, including DEG, WGCNA and ROC to screen, verify and test biomarkers for Dengue diagnoses which increase the accuracy of our results, differing from precious studies only depending on the DEG analysis. Thirdly, we analyzed 22 types of immune cells including less attention immune cells based on RNA-sequence which contributed to increase understand of the whole immune response during DENV infection and explore correlations between genes and immune cells. A previous study showed that peripheral lymphocyte subset alteration can be independent predictors for clinical characteristics and treatment efficacy of COVID-19 [60]. Interestingly, we found that the fraction of Plasma cells, activated memory CD4+ T cells and Monocytes in Dengue patients also had clinical characteristics and can distinguish these clinical stages for Dengue patients. Finally, not just for single serotype, our study included four serotypes which was involved in separate and combined analysis.
There is still a shortcoming of this study, because the study only uses public datasets to analyze, verify and test, without clinical verification and test. Our research has provided help in distinguishing the stage and severity of Dengue infection and understanding pathogenic mechanism of different serotypes, and will help analyze mechanisms of DHF and benefit to clinical treatment in the future.

Conclusions
In conclusion, based on expression levels of CD38 and fractions of Plasma cells, activated memory CD4+ T cells and Monocytes, we can discriminate admirably three clinical stages for Dengue patients (results are suitable for any serotype); ZNF595 can better distinguish dengue hemorrhagic fever (DHF) from Dengue fever (DF). Upregulated autophagy-related genes contribute to understand mechanisms of DHF.