
Explainable machine learning model for prediction of 28-day all-cause mortality in immunocompromised patients in the intensive care unit: a retrospective cohort study based on MIMIC-IV database

Abstract

Objectives

This study aimed to develop and validate an explainable machine learning (ML) model to predict 28-day all-cause mortality in immunocompromised patients admitted to the intensive care unit (ICU). Accurate and interpretable mortality prediction is crucial for clinical decision-making and optimal allocation of critical care resources for this vulnerable patient population.

Methods

We utilized retrospective clinical data from the MIMIC-IV (version 2.2) database, encompassing ICU admissions at Beth Israel Deaconess Medical Center from 2008 to 2019. Eligible immunocompromised patients, including those with primary immunodeficiencies and chronic acquired conditions, such as hematological malignancies, solid tumors, and organ transplantation, were selected. Data were randomly split into training (80%) and testing (20%) cohorts. Ten ML models (logistic regression, XGBoost, LightGBM, AdaBoost, Random Forest, Gradient Boosting, Gaussian Naive Bayes, Complement Naive Bayes, Multilayer Perceptron, and Support Vector Machine) were developed and evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), sensitivity, specificity, accuracy, and F1 score. Model explainability was achieved through SHapley Additive exPlanations (SHAP), and decision curve analysis (DCA) assessed clinical utility. In addition, Cox proportional hazards regression was conducted to evaluate the impact of predictive factors on time-to-event outcomes.

Results

Among the evaluated models, the Support Vector Machine (SVM) demonstrated the highest AUROC of 0.863 (95% CI 0.834–0.890) and a notable AUPRC of 0.678 (95% CI 0.624–0.736). Key predictive factors consistently identified across multiple ML models included 24-h urine output, blood urea nitrogen (BUN) levels, presence of metastatic solid tumors, Charlson Comorbidity Index (CCI), and international normalized ratio (INR). SHAP analyses provided detailed insights into how these features influenced model predictions.

Conclusions

The explainable ML models, built with a range of machine learning algorithms, showed promising clinical applicability for predicting 28-day mortality risk among immunocompromised ICU patients. Factors such as urine output, BUN, metastatic solid tumors, CCI, and INR contributed substantially to the predictions and may serve as important predictors in clinical practice.

Introduction

The last few decades have witnessed significant advancements in therapeutic interventions for conditions such as cancer, hematologic malignancies, solid organ transplantation, and autoimmune diseases [1,2,3,4]. These advancements have markedly improved the survival rates of patients with these conditions, leading to a growing number of immunocompromised patients in intensive care units (ICUs) [5]. The proportion of critically ill patients with compromised immune systems has risen to approximately one-third of all ICU admissions [6]. These patients face specific challenges and risks, including higher mortality rates and more frequent complications.

Immunocompromised patients may require ICU admission for various reasons, including severe infections, immune-mediated organ dysfunction, acute gastrointestinal bleeding, acute hypoxemic respiratory failure, and complications related to their primary diseases or treatments [7,8,9,10,11,12,13]. Acute respiratory failure (ARF) is a leading cause of ICU admission among immunocompromised patients and often results from bacterial (51.2%) or viral (25%) infections [14]. Gastrointestinal bleeding (GIB) is another critical reason for ICU admission, particularly among patients with hematological malignancies and those undergoing chemotherapy [7]. The outcomes of immunocompromised patients are influenced not only by their underlying medical conditions but also by factors specific to their ICU stay. Baseline immune dysfunction, the use of broad-spectrum antibiotics, invasive devices, and additional immune-modulating therapies increase their risk of acquiring new infections during their ICU stay [15,16,17,18]. These patients are at high risk for secondary complications, such as ventilator-associated pneumonia (VAP) [19, 20], surgical site infections (SSI) [21, 22], and bloodstream infections [23,24,25]. These complications lead to higher mortality rates, prolonged ICU and hospital stays, and increased healthcare costs. Comprehensive management and effective preventive measures are essential to improve outcomes and reduce mortality in this vulnerable population.

Accurate mortality prediction for immunocompromised ICU patients is crucial for effective patient management and optimal healthcare resource allocation [26]. Traditional predictive models, though useful, often fail to account for the intricate factors affecting outcomes in this high-risk group. Established severity scoring systems such as APACHE, SOFA, or SAPS have demonstrated utility in general ICU populations [27], whereas specialized models tailored to immunocompromised patients may offer additional insights given their distinct clinical presentations and risk profiles. Recent advances in machine learning (ML) have shown great potential for predicting clinical outcomes in critically ill patients [28]. ML models can integrate a large number of clinical variables and, in many settings, have achieved higher predictive accuracy than traditional methods.

However, the complexity of these models often results in a “black-box” phenomenon, hindering their acceptance in clinical practice [29]. To address this issue, explainable AI techniques, such as SHapley Additive exPlanations (SHAP) values [30], have emerged to elucidate the contribution of each feature to the model's predictions. The clinical application potential of such explainable prediction models is substantial. Studies have demonstrated that early identification of high-risk immunocompromised patients can positively influence treatment trajectories, with timely interventions potentially improving outcomes in this vulnerable population [31]. In addition, mortality prediction tools have been shown to enhance resource allocation efficiency in ICU settings by helping identify patients who may benefit most from intensive interventions [32], a consideration particularly relevant for immunocompromised patients who often require specialized resources. Beyond direct patient care, prediction models serve as valuable tools for hospital performance evaluation and quality improvement initiatives. Research has shown that risk-adjusted mortality metrics based on accurate prediction models can identify variation in care quality across institutions, leading to targeted improvement efforts and better outcomes [33]. Furthermore, studies examining physician decision-making patterns indicate that objective prognostic information influences treatment choices and management strategies, enabling more personalized and appropriate care for these complex patients [28].

This study aims to develop and validate an explainable ML model to predict 28-day mortality risk in immunocompromised ICU patients. Utilizing data from the MIMIC-IV database, we seek to create a robust tool that not only predicts outcomes accurately but also offers interpretability through SHAP values. This approach aims to improve clinical decision-making and patient care in critical care settings.

Materials and methods

Study design and data source

The data utilized in this study were derived from the MIMIC-IV (version 2.2) database [34], an extensive repository of clinical information for patients admitted to the ICU at Beth Israel Deaconess Medical Center in the United States between 2008 and 2019. This database encompasses a broad spectrum of data, including demographics, vital signs, laboratory tests, medications, and follow-up information. The MIMIC-IV database is freely accessible to researchers globally upon receiving joint approval from the ethics review boards of MIT and Harvard Medical School. Given the retrospective nature of the study, informed consent was waived. Ethical permission to use the MIMIC-IV database has been secured by the research team (certification no.: 48061114, 38118593).

Study population

The study included patients from the MIMIC-IV database who met the criteria for overt immunosuppressive conditions at ICU admission. We defined the immunosuppressed population as patients with primary immunodeficiencies or chronic acquired immunodeficiencies. Primary immunodeficiencies encompassed conditions such as antibody deficiency, cellular deficiency, combined antibody and cellular immune deficiency, phagocytic defects, and complement defects. Chronic acquired immunodeficiencies included hematological malignancies, solid tumors, solid organ transplantation, corticosteroid and other immunosuppressive therapy, hematopoietic stem cell transplantation (HSCT), and HIV infection [35]. Detailed definitions are provided in the supplementary materials. Patients younger than 18 years and those with ICU stays shorter than 6 h were excluded. Only data from the first ICU stay during a given hospital admission were included in the analysis.

Feature extraction and data preprocessing

Structured Query Language (SQL) was used to extract data. We collected variables including demographics, vital signs, laboratory tests, pre-ICU comorbidities, mechanical ventilation, Charlson Comorbidity Index (CCI), Glasgow Coma Scale (GCS) scores, Sequential Organ Failure Assessment (SOFA) scores, and 28-day all-cause mortality. Laboratory tests and vital signs were collected within the first 24 h after ICU admission. Time-varying measurements during this period were summarized in several ways to capture different aspects of physiological status. For vital signs (heart rate, respiratory rate, temperature, blood pressure, and oxygen saturation) and laboratory measurements, we calculated three summary statistics: maximum values to identify acute deterioration, minimum values to detect physiological compromise, and mean values to reflect overall trends. Clinical severity scores (GCS and SOFA) were calculated using the worst values of their components within the first 24-h window. For cumulative measurements, such as urine output, we calculated the total over the 24-h period. Mechanical ventilation status was defined as any ventilatory support required during the first 24 h of ICU admission. The outcome variable was 28-day all-cause mortality after ICU admission.
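The 24-h summarization scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SQL or code: the record layout (tuples of hours since ICU admission, variable name, value), the variable names `hr` and `urine`, and the helper `summarize_first_24h` are hypothetical, and the GCS, SOFA, and ventilation rules are not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical set of variables treated as cumulative (summed) rather than min/max/mean.
CUMULATIVE = {"urine"}


def summarize_first_24h(records, window_hours=24.0):
    """Aggregate time-stamped measurements from the first `window_hours` after ICU admission.

    records: iterable of (hours_since_icu_admission, variable_name, value) tuples
    (a hypothetical layout). Returns a flat dict such as
    {"hr_max": ..., "hr_min": ..., "hr_mean": ..., "urine_sum": ...}.
    """
    values = defaultdict(list)
    for t, name, value in records:
        # Keep only non-missing measurements inside the window [0, window_hours].
        if 0.0 <= t <= window_hours and value is not None:
            values[name].append(value)

    features = {}
    for name, vals in values.items():
        if name in CUMULATIVE:
            # Cumulative quantities (e.g., urine output) are totalled over the window.
            features[f"{name}_sum"] = sum(vals)
        else:
            # Max flags acute deterioration, min flags compromise, mean reflects the overall trend.
            features[f"{name}_max"] = max(vals)
            features[f"{name}_min"] = min(vals)
            features[f"{name}_mean"] = mean(vals)
    return features
```

For example, heart-rate readings of 80 and 100 beats/min at 1 h and 5 h (plus one at 30 h, which falls outside the window) yield `hr_min = 80`, `hr_max = 100`, and `hr_mean = 90`.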

Variables with more than 20% missing values were excluded from the analysis. For the remaining variables, missing values were imputed using nearest-neighbor imputation (missing data statistics are provided in Supplementary Table 1). Feature selection followed a two-step approach. First, we performed univariate logistic regression for all variables and eliminated features with P > 0.05, as they were deemed less likely to be relevant for 28-day mortality prediction. Second, we conducted correlation analyses to reduce multicollinearity: for pairs of variables with correlation coefficients exceeding 0.75 (Spearman correlation for numeric pairs, Cramér’s V for categorical pairs, and the correlation ratio for mixed pairs), we retained only the variable with the stronger association with mortality (lower p value in univariate logistic regression) and removed the other (correlation patterns are shown in Supplementary Fig. 1). The final preprocessing steps were model specific. For severely skewed numeric features (|skewness| > 1.5), we applied the Yeo–Johnson transformation to improve the performance of linear models, while preserving the original distributions for tree-based models, which are largely insensitive to monotone transformations of their inputs. All categorical variables were converted to numeric format using one-hot encoding. This tailored strategy was intended to give each algorithm inputs suited to its assumptions.
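A minimal numpy sketch of the correlation-pruning and selective Yeo–Johnson steps follows. It makes several assumptions that the text does not state: only numeric features are handled (Cramér’s V and the correlation ratio for categorical and mixed pairs are omitted), the pairwise rule is read as a greedy pass ordered by univariate p value, and λ is estimated by a simple grid search over the Yeo–Johnson profile log-likelihood. In practice a library routine such as scikit-learn's `PowerTransformer(method="yeo-johnson")` would normally be used. All function names are illustrative, not taken from the authors' repository.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank (as in Spearman's rho)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def prune_correlated(data, p_values, threshold=0.75):
    """Visit features from lowest to highest univariate p value; keep one only if
    |rho| <= threshold against every feature already kept."""
    kept = []
    for name in sorted(data, key=lambda k: p_values[k]):
        if all(abs(spearman(data[name], data[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def skewness(x):
    """Biased sample skewness g1 = m3 / m2**1.5."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform for a fixed lambda."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos, neg = y >= 0, y < 0
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(y[pos])
    if abs(lam - 2) > 1e-12:
        out[neg] = -((1 - y[neg]) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[neg] = -np.log1p(-y[neg])
    return out


def fit_lambda(y, grid=np.linspace(-2.0, 4.0, 241)):
    """Grid-search the lambda maximizing the Yeo-Johnson profile log-likelihood."""
    y = np.asarray(y, dtype=float)
    jacobian = np.sum(np.sign(y) * np.log1p(np.abs(y)))
    llf = [-0.5 * len(y) * np.log(np.var(yeo_johnson(y, lam))) + (lam - 1) * jacobian
           for lam in grid]
    return float(grid[int(np.argmax(llf))])


def transform_skewed(data, skew_cutoff=1.5):
    """Apply Yeo-Johnson only where |skewness| > skew_cutoff (inputs for linear models)."""
    return {name: yeo_johnson(x, fit_lambda(x)) if abs(skewness(x)) > skew_cutoff
            else np.asarray(x, dtype=float)
            for name, x in data.items()}
```

Ordering by p value before pruning guarantees that, within any highly correlated pair, the feature with the stronger univariate association is the one that survives; the tree-based models would simply skip `transform_skewed`.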

Model development and validation

We used ten ML algorithms to construct the prediction models: logistic regression, eXtreme Gradient Boosting (XGBoost), LightGBM, AdaBoost, Random Forest (RF), Gradient Boosting, Gaussian Naive Bayes (GNB), multilayer perceptron neural network (MLP), Complement Naive Bayes (CNB), and support vector machine (SVM). The overall data set was randomly divided into a training cohort (80%) and a testing cohort (20%). The optimal hyperparameters for each model were identified by grid search with fivefold cross-validation (CV) to minimize overfitting. The models were evaluated using several performance metrics, including the area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), accuracy, sensitivity (recall), specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. For each model, the optimal probability threshold was determined using the Youden index, that is, the threshold that maximizes J = sensitivity + specificity − 1. The 95% confidence intervals for all metrics were derived using bootstrap resampling with 500 iterations. To further interpret the results and understand the contribution of each feature to the predictions, we employed SHapley Additive exPlanations (SHAP) to enhance the transparency and interpretability of the best machine learning model. To assess potential clinical utility, we performed decision curve analysis (DCA) to evaluate net benefit across different threshold probabilities. In addition to the machine learning models, we performed Cox proportional hazards regression to account for the time-to-event nature of survival data. Both univariable and multivariable Cox regression analyses were conducted; variables with significant associations in the univariable analysis (p < 0.05) were included in the multivariable model.
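The Youden-index rule can be sketched as below. The sketch assumes that candidate thresholds are the model's distinct predicted probabilities and that "score ≥ threshold" means predicted death; the text does not say on which data set the threshold was tuned, and `youden_threshold` is a hypothetical helper, not the authors' code. An equivalent approach is to call scikit-learn's `roc_curve` and take the threshold at the argmax of `tpr - fpr`.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Each distinct predicted probability is tried as a cut-off; a patient is
    classified as high risk when score >= threshold. Assumes both classes are present.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    best_t, best_j = None, -np.inf
    for t in np.unique(y_score):  # ascending; on ties the lowest threshold is kept
        pred = y_score >= t
        sensitivity = (pred & y_true).sum() / n_pos
        specificity = (~pred & ~y_true).sum() / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j
```

For instance, with outcomes `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, both 0.35 and 0.8 give J = 0.5, and the loop keeps the lower cut-off, 0.35.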

Statistical methods

Continuous variables are presented as median with interquartile ranges (IQR) due to their non-normal distribution, and categorical variables are presented as numbers (percentages). Appropriate statistical tests such as the Mann–Whitney U test, Student's t test, chi-square test, or Fisher's exact test were used to compare baseline characteristic variables. All statistical tests were two-tailed, and a p value less than 0.05 was considered statistically significant. The performance metrics used in this study were calculated as follows:

$$\text{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$$
$$\text{Sensitivity (Recall)} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
$$\text{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$$
$$\text{Positive predictive value (PPV, precision)} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
$$\text{Negative predictive value (NPV)} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}$$
$$\text{F1 score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
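The metric definitions above translate directly into code. The sketch below is illustrative only: the helper names are hypothetical, the counts in the example are invented and not taken from Table 2, and zero denominators (for example, a model that predicts no deaths) are not handled.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) from binary labels and binary predictions (1 = death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # PPV
    recall = tp / (tp + fn)     # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "ppv": precision,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }
```

For example, hypothetical counts TP = 80, TN = 270, FP = 30, FN = 20 give accuracy 0.875, sensitivity 0.80, specificity 0.90, PPV ≈ 0.727, NPV ≈ 0.931, and F1 ≈ 0.762.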

For all model performance metrics (AUROC, AUPRC, accuracy, PPV, sensitivity, specificity, NPV, and F1 score), we estimated 95% confidence intervals using bootstrap resampling with 500 iterations: the test data set was repeatedly sampled with replacement, each performance metric was calculated on every bootstrap sample, and the 2.5th and 97.5th percentiles of the resulting distribution were taken as the interval limits. Data analysis and model development were conducted in Python (version 3.10.0). The complete code used for data preprocessing, model development, and analysis is available at https://github.com/leanqon/immunocompromised-mortality-ml.
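The percentile bootstrap described above can be sketched with numpy. Several details are assumptions rather than statements from the text: the random seed, the choice to skip resamples containing only one outcome class (where AUROC is undefined), and the rank-based `auroc` helper, which is included only to keep the example self-contained. `bootstrap_ci` and `auroc` are hypothetical names, not functions from the authors' repository.

```python
import numpy as np


def auroc(y_true, y_score):
    """AUROC as the probability that a random death outranks a random survivor (ties count 1/2)."""
    y_true = np.asarray(y_true, dtype=bool)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y_true], s[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, metric, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(y_true, y_score) on a fixed test set."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        yt, ys = y_true[idx], y_score[idx]
        if yt.min() == yt.max():
            continue  # single-class resample: AUROC undefined, so it is skipped
        stats.append(metric(yt, ys))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

Because the model is fixed and only the test patients are resampled, such an interval reflects test-set sampling variability, not variability from retraining the model.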

Results

Baseline characteristics

In this study, we extracted 8,782 eligible patients from the MIMIC-IV database and divided them into two groups according to 28-day all-cause mortality: a survival group (6,805 cases) and a mortality group (1,977 cases). A flowchart of the study cohort selection process is presented in Fig. 1, and the baseline characteristics are presented in Table 1. In the entire cohort, the median age was 68.00 years (IQR: 58.00, 77.00); age was higher in the mortality group (71.00 vs. 67.00 years, p < 0.001). Gender distribution was similar between groups, with females comprising 41.8% of the overall population (p = 0.392). White patients constituted the majority (72.1%) of the study population, followed by other (11.3%), African–American (9.6%), Asian (4.1%), and Hispanic–American (2.9%).

Fig. 1

Flowchart of the study

Table 1 Baseline characteristics of the patients

Vital signs analysis showed that the mortality group had significantly higher heart rate (HR) (mean: 92.89 vs. 84.34 beats per minute, p < 0.001), higher respiratory rate (RR) (mean: 20.42 vs. 18.43 breaths per minute, p < 0.001), and lower systolic blood pressure (SBP) (mean: 110.05 vs. 116.30 mmHg, p < 0.001). Laboratory indicators revealed that the mortality group had higher white blood cell (WBC) counts (mean: 11.48 vs. 10.00 × 10⁹/L, p < 0.001), lower hemoglobin (mean: 9.52 vs. 10.20 g/dL, p < 0.001), and higher blood urea nitrogen (BUN) and creatinine levels (BUN mean: 27.50 vs. 18.50 mg/dL; creatinine mean: 1.10 vs. 0.90 mg/dL, both p < 0.001). The mortality group also showed higher international normalized ratio (INR) values (mean: 1.40 vs. 1.23, p < 0.001) and significantly lower 24-h urine output (911.00 vs. 1510.00 mL, p < 0.001). Comorbidities were more prevalent in the mortality group, including myocardial infarction (14.4% vs. 11.1%, p < 0.001), congestive heart failure (24.7% vs. 19.8%, p < 0.001), cerebrovascular disease (13.5% vs. 9.5%, p < 0.001), chronic pulmonary disease (28.8% vs. 25.1%, p = 0.001), and renal disease (22.6% vs. 18.6%, p < 0.001). Notably, metastatic solid tumor was significantly more common in the mortality group (52.9% vs. 32.2%, p < 0.001). Mechanical ventilation was also more common in the mortality group (29.4% vs. 22.0%, p < 0.001).

Selected predictive features

Our systematic feature selection process yielded a final set of 44 features that demonstrated significant association with 28-day mortality. These included demographic characteristics (age, weight, ethnicity), clinical severity and comorbidity indices (Charlson Comorbidity Index [CCI], GCS, SOFA score), vital sign measurements (HR mean, RR minimum and mean, temperature minimum and mean, SBP minimum, mean arterial pressure [MAP] minimum and maximum, SpO2 minimum, maximum and mean), laboratory values (BUN minimum, chloride minimum, calcium minimum, sodium minimum, potassium maximum, bicarbonate minimum, hemoglobin minimum, platelet count [PLT] minimum, WBC count minimum, MCV maximum, mean corpuscular hemoglobin concentration [MCHC] minimum, INR mean, magnesium mean, phosphate maximum, glucose [GLU] maximum), urine output, comorbidities (myocardial infarction, congestive heart failure, cerebrovascular disease, chronic pulmonary disease, mild and severe liver disease, paraplegia, renal disease, metastatic solid tumor, acquired immune deficiency syndrome [AIDS]), and mechanical ventilation status. The correlation structure among these features is visualized in Supplementary Fig. 1, demonstrating the effective reduction of multicollinearity while preserving clinically informative variables. Detailed information about these final features is provided in the supplementary materials.

Model performance

The performance of all models on the testing data set is summarized in Table 2 and Fig. 2. The logistic regression model achieved an AUROC of 0.857 (95% CI 0.826–0.887) and an AUPRC of 0.662 (95% CI 0.597–0.727), while the support vector machine (SVM) model achieved an AUROC of 0.863 (95% CI 0.834–0.890) and an AUPRC of 0.678 (95% CI 0.624–0.736). The multilayer perceptron (MLP) model performed similarly, with an AUROC of 0.859 (95% CI 0.826–0.887) and the highest AUPRC, 0.687 (95% CI 0.622–0.744).

Table 2 Model performance metrics of ML models in the testing data set (95% CI)
Fig. 2

Receiver operating characteristic (ROC) curves, precision–recall curves (PRC), and calibration plots of the models in the testing set. A ROC curves of models. B PRC curves of models. C Calibration curves of models

The models showed different strengths across metrics. The SVM model demonstrated the highest sensitivity (0.787, 95% CI 0.729–0.839) but moderate precision. The Complement Naive Bayes (CNB) model achieved the highest specificity (0.889, 95% CI 0.865–0.909) but lower sensitivity. The Gaussian Naive Bayes (GNB) model, despite a lower overall AUROC (0.772, 95% CI 0.734–0.808), showed high sensitivity (0.751, 95% CI 0.695–0.806), which could be valuable in scenarios where identifying all potential high-risk patients is prioritized. Calibration curves of all models are shown in Fig. 2C.

To better understand the importance of features in prediction, we analyzed three of our tree-based models (Gradient Boosting, LightGBM, and XGBoost) using SHAP values, as shown in Fig. 3. Across all three models, similar key predictors emerged. The 24-h urine output consistently appeared as the most important predictor, followed by blood urea nitrogen minimum value (bun_min) and presence of metastatic solid tumors. Additional important predictors included the Charlson Comorbidity Index (CCI), international normalized ratio mean value (inr_pt_mean), and heart rate mean (hr_mean). Figure 3 not only ranks these features by importance but also illustrates how each feature’s values impact predictions. For example, lower urine output (blue values in the SHAP summary plots) is associated with higher predicted mortality risk, while the presence of metastatic solid tumors (red values) similarly increases predicted risk.
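To make concrete what the SHAP values in Fig. 3 represent, the sketch below computes exact Shapley values for a small toy risk function by enumerating every coalition of features, replacing "absent" features with a single baseline value. This is a simplification for illustration only: the toy model, its coefficients, and the baseline patient are hypothetical, and practical SHAP explainers instead average over background data or use model-specific algorithms for tree ensembles rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at instance x.

    Features outside a coalition are set to the corresponding baseline value.
    Cost grows as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            for s in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


# Hypothetical toy risk model on the log-odds scale (coefficients invented for illustration).
# Features: [urine output (L/24 h), BUN (mg/dL), metastatic solid tumor (0/1)].
def toy_logit(z):
    urine_l, bun, metastatic = z
    return -1.0 - 0.8 * urine_l + 0.03 * bun + 1.2 * metastatic + 0.5 * metastatic * (bun > 30)


x = [0.5, 40.0, 1]         # hypothetical patient: low urine output, high BUN, metastatic tumor
baseline = [1.5, 20.0, 0]  # hypothetical reference patient
phi = exact_shapley(toy_logit, x, baseline)
```

For this toy model the attributions are 0.8 (urine output), 0.85 (BUN), and 1.45 (metastatic tumor). They sum to f(x) − f(baseline) = 3.1, and the 0.5 interaction term is split equally between BUN and the tumor indicator. The positive value for low urine output mirrors the direction described above: a lower value pushes predicted risk up.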

Fig. 3

SHAP analysis of key predictive models. Feature importance and SHAP summary plots for (A) gradient boosting, (B) LightGBM, and (C) XGBoost models. Left panels show feature importance ranked by mean absolute SHAP value. Right panels illustrate the impact of feature values on model output, with red indicating higher feature values and blue indicating lower values. The horizontal position shows whether that value pushes the predicted mortality risk higher or lower. urine_sum, total 24-h urine output; bun_min, minimum blood urea nitrogen (BUN) level; metastatic_solid_tumor, presence of metastatic solid tumor (binary indicator); inr_pt_mean, mean international normalized ratio (INR); charlson_comorbidity_index, Charlson Comorbidity Index; resp_mean, mean respiratory rate; hr_mean, mean heart rate; bicar_min, minimum bicarbonate level; plt_min, minimum platelet count; cl_min, minimum chloride level; o2 sat_min, minimum oxygen saturation (SpO₂); resp_min, minimum respiratory rate; sbp_min, minimum systolic blood pressure; wbc_min, minimum white blood cell (WBC) count; o2 sat_mean, mean oxygen saturation (SpO₂); gcs, Glasgow Coma Scale; map_min, minimum mean arterial pressure (MAP); na_min, minimum sodium level; sofa_score, SOFA score (sequential organ failure assessment); temp_mean, mean body temperature

Cox regression analysis further supported the findings from the machine learning models. In the multivariable Cox analysis, metastatic solid tumor (hazard ratio 1.146, 95% CI 1.097–1.198, p < 0.001) and severe liver disease (hazard ratio 1.127, 95% CI 1.029–1.234, p = 0.01) were associated with increased mortality risk, whereas paraplegia showed a trend toward increased risk that did not reach statistical significance (hazard ratio 1.105, 95% CI 0.993–1.230, p = 0.068). Several physiological parameters, including heart rate (hazard ratio 1.074, 95% CI 1.051–1.097, p < 0.001) and respiratory rate (hazard ratio 1.066, 95% CI 1.043–1.089, p < 0.001), were also significantly associated with mortality. Detailed results of the Cox regression analysis are presented in Supplementary Table 2.

Decision curve analysis

To evaluate the clinical utility of our prediction models, we performed decision curve analysis (Fig. 4). All models showed positive net benefit across a wide range of threshold probabilities (10–90%) compared with the default strategies of treating all patients or treating none. The greatest incremental benefit over the reference strategies was observed in the lower threshold probability range (10–40%), where the vertical separation between the model curves and the reference lines was largest. The models performed similarly to one another, with comparable net benefit curves across most threshold ranges, suggesting broadly similar clinical utility for mortality risk prediction in immunocompromised ICU patients. These findings indicate that any of these models could support clinical decision-making by helping identify high-risk patients who might benefit from more intensive monitoring or interventions.
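For reference, decision curves such as those in Fig. 4 are conventionally based on the net benefit NB(p_t) = TP/n − (FP/n) × p_t/(1 − p_t), compared with treating all patients, NB = π − (1 − π) × p_t/(1 − p_t), where π is the observed mortality, and with treating none (NB = 0). A minimal sketch of that standard definition follows. The authors' exact implementation is not described (for instance, whether a patient exactly at p_t counts as high risk), and the function names are hypothetical.

```python
import numpy as np


def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of 'treat if predicted risk >= p_t' at each threshold probability p_t."""
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    n = len(y_true)
    out = []
    for pt in thresholds:
        pred = y_prob >= pt
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        # False positives are weighted by the odds of the threshold probability.
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)


def net_benefit_treat_all(y_true, thresholds):
    """Reference strategy of treating every patient (treating none has net benefit 0)."""
    prevalence = np.mean(np.asarray(y_true, dtype=bool))
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

The weight p_t/(1 − p_t) encodes how many false positives a clinician would accept per true positive at that threshold, which is why the treat-all curve falls steeply as p_t rises.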

Fig. 4

Decision curve analysis comparing net benefits of different models across threshold probabilities. The horizontal dotted line represents treating no patients, while the sloped dashed line represents treating all patients. All models showed positive net benefits compared to default strategies. XGBoost, extreme gradient boosting

Discussion

Compared with other ICU patients, immunocompromised patients face unique challenges that significantly affect their prognosis. Using a large data set and machine learning algorithms, this study developed and validated explainable ML models to predict 28-day all-cause mortality risk in immunocompromised patients admitted to the ICU. The main new findings were as follows: (i) several factors were consistently associated with mortality across multiple models, including 24-h urine output, BUN levels, INR levels, the presence of metastatic solid tumors, and CCI; and (ii) multiple ML models showed promising performance in predicting mortality, with several models achieving comparable results.

Our findings align with existing research emphasizing the importance of comorbidities and laboratory values in predicting mortality among critically ill patients. In immunosuppressed patients, the presence of metastatic solid tumors significantly worsens prognosis. Vigneron et al. [36] demonstrated in their multivariate analysis that metastatic disease is an independent factor associated with increased ICU mortality, with a cause-specific hazard (CSH) ratio of 1.78 (95% CI 1.38–2.30, p < 0.001), and that patients with cancer in progression have a similarly increased risk, with a CSH ratio of 1.62 (95% CI 1.28–2.05, p < 0.001). Among the 1279 patients with complete follow-up, the 1-year survival rate was 33.2%, with lung and gastrointestinal cancers being the most common tumor sites. Patients requiring intensive care interventions, such as mechanical ventilation or vasopressors, have higher in-hospital mortality rates due to severe complications of metastatic cancer [37]. Immunosuppression further complicates management by impairing the body's ability to combat infections and recover from critical illness [38]. In addition, previous studies have shown a consistent association between elevated BUN levels and poor prognosis in critically ill patients [39,40,41]. In immunosuppressed patients, higher BUN levels, indicative of renal dysfunction, are associated with poorer outcomes. Xia et al. demonstrated that a higher BUN/ALB ratio is positively related to 30-day mortality in pneumonia patients receiving glucocorticoids [42]. INR levels, reflecting coagulation abnormalities, also correlate with higher mortality rates among critically ill and immunosuppressed patients [43]. INR is an important predictor of microthrombus formation at an advanced stage of septic shock [44, 45]. A retrospective observational study showed that a higher INR was associated with a higher risk of mortality after ICU admission in patients with rheumatoid arthritis [46].
The Charlson Comorbidity Index [47], originally designed to predict long-term mortality, is a well-established and validated instrument for quantifying comorbidity burden; it supports early identification of a constellation of symptoms and syndromes in individual patients and improves prognostic estimates of health risks [48]. CCI is strongly associated with both mortality and length of stay and is used as a prognostic marker in the ICU [49, 50]. CCI also performs well in immunosuppressed populations, such as patients with kidney transplants [51], inflammatory bowel diseases [52], systemic lupus erythematosus [53], and colorectal cancer [54]. Furthermore, urine output within the first 24 h is a vital indicator of renal function and fluid status, and decreased urine output is associated with higher mortality in this patient population [55].

Machine learning has increasingly been integrated into ICU settings to enhance predictive modeling. Numerous studies have shown that ML models can effectively predict critical outcomes such as sepsis, mortality, and hemodynamic deterioration [56]. By leveraging complex interactions among clinical variables, ML models have been reported to outperform traditional methods, providing early warnings and improving patient management [57, 58]. The ability of ML models to handle high-dimensional and nonlinear relationships among clinical features makes them particularly suited to the ICU environment, where patient data are vast and complex.

In this study, we observed that several ML models, including logistic regression, SVM, and MLP, demonstrated comparable performance in mortality prediction, with overlapping confidence intervals for their AUROC and AUPRC values. The use of SHAP values allowed us to decompose the predictions and understand the influence of individual features, thereby addressing the “black-box” issue commonly associated with ML models. The SHAP analysis across multiple models consistently revealed that lower 24-h urine output, elevated BUN levels, presence of metastatic solid tumors, higher CCI, and elevated INR values were associated with increased mortality risk. This consistency across different modeling approaches strengthens our confidence in these findings and can guide clinicians in prioritizing patients who may benefit from more intensive monitoring and early interventions. The decision curve analysis further supported the clinical utility of our models, demonstrating positive net benefit across a wide range of threshold probabilities. This analysis complements traditional performance metrics by directly quantifying the clinical value of using these models in decision-making. The comparable net benefit of the different models aligns with their similar discriminative ability on conventional metrics, reinforcing the robustness of our findings.

Despite the promising results, several limitations need to be acknowledged. First, the retrospective nature of the study and reliance on the MIMIC-IV database may limit the generalizability of the findings. The patient population in this database may not fully represent all immunocompromised patients in diverse clinical settings. Second, while SHAP values enhance the interpretability of the ML models, they are still subject to the limitations of the underlying model's assumptions and the quality of the input data. Another limitation is the potential for selection bias due to the exclusion of certain patients based on missing data or other criteria. This could result in a data set that is not fully representative of the broader patient population. In addition, some important clinical variables with a high proportion of missing values were not included in the analysis, which might affect the model's predictive performance. We also acknowledge that while we implemented fivefold cross-validation and rigorous evaluation methods, there remains a degree of performance difference between training and test sets, particularly in tree-based models, highlighting the challenge of developing models that generalize perfectly. Finally, external validation of the models was not performed, which is crucial for assessing their applicability in different clinical environments. Future studies should include external validation cohorts to enhance the robustness and generalizability of the findings.

Conclusions

The development of explainable machine learning models for predicting 28-day all-cause mortality in immunocompromised ICU patients represents a significant advancement in the field of critical care. Our comprehensive evaluation of multiple modeling approaches showed that several models performed comparably and promisingly in mortality prediction. SHAP-based interpretation of these models consistently identified key predictors of mortality risk, with 24-h urine output, BUN levels, and metastatic solid tumors emerging as particularly important factors. These insights can help guide clinical decision-making and resource allocation in the management of this vulnerable patient population. Continued research, including external validation studies across diverse clinical settings, is essential to further refine these models and fully realize their potential to improve patient outcomes.
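
SHAP values are Shapley values from cooperative game theory: each feature's attribution is its weighted average marginal contribution over all feature coalitions, and the attributions sum to the difference between a patient's prediction and a reference prediction. For a handful of features this can be computed exactly by enumeration, as in the sketch below. It assumes a single baseline patient in place of SHAP's background dataset, and the three-feature logistic model and its coefficients are purely hypothetical. Libraries such as `shap` use approximations (e.g., KernelSHAP) or model-specific algorithms (e.g., TreeSHAP) for realistic feature counts.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Model = Callable[[List[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attributions of f(x) - f(baseline), one value per feature.

    Features outside a coalition take a single baseline value, a simplification
    of SHAP's background distribution. Enumeration costs O(2^M): toy sizes only.
    """
    m = len(x)

    def value(coalition: tuple) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            w = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for s in combinations(others, size):
                phi[i] += w * (value(s + (i,)) - value(s))
    return phi


def mean_abs_importance(f: Model, X: Sequence[Sequence[float]], baseline: Sequence[float],
                        names: Sequence[str]) -> Dict[str, float]:
    """Global ranking: mean |Shapley value| per feature, largest first."""
    totals = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(shapley_values(f, x, baseline)):
            totals[j] += abs(v)
    means = {n: t / len(X) for n, t in zip(names, totals)}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


def toy_risk(z: List[float]) -> float:
    """Hypothetical logistic model; coefficients are illustrative, not from the study."""
    urine_l, bun, metastatic = z  # 24-h urine output (L), BUN (mg/dL), 0/1 flag
    logit = -2.0 - 0.8 * urine_l + 0.03 * bun + 1.2 * metastatic
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    baseline = [1.5, 20.0, 0.0]  # hypothetical reference patient
    patients = [[0.4, 60.0, 1.0], [2.0, 15.0, 0.0], [0.9, 45.0, 1.0]]
    phi = shapley_values(toy_risk, patients[0], baseline)
    print("attributions:", [round(v, 3) for v in phi])
    print("sum vs f(x) - f(baseline):", round(sum(phi), 6),
          round(toy_risk(patients[0]) - toy_risk(baseline), 6))
    print(mean_abs_importance(toy_risk, patients, baseline, ["urine", "BUN", "metastatic"]))
```

Averaging absolute attributions across patients is one common way to turn per-patient SHAP values into a global feature ranking of the kind summarized above.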

Competing interests

The authors declare no competing interests.

Data availability

The data that support the findings of this study are derived from the MIMIC-IV database (https://physionet.org/content/mimic-iv), a publicly available critical care dataset maintained by the Massachusetts Institute of Technology (MIT). Access to MIMIC-IV requires completion of the Collaborative Institutional Training Initiative (CITI) program and adherence to the data use agreement (DUA).

Code availability

The code and analysis used in this study are available at https://github.com/leanqon/immunocompromised-mortality-ml.

References

  1. Mereiter S, Balmaña M, Campos D, Gomes J, Reis CA. Glycosylation in the era of cancer-targeted therapy: where are we heading? Cancer Cell. 2019;36(1):6–16. https://doi.org/10.1016/j.ccell.2019.06.006.

  2. Tang L, Huang Z, Mei H, Hu Y. Immunotherapy in hematologic malignancies: achievements, challenges and future prospects. Signal Transduct Target Ther. 2023;8(1):306. https://doi.org/10.1038/s41392-023-01521-5.

  3. Limaye AP, Babu TM, Boeckh M. Progress and Challenges in the Prevention, Diagnosis, and Management of Cytomegalovirus Infection in Transplantation. Clin Microbiol Rev. 2020. https://doi.org/10.1128/cmr.00043-19.

  4. Schett G, Mackensen A, Mougiakakos D. CAR T-cell therapy in autoimmune diseases. Lancet. 2023;402(10416):2034–44. https://doi.org/10.1016/s0140-6736(23)01126-1.

  5. Azoulay E, Schellongowski P, Darmon M, et al. The intensive care medicine research agenda on critically ill oncology and hematology patients. Intensive Care Med. 2017;43(9):1366–82. https://doi.org/10.1007/s00134-017-4884-z.

  6. Azoulay E, Russell L, Van de Louw A, et al. Diagnosis of severe respiratory infections in immunocompromised patients. Intensive Care Med. 2020;46(2):298–314. https://doi.org/10.1007/s00134-019-05906-5.

  7. Catano J, Sacleux SC, Gornet JM, et al. Gastrointestinal bleeding in critically ill immunocompromised patients. Ann Intensive Care. 2021;11(1):130. https://doi.org/10.1186/s13613-021-00913-6.

  8. Baucher L, Lemiale V, Joseph A, et al. Severe infections requiring intensive care unit admission in patients receiving ibrutinib for hematological malignancies: a groupe de recherche respiratoire en réanimation onco-hématologique (GRRR-OH) study. Ann Intensive Care. 2023;13(1):123. https://doi.org/10.1186/s13613-023-01219-5.

  9. Dumas G, Bertrand M, Lemiale V, et al. Prognosis of critically ill immunocompromised patients with virus-detected acute respiratory failure. Ann Intensive Care. 2023;13(1):101.

  10. Teh BW, Mikulska M, Averbuch D, et al. Consensus position statement on advancing the standardised reporting of infection events in immunocompromised patients. Lancet Infect Dis. 2024;24(1):e59–68.

  11. Fizza Haider S, Sloss R, Jhanji S, Nicholson E, Creagh-Brown B. Management of adult patients with haematological malignancies in critical care. Anaesthesia. 2023;78(7):874–83.

  12. Mallick S, Anila K, Sivaprasadan S, Sudhindran S. Immunosuppression in liver transplant recipients in the setting of sepsis. J Clin Exp Hepatol. 2023;13(4):682–90.

  13. Giacobbe DR, Dettori S, Di Pilato V, et al. Pneumocystis jirovecii pneumonia in intensive care units: a multicenter study by ESGCIP and EFISG. Crit Care. 2023;27(1):323. https://doi.org/10.1186/s13054-023-04608-1.

  14. Secreto C, Chean D, Van de Louw A, et al. Characteristics and outcomes of patients with acute myeloid leukemia admitted to intensive care unit with acute respiratory failure: a post-hoc analysis of a prospective multicenter study. Ann Intensive Care. 2023;13(1):79.

  15. Feys S, Lagrou K, Lauwers HM, et al. High burden of COVID-19-associated pulmonary aspergillosis in severely immunocompromised patients requiring mechanical ventilation. Clin Infect Dis. 2024;78(2):361–70.

  16. Dumas G, Arabi YM, Bartz R, et al. Diagnosis and management of autoimmune diseases in the ICU. Intensive Care Med. 2024;50(1):17–35.

  17. Mokrani D, Chommeloux J, Pineton de Chambrun M, Hékimian G, Luyt C-E. Antibiotic stewardship in the ICU: time to shift into overdrive. Ann Intensive Care. 2023;13(1):39.

  18. Dumas G, Lemiale V, Rathi N, et al. Survival in immunocompromised patients ultimately requiring invasive mechanical ventilation: a pooled individual patient data analysis. Am J Respir Crit Care Med. 2021;204(2):187–96. https://doi.org/10.1164/rccm.202009-3575OC.

  19. Luo W, Xing R, Wang C. The effect of ventilator-associated pneumonia on the prognosis of intensive care unit patients within 90 days and 180 days. BMC Infect Dis. 2021;21(1):684. https://doi.org/10.1186/s12879-021-06383-2.

  20. Kreitmann L, Gaudet A, Nseir S. Ventilator-associated pneumonia in immunosuppressed patients. Antibiotics. 2023. https://doi.org/10.3390/antibiotics12020413.

  21. Seidelman JL, Mantyh CR, Anderson DJ. Surgical site infection prevention: a review. JAMA. 2023;329(3):244–52.

  22. Alverdy JC, Hyman N, Gilbert J. Re-examining causes of surgical site infections following elective surgery in the era of asepsis. Lancet Infect Dis. 2020;20(3):e38–43.

  23. Timsit J-F, Ruppé E, Barbier F, Tabah A, Bassetti M. Bloodstream infections in critically ill patients: an expert statement. Intensive Care Med. 2020;46(2):266–84.

  24. Li Z, Zhuang H, Wang G, Wang H, Dong Y. Prevalence, predictors, and mortality of bloodstream infections due to methicillin-resistant Staphylococcus aureus in patients with malignancy: systemic review and meta-analysis. BMC Infect Dis. 2021;21:1–10.

  25. Zhang L, Zhen S, Shen Y, et al. Bloodstream infections due to Carbapenem-resistant enterobacteriaceae in hematological patients: assessment of risk factors for mortality and treatment options. Ann Clin Microbiol Antimicrob. 2023;22(1):41.

  26. Lindell RB, Nishisaki A, Weiss SL, Traynor DM, Fitzgerald JC. Risk of mortality in immunocompromised children with severe sepsis and septic shock. Crit Care Med. 2020;48(7):1026–33. https://doi.org/10.1097/ccm.0000000000004329.

  27. Reddy V, Reddy H, Gemnani R, Kumar S, Acharya S, Reddy V. Navigating the complexity of scoring systems in sepsis management: a comprehensive review. Cureus. 2024. https://doi.org/10.7759/cureus.54030.

  28. Hong N, Liu C, Gao J, et al. State of the art of machine learning–enabled clinical decision support in intensive care units: literature review. JMIR Med Inform. 2022;10(3):e28781.

  29. Quinn TP, Jacobs S, Senadeera M, Le V, Coghlan S. The three ghosts of medical AI: can the black-box present deliver? Artif Intell Med. 2022;124:102158.

  30. Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using improved shapley additive explanation. 2019:546-546.

  31. Azoulay E, Russell L, Van de Louw A, et al. Diagnosis of severe respiratory infections in immunocompromised patients. Intensive Care Med. 2020;46:298–314.

  32. Johnson AE, Ghassemi MM, Nemati S, Niehaus KE, Clifton DA, Clifford GD. Machine learning and decision support in critical care. Proc IEEE. 2016;104(2):444–66.

  33. Endo H, Uchino S, Hashimoto S, et al. Development and validation of the predictive risk of death model for adult patients admitted to intensive care units in Japan: an approach to improve the accuracy of healthcare quality measures. J Intensive Care. 2021;9:1–11.

  34. Johnson AE, Bulgarelli L, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.

  35. Kreitmann L, Helms J, Martin-Loeches I, et al. ICU-acquired infections in immunocompromised patients. Intensive Care Med. 2024;50:1–18.

  36. Vigneron C, Charpentier J, Valade S, et al. Patterns of ICU admissions and outcomes in patients with solid malignancies over the revolution of cancer treatment. Ann Intensive Care. 2021;11:1–10.

  37. Geijteman ECT, Kuip EJM, Oskam J, Lees D, Bruera E. Illness trajectories of incurable solid cancers. BMJ. 2024;384:e076625. https://doi.org/10.1136/bmj-2023-076625.

  38. Gonzalez F, Starka R, Ducros L, et al. Critically ill metastatic cancer patients returning home after unplanned ICU stay: an observational, multicentre retrospective study. Ann Intensive Care. 2023;13(1):73. https://doi.org/10.1186/s13613-023-01170-5.

  39. Weng J, Hou R, Zhou X, et al. Development and validation of a score to predict mortality in ICU patients with sepsis: a multicenter retrospective study. J Transl Med. 2021;19:1–12.

  40. Huang D, Yang H, Yu H, et al. Blood urea nitrogen to serum albumin ratio (BAR) predicts critical illness in patients with coronavirus disease 2019 (COVID-19). Int J General Med. 2021: 4711-4721.

  41. Yue S, Li S, Huang X, et al. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med. 2022;20(1):215.

  42. Xia B, Song B, Zhang J, Zhu T, Hu H. Prognostic value of blood urea nitrogen-to-serum albumin ratio for mortality of pneumonia in patients receiving glucocorticoids: secondary analysis based on a retrospective cohort study. J Infect Chemother. 2022;28(6):767–73.

  43. Fei A, Lin Q, Liu J, Wang F, Wang H, Pan S. The relationship between coagulation abnormality and mortality in ICU patients: a prospective, observational study. Sci Rep. 2015;5:9391. https://doi.org/10.1038/srep09391.

  44. Peltan ID, Vande Vusse LK, Maier RV, Watkins TR. An international normalized ratio-based definition of acute traumatic coagulopathy is associated with mortality, venous thromboembolism, and multiple organ failure after injury. Crit Care Med. 2015;43(7):1429–38. https://doi.org/10.1097/ccm.0000000000000981.

  45. Kanda N, Ohbe H, Nakamura K. Effects of antithrombin on persistent inflammation, immunosuppression, and catabolism syndrome among patients with sepsis-induced disseminated intravascular coagulation. J Clin Med. 2023;12(11):3822.

  46. Fujiwara T, Tokuda K, Momii K, et al. Prognostic factors for the short-term mortality of patients with rheumatoid arthritis admitted to intensive care units. BMC Rheumatol. 2020;4(1):64. https://doi.org/10.1186/s41927-020-00164-1.

  47. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83.

  48. Charlson ME, Carrozzino D, Guidi J, Patierno C. Charlson comorbidity index: a critical review of clinimetric properties. Psychother Psychosom. 2022;91(1):8–35.

  49. Yıldız A, Yiğt A, Benli AR. The prognostic role of Charlson comorbidity index for critically ill elderly patients. Eur Res J. 2020. https://doi.org/10.18621/eurj.451391.

  50. Aronsson Dannewitz A, Svennblad B, Michaëlsson K, Lipcsey M, Gedeborg R. Optimized diagnosis-based comorbidity measures for all-cause mortality prediction in a national population-based ICU population. Crit Care. 2022;26(1):306.

  51. Levine MA, Schuler T, Gourishankar S. Complications in the 90-day postoperative period following kidney transplant and the relationship of the Charlson comorbidity index. Can Urol Assoc J. 2017;11(12):388–93. https://doi.org/10.5489/cuaj.4378.

  52. Kochar B, Cai W, Cagan A, Ananthakrishnan AN. Pretreatment frailty is independently associated with increased risk of infections after immunosuppression in patients with inflammatory bowel diseases. Gastroenterology. 2020;158(8):e2.

  53. Kim S-K, Choe J-Y, Lee S-S. Charlson comorbidity index is related to organ damage in systemic lupus erythematosus: data from KORean lupus Network (KORNET) registry. J Rheumatol. 2017;44(4):452–8.

  54. Marventano S, Grosso G, Mistretta A, et al. Evaluation of four comorbidity indices and Charlson comorbidity index adjustment for colorectal cancer patients. Int J Colorectal Dis. 2014;29:1159–69.

  55. Vincent J-L, Ferguson A, Pickkers P, et al. The clinical relevance of oliguria in the critically ill patient: analysis of a large observational database. Crit Care. 2020;24:1–14.

  56. Bomrah S, Uddin M, Upadhyay U, et al. A scoping review of machine learning for sepsis prediction- feature engineering strategies and model performance: a step towards explainability. Crit Care. 2024;28(1):180. https://doi.org/10.1186/s13054-024-04948-6.

  57. Lim L, Gim U, Cho K, Yoo D, Ryu HG, Lee H-C. Real-time machine learning model to predict short-term mortality in critically ill patients: development and international validation. Crit Care. 2024;28(1):76.

  58. Deng H-F, Sun M-W, Wang Y, et al. Evaluating machine learning models for sepsis prediction: a systematic review of methodologies. iScience. 2022;25(1):103651.

Funding

Not applicable.

Author information

Contributions

Z.Y. and L.F. wrote the main manuscript text and prepared all figures. Y.D. reviewed the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Yueping Ding.

Ethics declarations

Ethics approval and consent to participate

The study protocol was approved by the Institutional Review Board of MIT and Harvard Medical School. Given the retrospective nature of the study using the MIMIC-IV database, the requirement for informed consent was waived. The researchers obtained the necessary permissions to access the MIMIC-IV database (certification numbers: 48061114, 38118593).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yu, Z., Fang, L. & Ding, Y. Explainable machine learning model for prediction of 28-day all-cause mortality in immunocompromised patients in the intensive care unit: a retrospective cohort study based on MIMIC-IV database. Eur J Med Res 30, 358 (2025). https://doi.org/10.1186/s40001-025-02622-3

  • DOI: https://doi.org/10.1186/s40001-025-02622-3

Keywords