Extending SORT to Predict Morbidity using the SOuRCe Cohort

Summary

After cleaning the data, I fitted regression models with POMS (binary: TRUE or FALSE) as the outcome measure with predictor variables derived from the original SORT model.
Regression Model 1: POMS Day 7 or 8 outcome, Age continuous predictor, no internal validation.
Regression Model 2: POMS Day 7 or 8 outcome, predictor variables exactly as in original SORT model, no internal validation.
Regression Model 3: POMS Day 7 or 8 outcome, predictor variables exactly as in original SORT model, internal validation.
Regression Model 4: POMS Day 7 or 8 outcome, predictor variables pre-weighted with the coefficients used in the original SORT model before fitting model, internal validation.
Regression Model 5: Alternative outcome measure - Readmission within 30 days, predictor variables as in Model 3, internal validation.
Regression Model 6: POMS Day 14 or 15 outcome, predictor variables exactly as in Model 3, internal validation.
Regression Model 7: POMS Day 7 or 8 outcome, restricted predictor variable with only Colorectal surgery as the dummy variable for high-risk surgical specialties, otherwise same as Model 3, internal validation.
Regression Model 8: POMS Day 7 or 8 outcome, Surgical Specialty as factors (6 categories, 5 dummy variables), otherwise same as Model 7, internal validation.
Regression Model 9: POMS Day 7 or 8 outcome, ASA = 2 or 3 or >=4, otherwise same as Model 8, internal validation.
Regression Model 10: POMS Day 7 or 8 outcome, predictor variable for opspecialty is now a factor with 6 categories (5 dummy variables) instead of “high-risk” surgical specialties, and ASA is now a continuous variable, otherwise same as Model 7, internal validation.
Regression Model 11: POMS Day 7 or 8 outcome, predictor variable for opspecialty is now a factor with 6 categories (5 dummy variables) instead of “high-risk” surgical specialties, and ASA and Age are now continuous variables, and added interaction between age and specialty, otherwise same as Model 10, internal validation.
NB: Where the ROC curves were plotted, the dotted lines in red signify the ROC curve for p-POSSUM(morbidity), overlaid for comparison, the solid lines signify the fitted model being tested. For the sake of brevity, I do not show the source code after Model 4.

Introduction

Matt Oliver has kindly provided some data that he has cleaned from the SOuRCe database of >1900 patients who underwent surgery. The purpose of our project is to calibrate and validate the risk calculation tools. He has already done this with mortality using SORT and with pPOSSUM. We are going to see if we can also do this for morbidity using POMS.

Data tidying and cleaning

First we load the data that has been cleaned in the SOuRCeDataCleaning.Rmd file

library(knitr)
purl("SOuRCeDataCleaning.Rmd")
read_chunk("SOuRCeDataCleaning.R")
unlink("SOuRCeDataCleaning.R")

Analysis

Mortality external validation

The first step is to externally validate the SORT model for mortality against the data

#Load the data, this is a workaround because there're problems with running 
#plotCalibration with using the existing data_clean dataframe, must be something
#to do with how the data is stored in R (vector or matrix or otherwise)
data_clean <- read.csv("data/SOuRCedata_clean.csv")

#Test SORT with the dataset
library(PredictABEL)

HL1 <- plotCalibration(data_clean, 
                       cOutcome = match("mort30", colnames(data_clean)),
                       predRisk = data_clean$sort_mort,
                       rangeaxis = c(0,0.1),
                       plottitle = "Calibration plot for SORT")

HL2 <- plotCalibration(data_clean, 
                       cOutcome = match("mort30", colnames(data_clean)),
                       predRisk = data_clean$p_possmort,
                       rangeaxis = c(0,0.1),
                       plottitle = "Calibration plot for p-POSSUM")

HL3 <- plotCalibration(data_clean, 
                       cOutcome = match("mort30", colnames(data_clean)),
                       predRisk = data_clean$srs_mort,
                       rangeaxis = c(0,0.1),
                       plottitle = "Calibration plot for SRS")

#Let's overlay the ROC curve over the ROC curve for p-POSSUM
plotROC(data = data_clean, 
        cOutcome = match("mort30", colnames(data_clean)), 
        predrisk = cbind(data_clean$sort_mort,
                         data_clean$p_possmort,
                         data_clean$srs_mort),
        plottitle = "SORT vs p-POSSUM & SRS" 
        #labels = c("SORT", "p-POSSUM", "SRS")
        )

## AUC [95% CI] for the model 1 :  0.793 [ 0.554  -  1.031 ] 
## AUC [95% CI] for the model 2 :  0.791 [ 0.509  -  1.073 ] 
## AUC [95% CI] for the model 3 :  0.639 [ 0.491  -  0.787 ]

legend("bottomright", inset = c(0.02,0.02), legend = c("SORT","p-POSSUM","SRS"),
       col = c(17,18,19), lty = c(1,2,3),
       horiz = TRUE, cex = 0.5)

#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)

#Create the ROC objects
roc_SORT <- roc(data_clean$mort30 ~ data_clean$sort_mort, plot=FALSE, auc=TRUE, ci=TRUE)
roc_pPOSSUM <- roc(data_clean$mort30 ~ data_clean$p_possmort, plot=FALSE, auc=TRUE, ci=TRUE)
roc_SRS <- roc(data_clean$mort30 ~ data_clean$srs_mort, plot=FALSE, auc=TRUE, ci=TRUE)

#Test the ROC objects against each other
roc.test(roc_SORT, roc_pPOSSUM)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_SORT and roc_pPOSSUM
## Z = 0.028709, p-value = 0.9771
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7925357   0.7910263

roc.test(roc_SORT, roc_SRS)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_SORT and roc_SRS
## Z = 1.2121, p-value = 0.2255
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7925357   0.6389956

Day 7 or 8 POMS

Now it looks like we are ready to do some analysis looking at Day 7 or 8 POMS as outcome measures.

#Let's get an overview of the data
describe(join7)

## join7 
## 
##  53  Variables      1583  Observations
## ---------------------------------------------------------------------------
## poms_id 
##        n  missing distinct 
##      632      951      632 
## 
## lowest : 004e6778-08cc-4985-9e1b-fc85622de277 0055f57f-bda3-4f77-b8fa-cf126fdc814a 01028b80-1d58-45ed-b4ea-6fea7ac5d935 0248aa0b-38ba-4eef-bbe8-7e79c1bb7e1a 024be601-de89-488b-b35f-2a85fdb630dc
## highest: fec3f147-5f60-4c4b-8a71-a2358ea2f35b fee6a4a2-6ce1-4995-8fa6-1df682442a53 ff4c36ef-a972-45bb-a334-80431cba81c8 ff8c6d7c-1924-42e5-a9b3-ceeffb6366ff ff9890a7-bec1-499d-afa1-00e5f189bca0 
## ---------------------------------------------------------------------------
## adm_id 
##        n  missing distinct 
##     1583        0     1583 
## 
## lowest : 00422cff-ddbe-49e6-a2d9-f4680c0945c2  00bea38f-af7b-4f5f-9655-1ab134c95571  00cea437-fd5a-45d4-859d-ddd7f48f3199  01125a5f-814c-4120-a298-ae6f27d43ef5  012d095b-6cf9-4b06-b7de-50b6d72277de 
## highest: ff7964aa-a0c5-49c3-99dc-b39df7d021f2  ff8455e0-44e3-470d-8e86-0893cc101bc0  ff962ad5-22e6-42e4-b35b-3e6cc03ea022  ff9d4b2f-08b0-4a74-8c24-af45b0472a18  ffd1beee-09b8-4203-836b-2bd08ac472f5  
## ---------------------------------------------------------------------------
## poms_day 
##        n  missing distinct     Info     Mean      Gmd 
##      632      951        2     0.44    7.179   0.2941 
## 
## 7 (519, 0.821), 8 (113, 0.179)
## ---------------------------------------------------------------------------
## pulm 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1509, 0.953), TRUE (74, 0.047)
## ---------------------------------------------------------------------------
## inf 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1455, 0.919), TRUE (128, 0.081)
## ---------------------------------------------------------------------------
## renal 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1387, 0.876), TRUE (196, 0.124)
## ---------------------------------------------------------------------------
## gastro 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1361, 0.86), TRUE (222, 0.14)
## ---------------------------------------------------------------------------
## cardio 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1547, 0.977), TRUE (36, 0.023)
## ---------------------------------------------------------------------------
## neuro 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1562, 0.987), TRUE (21, 0.013)
## ---------------------------------------------------------------------------
## wound 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1502, 0.949), TRUE (81, 0.051)
## ---------------------------------------------------------------------------
## haem 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1572, 0.993), TRUE (11, 0.007)
## ---------------------------------------------------------------------------
## pain 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1499, 0.947), TRUE (84, 0.053)
## ---------------------------------------------------------------------------
## poms 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1180, 0.745), TRUE (403, 0.255)
## ---------------------------------------------------------------------------
## poms_lo 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1262, 0.797), TRUE (321, 0.203)
## ---------------------------------------------------------------------------
## poms_hi 
##        n  missing distinct 
##     1583        0        2 
## 
## FALSE (1344, 0.849), TRUE (239, 0.151)
## ---------------------------------------------------------------------------
## sex 
##        n  missing distinct 
##     1580        3        2 
## 
## Female (917, 0.58), Male (663, 0.42)
## ---------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1583        0     1516        1    58.99    18.63    28.75    34.79 
##      .25      .50      .75      .90      .95 
##    47.63    61.64    71.23    79.10    82.70 
## 
## lowest : 17.01027 17.24572 18.54620 18.59822 18.74880
## highest: 90.96235 91.05270 91.26626 92.24914 95.44695 
## ---------------------------------------------------------------------------
## asa 
##        n  missing distinct     Info     Mean      Gmd 
##     1583        0        5    0.772    2.078     0.67 
## 
## lowest : 1 2 3 4 5, highest: 1 2 3 4 5 
## 
## 1 (269, 0.170), 2 (945, 0.597), 3 (346, 0.219), 4 (22, 0.014), 5 (1,
## 0.001)
## ---------------------------------------------------------------------------
## opspecialty 
##        n  missing distinct     Info     Mean      Gmd 
##     1583        0        6    0.877    2.102    1.401 
## 
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6 
## 
## 1 (713, 0.450), 2 (493, 0.311), 3 (102, 0.064), 4 (103, 0.065), 5 (122,
## 0.077), 6 (50, 0.032)
## ---------------------------------------------------------------------------
## op_severity 
##        n  missing distinct     Info     Mean      Gmd 
##     1583        0        4    0.521    6.838    1.884 
## 
## 1 (97, 0.061), 2 (82, 0.052), 4 (167, 0.105), 8 (1237, 0.781)
## ---------------------------------------------------------------------------
## ncepod 
##        n  missing distinct 
##     1580        3        2 
## 
## E (1579, 0.999), U (1, 0.001)
## ---------------------------------------------------------------------------
## malignancy 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1583        0        2    0.146       81  0.05117  0.09716 
## ---------------------------------------------------------------------------
## smoking 
##        n  missing distinct     Info     Mean      Gmd 
##     1581        2        3    0.807   0.6192   0.7763 
## 
## 0 (869, 0.550), 1 (445, 0.281), 2 (267, 0.169)
## ---------------------------------------------------------------------------
## alcohol 
##        n  missing distinct     Info     Mean      Gmd 
##     1579        4        3    0.762   0.9854        1 
## 
## 0 (788, 0.499), 1 (26, 0.016), 2 (765, 0.484)
## ---------------------------------------------------------------------------
## ecgposs 
##        n  missing distinct     Info     Mean      Gmd 
##     1576        7        3    0.143    1.254   0.4887 
## 
## 1 (1497, 0.950), 4 (38, 0.024), 8 (41, 0.026)
## ---------------------------------------------------------------------------
## micharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.138       66  0.04825   0.0919 
## ---------------------------------------------------------------------------
## ccfcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1367      216        2    0.035       16   0.0117  0.02315 
## ---------------------------------------------------------------------------
## pvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.161       78  0.05702   0.1076 
## ---------------------------------------------------------------------------
## cvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.144       69  0.05044  0.09586 
## ---------------------------------------------------------------------------
## dementiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        1        0        0        0 
## 
## 0 (1368, 1)
## ---------------------------------------------------------------------------
## copdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.333      174   0.1272   0.2222 
## ---------------------------------------------------------------------------
## ctdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.062       29   0.0212  0.04153 
## ---------------------------------------------------------------------------
## pudcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1368      215        2    0.069       32  0.02339  0.04572 
## ---------------------------------------------------------------------------
## hepdxcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        3    0.054  0.03143  0.06207 
## 
## 0 (1343, 0.982), 1 (16, 0.012), 3 (9, 0.007)
## ---------------------------------------------------------------------------
## dmcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1367      216        2    0.308      159   0.1163   0.2057 
## ---------------------------------------------------------------------------
## hemiplegiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1367      216        2    0.011 0.007315  0.01459 
## 
## 0 (1362, 0.996), 2 (5, 0.004)
## ---------------------------------------------------------------------------
## renalcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        2    0.026  0.01754  0.03481 
## 
## 0 (1356, 0.991), 2 (12, 0.009)
## ---------------------------------------------------------------------------
## dmendorgancharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        1        0        0        0 
## 
## 0 (1368, 1)
## ---------------------------------------------------------------------------
## tumourcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1174      409        3    0.157   0.1789   0.3435 
## 
## 0 (1109, 0.945), 2 (45, 0.038), 6 (20, 0.017)
## ---------------------------------------------------------------------------
## aidscharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        2    0.004 0.008772  0.01753 
## 
## 0 (1366, 0.999), 6 (2, 0.001)
## ---------------------------------------------------------------------------
## charlsoncomorb 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1368      215       11    0.753    0.674   0.9978        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        1        2        3 
## 
## lowest :  0  1  2  3  4, highest:  6  7  8  9 10 
##                                                                       
## Value          0     1     2     3     4     5     6     7     8     9
## Frequency    843   315   136    34    12     4    12     7     2     2
## Proportion 0.616 0.230 0.099 0.025 0.009 0.003 0.009 0.005 0.001 0.001
##                 
## Value         10
## Frequency      1
## Proportion 0.001
## ---------------------------------------------------------------------------
## charlsonage 
##        n  missing distinct     Info     Mean      Gmd 
##     1368      215        6    0.947    1.662    1.477 
## 
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5 
## 
## 0 (378, 0.276), 1 (232, 0.170), 2 (357, 0.261), 3 (283, 0.207), 4 (111,
## 0.081), 5 (7, 0.005)
## ---------------------------------------------------------------------------
## charlsontotal 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1368      215       14    0.969    2.336    2.068        0        0 
##      .25      .50      .75      .90      .95 
##        1        2        3        5        5 
## 
## lowest :  0  1  2  3  4, highest:  9 10 11 12 13 
##                                                                       
## Value          0     1     2     3     4     5     6     7     8     9
## Frequency    278   210   300   253   168    91    27    19     7     7
## Proportion 0.203 0.154 0.219 0.185 0.123 0.067 0.020 0.014 0.005 0.005
##                                   
## Value         10    11    12    13
## Frequency      5     1     1     1
## Proportion 0.004 0.001 0.001 0.001
## ---------------------------------------------------------------------------
## status_at_dc 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1580        3        2    0.017     1571   0.9943  0.01133 
## ---------------------------------------------------------------------------
## mortip 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1583        0        2    0.023       12 0.007581  0.01506 
## ---------------------------------------------------------------------------
## mort30 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1583        0        2    0.006        3 0.001895 0.003785 
## ---------------------------------------------------------------------------
## los 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1583        0       78     0.99    10.52    11.15      2.0      2.0 
##      .25      .50      .75      .90      .95 
##      4.0      6.0     11.0     19.0     33.9 
## 
## lowest :   0   1   2   3   4, highest: 158 165 167 185 201 
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1503       80        2    0.059       30  0.01996  0.03915 
## ---------------------------------------------------------------------------
## risk_score 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1583        0       44    0.985     1.43    1.024    0.381    0.381 
##      .25      .50      .75      .90      .95 
##    0.712    1.158    1.972    2.684    3.380 
## 
## lowest : 0.000 0.381 0.712 0.777 1.048
## highest: 4.360 4.381 4.762 5.072 5.951 
## ---------------------------------------------------------------------------
## sort_mort 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      1583         0        44     0.985  0.004785  0.005607 0.0009248 
##       .10       .25       .50       .75       .90       .95 
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0091755 0.0182324 
## 
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0013735255 0.0018003001
## highest: 0.0471555481 0.0481081419 0.0688814304 0.0916210990 0.1954466227 
## ---------------------------------------------------------------------------
## morb_poss 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1583        0      147    0.997     0.32   0.1839  0.08549  0.11712 
##      .25      .50      .75      .90      .95 
##  0.20424  0.29318  0.40131  0.57200  0.67261 
## 
## lowest : 0.05468132 0.06356602 0.06537533 0.07378165 0.07585818
## highest: 0.88079708 0.90114393 0.93819653 0.96373628 0.96544377 
## ---------------------------------------------------------------------------
## p_possmort 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1583        0      147    0.997  0.02006  0.01938 0.003643 0.004961 
##      .25      .50      .75      .90      .95 
## 0.007875 0.012833 0.021140 0.040762 0.065940 
## 
## lowest : 0.002227601 0.002600105 0.002637192 0.003077977 0.003121859
## highest: 0.198356194 0.214636630 0.288393341 0.447148173 0.514096264 
## ---------------------------------------------------------------------------
## srs_mort 
##        n  missing distinct     Info     Mean      Gmd 
##     1583        0        8    0.852  0.02048  0.01469 
## 
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0084090681 0.0192652326 0.0435216350 0.0953494649 0.1962340564 
##                                                               
## Value      0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency            38           60          103          274
## Proportion        0.024        0.038        0.065        0.173
##                                                               
## Value      0.0192652326 0.0435216350 0.0953494649 0.1962340564
## Frequency           816          271           20            1
## Proportion        0.515        0.171        0.013        0.001
## ---------------------------------------------------------------------------

#Let's plot POMS vs the individual predictors add smoothing lines to get a general feel of the data

scatter.smooth(join7$poms ~ join7$age, family = "gaussian", 
               lpars = list(col = "red"))

scatter.smooth(join7$poms ~ join7$asa, family = "gaussian", 
               lpars = list(col = "red"))

scatter.smooth(join7$poms ~ join7$opspecialty, family = "gaussian", 
               lpars = list(col = "red"))

scatter.smooth(join7$poms ~ join7$malignancy, family = "gaussian", 
               lpars = list(col = "red"))

#Might be easier to understand some of the plots with the points jittered
scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$asa), 
               family = "gaussian", lpars = list(col = "red"))

scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$opspecialty), 
               family = "gaussian", lpars = list(col = "red"))

scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$malignancy), 
               family = "gaussian", lpars = list(col = "red"))

Regression Models 1 & 2

We will fit some Regression Models. Firstly we will look at the model with age as a continuous variable. In the SORT tool, the risk score binned age into a categorical variable.

#Fit a logistic regression model with the same predictor variables as in SORT, but with any POMS outcome the dependent variable
lr_poms <- glm(poms ~ age + 
                 as.factor(asa) + 
                 as.factor(opspecialty) + 
                 as.factor(op_severity) + 
                 as.factor(malignancy), 
               data = join7, family = binomial)

summary(lr_poms)

## 
## Call:
## glm(formula = poms ~ age + as.factor(asa) + as.factor(opspecialty) + 
##     as.factor(op_severity) + as.factor(malignancy), family = binomial, 
##     data = join7)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9548  -0.6554  -0.5603   0.6924   2.9745  
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -4.983043   0.555480  -8.971  < 2e-16 ***
## age                       0.005609   0.004520   1.241 0.214611    
## as.factor(asa)2           0.277849   0.191197   1.453 0.146167    
## as.factor(asa)3           1.074435   0.218332   4.921 8.61e-07 ***
## as.factor(asa)4           1.427331   0.504454   2.829 0.004663 ** 
## as.factor(asa)5          12.654213 324.743789   0.039 0.968917    
## as.factor(opspecialty)2   1.641679   0.186901   8.784  < 2e-16 ***
## as.factor(opspecialty)3  -0.568368   0.322841  -1.761 0.078320 .  
## as.factor(opspecialty)4   0.180380   0.258682   0.697 0.485612    
## as.factor(opspecialty)5  -1.612067   0.442105  -3.646 0.000266 ***
## as.factor(opspecialty)6  -0.270881   0.463594  -0.584 0.559013    
## as.factor(op_severity)2   1.298928   0.514030   2.527 0.011506 *  
## as.factor(op_severity)4   2.390115   0.459783   5.198 2.01e-07 ***
## as.factor(op_severity)8   2.836190   0.451474   6.282 3.34e-10 ***
## as.factor(malignancy)1    0.750509   0.281624   2.665 0.007700 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1796.1  on 1582  degrees of freedom
## Residual deviance: 1532.9  on 1568  degrees of freedom
## AIC: 1562.9
## 
## Number of Fisher Scoring iterations: 11

#Fit another logistic regression model, but this time with the predictor variables in SORT as the discrete variables, eg specific ASA-PS == 3, High risk specialty TRUE, and so on
lr_poms2 <- glm(poms ~ (asa == 3) + 
                  (asa == 4) + 
                  (asa == 5) + 
                  (opspecialty >= 2 & opspecialty < 6) + 
                  (op_severity > 7) + 
                  malignancy + 
                  (age >= 65 & age <80) + 
                  (age >= 80), 
                data = join7, family = binomial)

summary(lr_poms2)

## 
## Call:
## glm(formula = poms ~ (asa == 3) + (asa == 4) + (asa == 5) + (opspecialty >= 
##     2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >= 
##     65 & age < 80) + (age >= 80), family = binomial, data = join7)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5899  -0.7378  -0.6327   0.8149   2.0457  
## 
## Coefficients:
##                                        Estimate Std. Error z value
## (Intercept)                             -1.9608     0.1948 -10.066
## asa == 3TRUE                             0.6488     0.1371   4.734
## asa == 4TRUE                             0.8416     0.4467   1.884
## asa == 5TRUE                            13.5054   324.7437   0.042
## opspecialty >= 2 & opspecialty < 6TRUE   0.5676     0.1419   4.001
## op_severity > 7TRUE                      0.2312     0.1581   1.462
## malignancy                               1.2223     0.2439   5.011
## age >= 65 & age < 80TRUE                 0.2227     0.1387   1.606
## age >= 80TRUE                            0.6880     0.2050   3.355
##                                        Pr(>|z|)    
## (Intercept)                             < 2e-16 ***
## asa == 3TRUE                           2.21e-06 ***
## asa == 4TRUE                           0.059575 .  
## asa == 5TRUE                           0.966827    
## opspecialty >= 2 & opspecialty < 6TRUE 6.31e-05 ***
## op_severity > 7TRUE                    0.143653    
## malignancy                             5.40e-07 ***
## age >= 65 & age < 80TRUE               0.108381    
## age >= 80TRUE                          0.000793 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1796.1  on 1582  degrees of freedom
## Residual deviance: 1698.1  on 1574  degrees of freedom
## AIC: 1716.1
## 
## Number of Fisher Scoring iterations: 11

Regression Model 3

Theoretically we can do a Logistic Regression with POMS as the outcome variable, however all the cases in the SOuRCe dataset are Elective cases, so we lose one of the dependent variables from the original SORT regression equation.

Also, we need to have a way of testing how well our model performs, so we need to refit the model with a training dataset and then test it against a testing dataset. In the Protopapa et al paper describing the method for developing SORT, the dataset was split into approx. 2/3 for training and 1/3 for testing.

Whilst we want to stay as close as possible to the original SORT predictor variables, we need to adjust our regression equation to account for the fact that we only have 1 level of Operative urgency (this is a purely elective cohort), furthermore there are insufficient numbers in some of the categories for there to be useful information to add to the model adjustments, e.g. there is only 1 patient ASA == 5.

#Split the data set into training and testing
library(caret)
#join7$poms <- as.integer(join7$poms) %>% as.factor()
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]

nrow(training)

## [1] 1056

nrow(testing)

## [1] 527

#Fit a model with the training dataset
lr_poms3 <- glm(poms ~ (asa == 3) + 
                  (asa >= 4) + 
                  (opspecialty >= 2 & opspecialty < 6) + 
                  (op_severity > 7) + 
                  malignancy + 
                  (age >= 65 & age <80) + 
                  (age >= 80), 
                data = training, 
                family = binomial)

summary(lr_poms3)

## 
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty >= 
##     2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >= 
##     65 & age < 80) + (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6413  -0.7476  -0.6329   0.7761   1.9626  
## 
## Coefficients:
##                                        Estimate Std. Error z value
## (Intercept)                             -2.1426     0.2437  -8.794
## asa == 3TRUE                             0.6597     0.1673   3.943
## asa >= 4TRUE                             0.4849     0.6722   0.721
## opspecialty >= 2 & opspecialty < 6TRUE   0.6364     0.1729   3.681
## op_severity > 7TRUE                      0.3742     0.1960   1.909
## malignancy                               1.2545     0.2913   4.307
## age >= 65 & age < 80TRUE                 0.2635     0.1718   1.534
## age >= 80TRUE                            0.7390     0.2451   3.015
##                                        Pr(>|z|)    
## (Intercept)                             < 2e-16 ***
## asa == 3TRUE                           8.04e-05 ***
## asa >= 4TRUE                           0.470641    
## opspecialty >= 2 & opspecialty < 6TRUE 0.000232 ***
## op_severity > 7TRUE                    0.056290 .  
## malignancy                             1.65e-05 ***
## age >= 65 & age < 80TRUE               0.125131    
## age >= 80TRUE                          0.002572 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1126.3  on 1048  degrees of freedom
## AIC: 1142.3
## 
## Number of Fisher Scoring iterations: 4

#I've had to remove NA values from the testing dataset in order for the Hosmer-Lemeshow test to run
testing <- filter(testing, !is.na(poms))

#Test the model with the testing dataset
library(PredictABEL)

(HL_lrpoms3 <- plotCalibration(data = testing, 
                cOutcome = match("poms", colnames(testing)), 
                predRisk = predict(lr_poms3, newdata = testing, type = "response")))

## $Table_HLtest
##               total meanpred meanobs predicted observed
## [0.105,0.181)    96    0.143   0.125     13.73       12
## 0.181            68    0.181   0.265     12.34       18
## 0.182            92    0.182   0.185     16.72       17
## [0.217,0.228)    17    0.224   0.235      3.80        4
## [0.228,0.248)    78    0.244   0.282     19.00       22
## [0.248,0.265)    34    0.257   0.265      8.74        9
## [0.265,0.344)    53    0.298   0.264     15.79       14
## [0.344,0.405)    41    0.378   0.268     15.50       11
## [0.405,0.740]    48    0.506   0.562     24.29       27
## 
## $Chi_square
## [1] 7.089
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.527

#Calculate AUROC and plot the ROC curve
plotROC(data = testing,
        cOutcome = match("poms", colnames(testing)),
        predrisk = predict(lr_poms3, newdata = testing, type = "response"))

## AUC [95% CI] for the model 1 :  0.628 [ 0.573  -  0.683 ]

#Let's overlay the ROC curve over the ROC curve for POSSUM-morbidity
plotROC(data = testing, 
        cOutcome = match("poms", colnames(testing)), 
        predrisk = cbind(predict(lr_poms3, newdata = testing, type = "response"),
                         testing$morb_poss))

## AUC [95% CI] for the model 1 :  0.628 [ 0.573  -  0.683 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)

#Create the ROC objects
x <- predict(lr_poms3, newdata = testing, type = "response") #Temp variable
roc_model3 <- roc(testing$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_POSSUM <- roc(testing$poms ~ testing$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)

#Test the ROC objects against each other
roc.test(roc_model3, roc_POSSUM)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model3 and roc_POSSUM
## Z = -0.97032, p-value = 0.3319
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.6276917   0.6562322

#library(rms) #This package contains many functions for regression modelling
#
#val.prob(p = predict(lr_poms3, newdata = testing, type = "response"), y = testing$poms, smooth = TRUE)

The AUROC for the Model 3 was 0.628 which implies the model doesn’t perform very well.

Regression Model 4

We can attempt another model by calculating the SORT risk score as a single regression variable.

#We will use the same training and testing partitions as the previous model but this time create a new column called risk score
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]

#Fit a model with the training dataset
lr_poms4 <- glm(poms ~ risk_score, 
                data = training, 
                family = binomial)

summary(lr_poms4)

## 
## Call:
## glm(formula = poms ~ risk_score, family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5186  -0.7025  -0.6327   1.0336   1.9240  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.87920    0.13870  -13.55  < 2e-16 ***
## risk_score   0.52308    0.07254    7.21 5.58e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1145.3  on 1054  degrees of freedom
## AIC: 1149.3
## 
## Number of Fisher Scoring iterations: 4

#I've had to remove NA values from the testing dataset in order for the Hosmer-Lemeshow test to run
testing <- filter(testing, !is.na(poms))

#Test the model with the testing dataset
plotCalibration(data = testing, 
                cOutcome = match("poms", colnames(testing)), 
                predRisk = predict(lr_poms4, newdata = testing, type = "response"))

## $Table_HLtest
##               total meanpred meanobs predicted observed
## [0.132,0.181)    95    0.156   0.126     14.78       12
## 0.181            68    0.181   0.265     12.34       18
## [0.187,0.219)    78    0.213   0.269     16.58       21
## 0.219            92    0.219   0.185     20.12       17
## [0.239,0.284)    39    0.265   0.333     10.34       13
## [0.284,0.361)    50    0.302   0.200     15.11       10
## [0.361,0.383)    54    0.365   0.278     19.74       15
## [0.383,0.774]    51    0.477   0.549     24.35       28
## 
## $Chi_square
## [1] 12.16
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.1442

#Calculate AUROC and plot the ROC curve
plotROC(data = testing, 
        cOutcome = match("poms", colnames(testing)), 
        predrisk = predict(lr_poms4, newdata = testing, type = "response"))

## AUC [95% CI] for the model 1 :  0.616 [ 0.56  -  0.671 ]

#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)

#Create the ROC objects
x <- predict(lr_poms4, newdata = testing, type = "response") #Temp variable
roc_model <- roc(testing$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_POSSUM <- roc(testing$poms ~ testing$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)

#Test the ROC objects against each other
roc.test(roc_model, roc_POSSUM)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model and roc_POSSUM
## Z = -1.4241, p-value = 0.1544
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.6158995   0.6562322

The AUC doesn’t seem much better for this 4th model

Regression Model 5 - An alternative outcome measure

We have thus far been looking at POMS as an outcome measure and seeing how well the SORT variables predict morbidity as defined by that. What if we pick an alternative measure of morbidity. Readmission rates could be one such example

## 
## Call:
## glm(formula = Followup_Readmission_30Days ~ (asa == 3) + (asa >= 
##     4) + (opspecialty >= 2 & opspecialty < 6) + (op_severity > 
##     7) + malignancy + (age >= 65 & age < 80) + (age >= 80), family = binomial, 
##     data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3686  -0.2432  -0.1859  -0.1401   3.0424  
## 
## Coefficients:
##                                         Estimate Std. Error z value
## (Intercept)                              -3.1125     0.6908  -4.506
## asa == 3TRUE                              0.4574     0.4970   0.920
## asa >= 4TRUE                            -14.1722  2014.9983  -0.007
## opspecialty >= 2 & opspecialty < 6TRUE   -1.1125     0.5705  -1.950
## op_severity > 7TRUE                      -0.3933     0.6485  -0.607
## malignancy                              -14.2487   868.7154  -0.016
## age >= 65 & age < 80TRUE                  0.1781     0.4608   0.386
## age >= 80TRUE                            -1.0008     1.0673  -0.938
##                                        Pr(>|z|)    
## (Intercept)                            6.62e-06 ***
## asa == 3TRUE                             0.3574    
## asa >= 4TRUE                             0.9944    
## opspecialty >= 2 & opspecialty < 6TRUE   0.0512 .  
## op_severity > 7TRUE                      0.5442    
## malignancy                               0.9869    
## age >= 65 & age < 80TRUE                 0.6992    
## age >= 80TRUE                            0.3484    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 211.89  on 1009  degrees of freedom
## Residual deviance: 202.45  on 1002  degrees of freedom
##   (46 observations deleted due to missingness)
## AIC: 218.45
## 
## Number of Fisher Scoring iterations: 17

## $Table_HLtest
##                   total meanpred meanobs predicted observed
## [2.35e-09,0.0109)   112    0.007   0.018      0.73        2
## [1.09e-02,0.0154)    92    0.013   0.022      1.23        2
## [1.54e-02,0.0226)    52    0.017   0.038      0.87        2
## [2.26e-02,0.0346)   104    0.028   0.010      2.94        1
## 3.46e-02             89    0.035   0.000      3.08        0
## [4.26e-02,0.0775]    44    0.050   0.023      2.21        1
## 
## $Chi_square
## [1] 9.23
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.3233

## AUC [95% CI] for the model 1 :  0.42 [ 0.222  -  0.619 ]

## AUC [95% CI] for the model 1 :  0.42 [ 0.222  -  0.619 ] 
## AUC [95% CI] for the model 2 :  0.666 [ 0.492  -  0.841 ]

## 
##  Bootstrap test for two correlated ROC curves
## 
## data:  roc_model and roc_POSSUM
## D = -2.1118, boot.n = 2000, boot.stratified = 1, p-value = 0.0347
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.5737062   0.6604708

It may be that POMS at Day 7 or 8 may be too sensitive an outcome measure, as suggested by Ramani. We may therefore need to look at Day 14 or 15 POMS instead.

Day 14 or 15 POMS

We will repeat what we did for Regression Model 3 above, but this time with Day 14 or 15 POMS as an outcome measure. This time I will not print the code that went into processing the data, to save space.

## join14 
## 
##  53  Variables      1753  Observations
## ---------------------------------------------------------------------------
## poms_id 
##        n  missing distinct 
##      215     1538      215 
## 
## lowest : 004178a3-755b-4653-9d06-f12cbcf77459 00709167-8651-4c95-be07-e6fe29734d26 00b165e3-c415-465c-97d6-2ee6f144d104 00eafe16-478e-4b49-8914-6ab567d52bad 0105ed9c-b450-4b7f-8698-c9568e9dcc02
## highest: fc934cb1-42ca-40f6-8a0b-cbf501eae1c4 fd41f555-f83f-4645-b7a7-471f368e7154 fd8a2ce9-ed7a-413b-861d-fdb87255b645 ff761e92-fa68-4bba-abc9-1eb0bf713146 ffe4409b-8bb1-4919-b4c4-a86a7d9b2411 
## ---------------------------------------------------------------------------
## adm_id 
##        n  missing distinct 
##     1753        0     1753 
## 
## lowest : 00422cff-ddbe-49e6-a2d9-f4680c0945c2  00bea38f-af7b-4f5f-9655-1ab134c95571  00cea437-fd5a-45d4-859d-ddd7f48f3199  01125a5f-814c-4120-a298-ae6f27d43ef5  012d095b-6cf9-4b06-b7de-50b6d72277de 
## highest: ff8455e0-44e3-470d-8e86-0893cc101bc0  ff962ad5-22e6-42e4-b35b-3e6cc03ea022  ff9d4b2f-08b0-4a74-8c24-af45b0472a18  ffb0f43c-c710-43b2-8f7b-620d68522a57  ffd1beee-09b8-4203-836b-2bd08ac472f5  
## ---------------------------------------------------------------------------
## poms_day 
##        n  missing distinct     Info     Mean      Gmd 
##      215     1538        2    0.505    14.21   0.3379 
## 
## 14 (169, 0.786), 15 (46, 0.214)
## ---------------------------------------------------------------------------
## pulm 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1724, 0.983), TRUE (29, 0.017)
## ---------------------------------------------------------------------------
## inf 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1695, 0.967), TRUE (58, 0.033)
## ---------------------------------------------------------------------------
## renal 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1687, 0.962), TRUE (66, 0.038)
## ---------------------------------------------------------------------------
## gastro 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1669, 0.952), TRUE (84, 0.048)
## ---------------------------------------------------------------------------
## cardio 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1738, 0.991), TRUE (15, 0.009)
## ---------------------------------------------------------------------------
## neuro 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1746, 0.996), TRUE (7, 0.004)
## ---------------------------------------------------------------------------
## wound 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1698, 0.969), TRUE (55, 0.031)
## ---------------------------------------------------------------------------
## haem 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1749, 0.998), TRUE (4, 0.002)
## ---------------------------------------------------------------------------
## pain 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1720, 0.981), TRUE (33, 0.019)
## ---------------------------------------------------------------------------
## poms 
##        n  missing distinct 
##     1753        0        2 
## 
## FALSE (1598, 0.912), TRUE (155, 0.088)
## ---------------------------------------------------------------------------
## poms_lo 
##        n  missing distinct 
##      215     1538        2 
## 
## FALSE (99, 0.46), TRUE (116, 0.54)
## ---------------------------------------------------------------------------
## poms_hi 
##        n  missing distinct 
##      215     1538        2 
## 
## FALSE (101, 0.47), TRUE (114, 0.53)
## ---------------------------------------------------------------------------
## sex 
##        n  missing distinct 
##     1751        2        2 
## 
## Female (1030, 0.588), Male (721, 0.412)
## ---------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1753        0     1670        1    59.32     18.6    28.43    34.70 
##      .25      .50      .75      .90      .95 
##    48.23    62.47    71.70    79.10    82.47 
## 
## lowest : 17.01027 17.24572 17.34702 18.54620 18.59822
## highest: 90.96235 91.05270 91.26626 92.24914 95.44695 
## ---------------------------------------------------------------------------
## asa 
##        n  missing distinct     Info     Mean      Gmd 
##     1753        0        5    0.767     2.08   0.6632 
## 
## lowest : 1 2 3 4 5, highest: 1 2 3 4 5 
## 
## 1 (292, 0.167), 2 (1055, 0.602), 3 (381, 0.217), 4 (24, 0.014), 5 (1,
## 0.001)
## ---------------------------------------------------------------------------
## opspecialty 
##        n  missing distinct     Info     Mean      Gmd 
##     1753        0        6    0.861     2.03    1.351 
## 
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6 
## 
## 1 (843, 0.481), 2 (523, 0.298), 3 (102, 0.058), 4 (111, 0.063), 5 (124,
## 0.071), 6 (50, 0.029)
## ---------------------------------------------------------------------------
## op_severity 
##        n  missing distinct     Info     Mean      Gmd 
##     1753        0        4    0.509    6.871    1.844 
## 
## 1 (102, 0.058), 2 (95, 0.054), 4 (174, 0.099), 8 (1382, 0.788)
## ---------------------------------------------------------------------------
## ncepod 
##        n  missing distinct 
##     1749        4        2 
## 
## E (1748, 0.999), U (1, 0.001)
## ---------------------------------------------------------------------------
## malignancy 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1753        0        2    0.151       93  0.05305   0.1005 
## ---------------------------------------------------------------------------
## smoking 
##        n  missing distinct     Info     Mean      Gmd 
##     1751        2        3    0.812   0.6311   0.7844 
## 
## 0 (951, 0.543), 1 (495, 0.283), 2 (305, 0.174)
## ---------------------------------------------------------------------------
## alcohol 
##        n  missing distinct     Info     Mean      Gmd 
##     1748        5        3    0.762   0.9874        1 
## 
## 0 (871, 0.498), 1 (28, 0.016), 2 (849, 0.486)
## ---------------------------------------------------------------------------
## ecgposs 
##        n  missing distinct     Info     Mean      Gmd 
##     1746        7        3    0.148    1.253   0.4845 
## 
## 1 (1655, 0.948), 4 (49, 0.028), 8 (42, 0.024)
## ---------------------------------------------------------------------------
## micharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.132       69  0.04625  0.08827 
## ---------------------------------------------------------------------------
## ccfcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1491      262        2    0.036       18  0.01207  0.02387 
## ---------------------------------------------------------------------------
## pvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.163       86  0.05764   0.1087 
## ---------------------------------------------------------------------------
## cvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.156       82  0.05496   0.1039 
## ---------------------------------------------------------------------------
## dementiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        1        0        0        0 
## 
## 0 (1492, 1)
## ---------------------------------------------------------------------------
## copdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.332      189   0.1267   0.2214 
## ---------------------------------------------------------------------------
## ctdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.076       39  0.02614  0.05095 
## ---------------------------------------------------------------------------
## pudcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1492      261        2    0.065       33  0.02212  0.04329 
## ---------------------------------------------------------------------------
## hepdxcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        3    0.053  0.03016  0.05956 
## 
## 0 (1465, 0.982), 1 (18, 0.012), 3 (9, 0.006)
## ---------------------------------------------------------------------------
## dmcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1491      262        2    0.298      167    0.112   0.1991 
## ---------------------------------------------------------------------------
## hemiplegiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1491      262        2    0.012 0.008048  0.01604 
## 
## 0 (1485, 0.996), 2 (6, 0.004)
## ---------------------------------------------------------------------------
## renalcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        2    0.024  0.01609  0.03193 
## 
## 0 (1480, 0.992), 2 (12, 0.008)
## ---------------------------------------------------------------------------
## dmendorgancharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        2    0.002  0.00134 0.002681 
## 
## 0 (1491, 0.999), 2 (1, 0.001)
## ---------------------------------------------------------------------------
## tumourcharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1291      462        3    0.156   0.1596    0.306 
## 
## 0 (1220, 0.945), 2 (55, 0.043), 6 (16, 0.012)
## ---------------------------------------------------------------------------
## aidscharls 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        2    0.006  0.01206   0.0241 
## 
## 0 (1489, 0.998), 6 (3, 0.002)
## ---------------------------------------------------------------------------
## charlsoncomorb 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1492      261       11    0.754   0.6635   0.9762        0        0 
##      .25      .50      .75      .90      .95 
##        0        0        1        2        3 
## 
## lowest :  0  1  2  3  4, highest:  6  7  8  9 10 
##                                                                       
## Value          0     1     2     3     4     5     6     7     8     9
## Frequency    917   349   145    40    15     5    10     6     2     2
## Proportion 0.615 0.234 0.097 0.027 0.010 0.003 0.007 0.004 0.001 0.001
##                 
## Value         10
## Frequency      1
## Proportion 0.001
## ---------------------------------------------------------------------------
## charlsonage 
##        n  missing distinct     Info     Mean      Gmd 
##     1492      261        6    0.947    1.692    1.471 
## 
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5 
## 
## 0 (399, 0.267), 1 (247, 0.166), 2 (396, 0.265), 3 (321, 0.215), 4 (123,
## 0.082), 5 (6, 0.004)
## ---------------------------------------------------------------------------
## charlsontotal 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1492      261       14    0.969    2.355    2.046        0        0 
##      .25      .50      .75      .90      .95 
##        1        2        3        5        5 
## 
## lowest :  0  1  2  3  4, highest:  9 10 11 12 13 
##                                                                       
## Value          0     1     2     3     4     5     6     7     8     9
## Frequency    291   229   330   280   193    95    30    22     8     6
## Proportion 0.195 0.153 0.221 0.188 0.129 0.064 0.020 0.015 0.005 0.004
##                                   
## Value         10    11    12    13
## Frequency      5     1     1     1
## Proportion 0.003 0.001 0.001 0.001
## ---------------------------------------------------------------------------
## status_at_dc 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1750        3        2    0.014     1742   0.9954 0.009106 
## ---------------------------------------------------------------------------
## mortip 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1753        0        2    0.019       11 0.006275  0.01248 
## ---------------------------------------------------------------------------
## mort30 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1753        0        2    0.003        2 0.001141 0.002281 
## ---------------------------------------------------------------------------
## los 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1753        0       77    0.992    9.876    9.902        2        3 
##      .25      .50      .75      .90      .95 
##        4        6        9       16       30 
## 
## lowest :   0   1   2   3   4, highest: 158 165 167 185 201 
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1673       80        2    0.058       33  0.01973   0.0387 
## ---------------------------------------------------------------------------
## risk_score 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1753        0       45    0.984    1.424    1.017    0.381    0.381 
##      .25      .50      .75      .90      .95 
##    0.712    1.158    1.972    2.675    3.309 
## 
## lowest : 0.000 0.381 0.712 0.777 1.048
## highest: 4.360 4.381 4.762 5.072 5.951 
## ---------------------------------------------------------------------------
## sort_mort 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      1753         0        45     0.984  0.004691  0.005438 0.0009248 
##       .10       .25       .50       .75       .90       .95 
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0090955 0.0170162 
## 
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0013735255 0.0018003001
## highest: 0.0471555481 0.0481081419 0.0688814304 0.0916210990 0.1954466227 
## ---------------------------------------------------------------------------
## morb_poss 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1753        0      150    0.997   0.3202   0.1814  0.08786  0.12347 
##      .25      .50      .75      .90      .95 
##  0.20424  0.29318  0.40131  0.55971  0.67261 
## 
## lowest : 0.05468132 0.06356602 0.06537533 0.07378165 0.07585818
## highest: 0.88079708 0.90114393 0.93819653 0.96373628 0.96544377 
## ---------------------------------------------------------------------------
## p_possmort 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1753        0      150    0.997  0.01981  0.01878 0.003643 0.004961 
##      .25      .50      .75      .90      .95 
## 0.008215 0.012833 0.021140 0.039128 0.060426 
## 
## lowest : 0.002227601 0.002600105 0.002637192 0.003077977 0.003121859
## highest: 0.198356194 0.214636630 0.288393341 0.447148173 0.514096264 
## ---------------------------------------------------------------------------
## srs_mort 
##        n  missing distinct     Info     Mean      Gmd 
##     1753        0        8     0.85  0.02056  0.01459 
## 
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0084090681 0.0192652326 0.0435216350 0.0953494649 0.1962340564 
##                                                               
## Value      0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency            41           65          110          301
## Proportion        0.023        0.037        0.063        0.172
##                                                               
## Value      0.0192652326 0.0435216350 0.0953494649 0.1962340564
## Frequency           909          305           21            1
## Proportion        0.519        0.174        0.012        0.001
## ---------------------------------------------------------------------------

Plots - Day 14 or 15 POMS Outcome

Regression Model 6

We now want to fit the model used in Regression Model 3, but to the Day 14 or 15 POMS outcomes.

## [1] 1056

## [1] 697

## 
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty >= 
##     2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >= 
##     65 & age < 80) + (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7676  -0.4610  -0.2896  -0.2714   2.8951  
## 
## Coefficients: (1 not defined because of singularities)
##                                        Estimate Std. Error z value
## (Intercept)                             -3.4096     0.3997  -8.531
## asa == 3TRUE                             0.2609     0.3282   0.795
## asa >= 4TRUE                                 NA         NA      NA
## opspecialty >= 2 & opspecialty < 6TRUE   0.9842     0.3061   3.215
## op_severity > 7TRUE                      0.2591     0.3038   0.853
## malignancy                               1.0949     0.3911   2.800
## age >= 65 & age < 80TRUE                -0.1327     0.3035  -0.437
## age >= 80TRUE                           -1.0251     1.0324  -0.993
##                                        Pr(>|z|)    
## (Intercept)                             < 2e-16 ***
## asa == 3TRUE                            0.42665    
## asa >= 4TRUE                                 NA    
## opspecialty >= 2 & opspecialty < 6TRUE  0.00130 ** 
## op_severity > 7TRUE                     0.39362    
## malignancy                              0.00511 ** 
## age >= 65 & age < 80TRUE                0.66189    
## age >= 80TRUE                           0.32073    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 541.26  on 1055  degrees of freedom
## Residual deviance: 512.17  on 1049  degrees of freedom
## AIC: 526.17
## 
## Number of Fisher Scoring iterations: 6

## $Table_HLtest
##                 total meanpred meanobs predicted observed
## [0.0117,0.0362)    72    0.020   0.069      1.42        5
## 0.0362            107    0.036   0.037      3.87        4
## [0.0395,0.0411)   125    0.041   0.064      5.10        8
## [0.0411,0.0719)    54    0.049   0.167      2.67        9
## [0.0719,0.0912)    84    0.079   0.071      6.66        6
## [0.0912,0.1028)    46    0.092   0.261      4.21       12
## 0.1028             88    0.103   0.159      9.05       14
## [0.1030,0.1270)    62    0.111   0.210      6.89       13
## [0.1270,0.3078]    59    0.196   0.153     11.56        9
## 
## $Chi_square
## [1] 52.405
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0

## AUC [95% CI] for the model 1 :  0.639 [ 0.579  -  0.7 ]

## AUC [95% CI] for the model 1 :  0.639 [ 0.579  -  0.7 ] 
## AUC [95% CI] for the model 2 :  0.571 [ 0.497  -  0.645 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model and roc_POSSUM
## Z = 1.3028, p-value = 0.1926
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.6393841   0.5709481

Further refining the Regression Models

If we look at the scatter plots in the earlier parts of the analysis, particularly the one of POMS vs Surgical Specialty, the relationship between POMS and the surgical specialty doesn’t seem very strong for the patients who underwent Upper GI, Vascular, Bariatric or Other surgery (opspecialty 3-6). The strongest predictor appears to be Colorectal surgery out of those specialties. Therefore it may be better to fit a more restricted model.

Regression Model 7

We will fit a more restricted model with Day 7 or 8 POMS as the outcome, and the only surgical specialty that functions as a predictor is Colorectal surgery (opspecialty == 2).

## [1] 1056

## [1] 527

## 
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty == 
##     2 | opspecialty == 3) + (op_severity > 7) + malignancy + 
##     (age >= 65 & age < 80) + (age >= 80), family = binomial, 
##     data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7757  -0.7483  -0.6270   0.6805   2.3167  
## 
## Coefficients:
##                                         Estimate Std. Error z value
## (Intercept)                              -2.6129     0.2536 -10.302
## asa == 3TRUE                              0.7520     0.1693   4.442
## asa >= 4TRUE                              1.1225     0.6668   1.684
## opspecialty == 2 | opspecialty == 3TRUE   1.1705     0.1852   6.320
## op_severity > 7TRUE                       0.7310     0.2091   3.496
## malignancy                                0.9492     0.3033   3.129
## age >= 65 & age < 80TRUE                  0.3550     0.1738   2.043
## age >= 80TRUE                             0.8318     0.2471   3.366
##                                         Pr(>|z|)    
## (Intercept)                              < 2e-16 ***
## asa == 3TRUE                            8.93e-06 ***
## asa >= 4TRUE                            0.092267 .  
## opspecialty == 2 | opspecialty == 3TRUE 2.61e-10 ***
## op_severity > 7TRUE                     0.000472 ***
## malignancy                              0.001754 ** 
## age >= 65 & age < 80TRUE                0.041045 *  
## age >= 80TRUE                           0.000763 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1099.4  on 1048  degrees of freedom
## AIC: 1115.4
## 
## Number of Fisher Scoring iterations: 4

## $Table_HLtest
##                total meanpred meanobs predicted observed
## [0.0683,0.135)   131    0.127   0.099     16.67       13
## [0.1346,0.182)    95    0.178   0.179     16.91       17
## [0.1816,0.244)    68    0.191   0.309     12.97       21
## 0.2442            28    0.244   0.179      6.84        5
## [0.2521,0.319)    70    0.282   0.271     19.73       19
## [0.3188,0.334)    49    0.329   0.429     16.10       21
## [0.3340,0.421)    35    0.378   0.200     13.22        7
## [0.4207,0.793]    51    0.550   0.608     28.04       31
## 
## $Chi_square
## [1] 15.373
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0.0523

## AUC [95% CI] for the model 1 :  0.689 [ 0.638  -  0.74 ]

## AUC [95% CI] for the model 1 :  0.689 [ 0.638  -  0.74 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model7 and roc_POSSUM
## Z = 1.0148, p-value = 0.3102
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.6888933   0.6562322

It seems that the AUROC has improved with this more restricted model.

Regression Model 8

We will now do the same as Regression Model 7, but changing surgical specialty into factor variable

## [1] 1056

## [1] 527

## 
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + factor(opspecialty) + 
##     (op_severity > 7) + malignancy + (age >= 65 & age < 80) + 
##     (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9873  -0.6763  -0.5706   0.6051   2.8527  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -3.0150     0.2858 -10.551  < 2e-16 ***
## asa == 3TRUE               0.8756     0.1794   4.882 1.05e-06 ***
## asa >= 4TRUE               0.9391     0.7171   1.310  0.19034    
## factor(opspecialty)2       1.6562     0.2212   7.489 6.95e-14 ***
## factor(opspecialty)3      -0.9048     0.4390  -2.061  0.03928 *  
## factor(opspecialty)4       0.3053     0.3170   0.963  0.33545    
## factor(opspecialty)5      -1.0367     0.4537  -2.285  0.02232 *  
## factor(opspecialty)6       0.1778     0.5294   0.336  0.73702    
## op_severity > 7TRUE        1.2822     0.2314   5.541 3.01e-08 ***
## malignancy                 0.8847     0.3267   2.708  0.00677 ** 
## age >= 65 & age < 80TRUE   0.1415     0.1818   0.778  0.43637    
## age >= 80TRUE              0.5735     0.2547   2.252  0.02433 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1044.5  on 1044  degrees of freedom
## AIC: 1068.5
## 
## Number of Fisher Scoring iterations: 5

## AUC [95% CI] for the model 1 :  0.717 [ 0.667  -  0.766 ]

## AUC [95% CI] for the model 1 :  0.717 [ 0.667  -  0.766 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model8 and roc_POSSUM
## Z = 1.8385, p-value = 0.06599
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7165129   0.6562322

Regression Model 9

Regression Model 8, but change the ASA to dummy variables, ASA == 2, ASA == 3 or ASA >= 4.

## [1] 1056

## [1] 527

## 
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) + 
##     (op_severity > 7) + malignancy + (age >= 65 & age < 80) + 
##     (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9860  -0.7118  -0.5906   0.5962   2.8211  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -3.2281     0.3250  -9.932  < 2e-16 ***
## asa == 2TRUE               0.3323     0.2287   1.453  0.14626    
## asa == 3TRUE               1.1395     0.2574   4.426 9.58e-06 ***
## asa >= 4TRUE               1.2231     0.7431   1.646  0.09978 .  
## factor(opspecialty)2       1.6579     0.2213   7.491 6.83e-14 ***
## factor(opspecialty)3      -0.9294     0.4392  -2.116  0.03431 *  
## factor(opspecialty)4       0.2962     0.3163   0.937  0.34898    
## factor(opspecialty)5      -1.0647     0.4541  -2.345  0.01904 *  
## factor(opspecialty)6       0.1807     0.5303   0.341  0.73324    
## op_severity > 7TRUE        1.2378     0.2332   5.308 1.11e-07 ***
## malignancy                 0.8968     0.3279   2.735  0.00624 ** 
## age >= 65 & age < 80TRUE   0.1182     0.1826   0.647  0.51753    
## age >= 80TRUE              0.5500     0.2552   2.155  0.03115 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1042.3  on 1043  degrees of freedom
## AIC: 1068.3
## 
## Number of Fisher Scoring iterations: 5

## AUC [95% CI] for the model 1 :  0.717 [ 0.668  -  0.766 ]

## AUC [95% CI] for the model 1 :  0.717 [ 0.668  -  0.766 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model9 and roc_POSSUM
## Z = 1.8521, p-value = 0.06401
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7169971   0.6562322

Regression Model 10

## [1] 1056

## [1] 527

## 
## Call:
## glm(formula = poms ~ asa + factor(opspecialty) + (op_severity > 
##     7) + malignancy + (age >= 65 & age < 80) + (age >= 80), family = binomial, 
##     data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9263  -0.7321  -0.5780   0.6158   2.7757  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -3.9524     0.3792 -10.424  < 2e-16 ***
## asa                        0.5936     0.1245   4.767 1.87e-06 ***
## factor(opspecialty)2       1.6539     0.2210   7.484 7.23e-14 ***
## factor(opspecialty)3      -0.9114     0.4372  -2.084  0.03713 *  
## factor(opspecialty)4       0.2805     0.3036   0.924  0.35545    
## factor(opspecialty)5      -1.0656     0.4531  -2.352  0.01869 *  
## factor(opspecialty)6       0.1732     0.5299   0.327  0.74380    
## op_severity > 7TRUE        1.1917     0.2306   5.169 2.36e-07 ***
## malignancy                 0.8980     0.3280   2.737  0.00619 ** 
## age >= 65 & age < 80TRUE   0.1132     0.1823   0.621  0.53477    
## age >= 80TRUE              0.5719     0.2542   2.250  0.02447 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1044.8  on 1045  degrees of freedom
## AIC: 1066.8
## 
## Number of Fisher Scoring iterations: 5

## AUC [95% CI] for the model 1 :  0.715 [ 0.665  -  0.765 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model10 and roc_POSSUM
## Z = 1.7727, p-value = 0.07628
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7145665   0.6562322

library(car)

#Check for multicollinearity
vif(lr_poms10)

##                          GVIF Df GVIF^(1/(2*Df))
## asa                  1.166086  1        1.079855
## factor(opspecialty)  2.107648  5        1.077407
## op_severity > 7      1.556094  1        1.247435
## malignancy           1.109619  1        1.053384
## age >= 65 & age < 80 1.244317  1        1.115489
## age >= 80            1.200709  1        1.095769

# Cook's D plot
# identify D values > 4/(n-k-1) 
cutoff <- 4/((nrow(training)-length(lr_poms10$coefficients)-2)) 
plot(lr_poms10, which=4, cook.levels=cutoff)

# Influence Plot 
influencePlot(lr_poms10,    id.method="identify", main="Influence Plot", sub="Circle size is proportial to Cook's Distance" )

Regression Model 11

Let’s introduce some interaction terms.

## [1] 1056

## [1] 527

## 
## Call:
## glm(formula = poms ~ asa + factor(opspecialty) + (op_severity > 
##     7) + malignancy + (age >= 65 & age < 80) + (age >= 80) + 
##     (age >= 65 & age < 80):factor(opspecialty) + (age >= 80):factor(opspecialty), 
##     family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9083  -0.7207  -0.5921   0.6530   2.8344  
## 
## Coefficients: (1 not defined because of singularities)
##                                                Estimate Std. Error z value
## (Intercept)                                    -4.04600    0.39428 -10.262
## asa                                             0.58728    0.12539   4.684
## factor(opspecialty)2                            1.82386    0.26691   6.833
## factor(opspecialty)3                           -1.52219    0.62589  -2.432
## factor(opspecialty)4                            0.78533    0.54858   1.432
## factor(opspecialty)5                           -1.12719    0.50173  -2.247
## factor(opspecialty)6                            0.71612    0.60089   1.192
## op_severity > 7TRUE                             1.21916    0.23006   5.299
## malignancy                                      0.76600    0.33553   2.283
## age >= 65 & age < 80TRUE                        0.10168    0.26442   0.385
## age >= 80TRUE                                   1.03659    0.33007   3.141
## factor(opspecialty)2:age >= 65 & age < 80TRUE   0.01765    0.39677   0.044
## factor(opspecialty)3:age >= 65 & age < 80TRUE   2.83845    1.14021   2.489
## factor(opspecialty)4:age >= 65 & age < 80TRUE  -0.45035    0.68520  -0.657
## factor(opspecialty)5:age >= 65 & age < 80TRUE   1.88693    1.38483   1.363
## factor(opspecialty)6:age >= 65 & age < 80TRUE  -1.01787    1.28099  -0.795
## factor(opspecialty)2:age >= 80TRUE             -1.37305    0.59846  -2.294
## factor(opspecialty)3:age >= 80TRUE             15.35067  882.74369   0.017
## factor(opspecialty)4:age >= 80TRUE             -1.23980    0.83168  -1.491
## factor(opspecialty)5:age >= 80TRUE                   NA         NA      NA
## factor(opspecialty)6:age >= 80TRUE            -14.71122  591.29701  -0.025
##                                               Pr(>|z|)    
## (Intercept)                                    < 2e-16 ***
## asa                                           2.82e-06 ***
## factor(opspecialty)2                          8.30e-12 ***
## factor(opspecialty)3                           0.01501 *  
## factor(opspecialty)4                           0.15227    
## factor(opspecialty)5                           0.02466 *  
## factor(opspecialty)6                           0.23335    
## op_severity > 7TRUE                           1.16e-07 ***
## malignancy                                     0.02243 *  
## age >= 65 & age < 80TRUE                       0.70058    
## age >= 80TRUE                                  0.00169 ** 
## factor(opspecialty)2:age >= 65 & age < 80TRUE  0.96452    
## factor(opspecialty)3:age >= 65 & age < 80TRUE  0.01280 *  
## factor(opspecialty)4:age >= 65 & age < 80TRUE  0.51102    
## factor(opspecialty)5:age >= 65 & age < 80TRUE  0.17302    
## factor(opspecialty)6:age >= 65 & age < 80TRUE  0.42685    
## factor(opspecialty)2:age >= 80TRUE             0.02177 *  
## factor(opspecialty)3:age >= 80TRUE             0.98613    
## factor(opspecialty)4:age >= 80TRUE             0.13603    
## factor(opspecialty)5:age >= 80TRUE                  NA    
## factor(opspecialty)6:age >= 80TRUE             0.98015    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1025.1  on 1036  degrees of freedom
## AIC: 1065.1
## 
## Number of Fisher Scoring iterations: 13

## AUC [95% CI] for the model 1 :  0.72 [ 0.67  -  0.77 ]

## AUC [95% CI] for the model 1 :  0.72 [ 0.67  -  0.77 ] 
## AUC [95% CI] for the model 2 :  0.656 [ 0.601  -  0.712 ]

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_model11 and roc_POSSUM
## Z = 1.9328, p-value = 0.05325
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7203391   0.6562322

Regression Model Cross Validation

Calibration of POSSUM-morbidity

POMS as a ordinal outcome with 9 levels?

We may need to consider that POMS as a binary outcome may not be the best measure for “morbidity”. Instead the number of POMS domains may provide a better way of scoring morbidity. We can create a scale 0-9 where 0 is no POMS domains positive on the day it is tested, and each domain that is positive increases the score by 1 to a maximum of 9, since POMS has 9 organ-system domains.

join7 <- join7 %>% mutate(poms_score = pulm + inf + renal + gastro + cardio + neuro + wound + haem + pain) %>% mutate(poms_score = replace(poms_score, which(los < 7 & is.na(poms_score)), 0))

describe(join7$poms_score)

## join7$poms_score 
##        n  missing distinct     Info     Mean      Gmd 
##     1583        0        9    0.584   0.5389   0.8962 
## 
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8 
## 
## 0 (1180, 0.745), 1 (185, 0.117), 2 (105, 0.066), 3 (51, 0.032), 4 (24,
## 0.015), 5 (25, 0.016), 6 (9, 0.006), 7 (2, 0.001), 8 (2, 0.001)

join14 <- join14 %>% mutate(poms_score = pulm + inf + renal + gastro + cardio + neuro + wound + haem + pain) %>% mutate(poms_score = replace(poms_score, which(los < 14 & is.na(poms_score)), 0))

describe(join14$poms_score)

## join14$poms_score 
##        n  missing distinct     Info     Mean      Gmd 
##     1753        0        9    0.242   0.2002   0.3775 
## 
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8 
## 
## 0 (1598, 0.912), 1 (64, 0.037), 2 (44, 0.025), 3 (20, 0.011), 4 (10,
## 0.006), 5 (9, 0.005), 6 (4, 0.002), 7 (2, 0.001), 8 (2, 0.001)

Sensitivity analysis

library(dplyr)
library(Hmisc)

#Examine the groups excluded because missing day 7 POMS data
missing_index <- which(!(data_clean$adm_id %in% join7$adm_id))
missing <- data_clean[missing_index,]
describe(missing)

## missing 
## 
##  40  Variables      243  Observations
## ---------------------------------------------------------------------------
## X 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0      243        1    976.6    603.2     89.0    183.0 
##      .25      .50      .75      .90      .95 
##    526.5   1050.0   1394.5   1652.2   1718.5 
## 
## lowest :   21   26   27   30   34, highest: 1786 1788 1799 1808 1817 
## ---------------------------------------------------------------------------
## adm_id 
##        n  missing distinct 
##      243        0      243 
## 
## lowest : 019a8341-e77b-4724-9a5c-b89a4cfa7c65 0230c088-b321-40c7-b3c6-40318a817e14 0426657e-faf1-4c03-b160-cf55c0c61b6b 046d5abd-2e8a-4285-b178-04c3d6c09f37 04772ade-a7af-4e3c-b682-2e3389240a35
## highest: faeae1bf-a225-4b63-b1be-1e7ae98e3403 faeb4542-f64c-4590-b285-dddcc28d3a86 fd0cd879-eb76-4f2b-b121-367801294323 fddeb7e1-fdf4-409c-a2f3-ca77293933bf ffb0f43c-c710-43b2-8f7b-620d68522a57 
## ---------------------------------------------------------------------------
## sex 
##        n  missing distinct 
##      243        0        2 
## 
## Female (153, 0.63), Male (90, 0.37)
## ---------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0      241        1    62.41    18.66    26.55    31.47 
##      .25      .50      .75      .90      .95 
##    55.28    66.94    74.50    80.87    82.83 
## 
## lowest : 17.34702 18.68036 19.38672 19.86585 21.77687
## highest: 87.14853 87.99179 88.36413 89.34976 89.93840 
## ---------------------------------------------------------------------------
## asa 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        4    0.741    2.123   0.6233 
## 
## 1 (32, 0.132), 2 (152, 0.626), 3 (56, 0.230), 4 (3, 0.012)
## ---------------------------------------------------------------------------
## opspecialty 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        6    0.712    1.519   0.7627 
## 
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6 
## 
## 1 (156, 0.642), 2 (69, 0.284), 3 (1, 0.004), 4 (14, 0.058), 5 (2, 0.008),
## 6 (1, 0.004)
## ---------------------------------------------------------------------------
## op_severity 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        4    0.416     7.16    1.443 
## 
## 1 (6, 0.025), 2 (13, 0.053), 4 (21, 0.086), 8 (203, 0.835)
## ---------------------------------------------------------------------------
## ncepod 
##        n  missing distinct    value 
##      242        1        1        E 
## 
## E (242, 1)
## ---------------------------------------------------------------------------
## malignancy 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      243        0        2    0.227       20   0.0823   0.1517 
## ---------------------------------------------------------------------------
## smoking 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        3    0.823   0.6708   0.8235 
## 
## 0 (129, 0.531), 1 (65, 0.267), 2 (49, 0.202)
## ---------------------------------------------------------------------------
## alcohol 
##        n  missing distinct     Info     Mean      Gmd 
##      242        1        3    0.756   0.9091   0.9956 
## 
## 0 (130, 0.537), 1 (4, 0.017), 2 (108, 0.446)
## ---------------------------------------------------------------------------
## ecgposs 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        3    0.174    1.235   0.4469 
## 
## 1 (228, 0.938), 4 (12, 0.049), 8 (3, 0.012)
## ---------------------------------------------------------------------------
## micharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2     0.14        9  0.04891  0.09355 
## ---------------------------------------------------------------------------
## ccfcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.032        2  0.01087  0.02162 
## ---------------------------------------------------------------------------
## pvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.183       12  0.06522   0.1226 
## ---------------------------------------------------------------------------
## cvdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.225       15  0.08152   0.1506 
## ---------------------------------------------------------------------------
## dementiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        1        0        0        0 
## 
## 0 (184, 1)
## ---------------------------------------------------------------------------
## copdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.303       21   0.1141   0.2033 
## ---------------------------------------------------------------------------
## ctdcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.183       12  0.06522   0.1226 
## ---------------------------------------------------------------------------
## pudcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.048        3   0.0163  0.03225 
## ---------------------------------------------------------------------------
## hepdxcharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        3    0.048  0.02717  0.05399 
## 
## 0 (181, 0.984), 1 (2, 0.011), 3 (1, 0.005)
## ---------------------------------------------------------------------------
## dmcharls 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      184       59        2    0.225       15  0.08152   0.1506 
## ---------------------------------------------------------------------------
## hemiplegiacharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        2    0.016  0.01087  0.02174 
## 
## 0 (183, 0.995), 2 (1, 0.005)
## ---------------------------------------------------------------------------
## renalcharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        1        0        0        0 
## 
## 0 (184, 1)
## ---------------------------------------------------------------------------
## dmendorgancharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        2    0.016  0.01087  0.02174 
## 
## 0 (183, 0.995), 2 (1, 0.005)
## ---------------------------------------------------------------------------
## tumourcharls 
##        n  missing distinct     Info     Mean      Gmd 
##      165       78        3    0.202   0.1697   0.3199 
## 
## 0 (153, 0.927), 2 (11, 0.067), 6 (1, 0.006)
## ---------------------------------------------------------------------------
## aidscharls 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        2    0.016  0.03261  0.06522 
## 
## 0 (183, 0.995), 6 (1, 0.005)
## ---------------------------------------------------------------------------
## charlsoncomorb 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        7    0.777   0.7174    1.038 
## 
## lowest : 0 1 2 3 4, highest: 2 3 4 5 6 
## 
## 0 (109, 0.592), 1 (45, 0.245), 2 (15, 0.082), 3 (8, 0.043), 4 (4, 0.022),
## 5 (1, 0.005), 6 (2, 0.011)
## ---------------------------------------------------------------------------
## charlsonage 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        5     0.94    2.043    1.439 
## 
## lowest : 0 1 2 3 4, highest: 0 1 2 3 4 
## 
## 0 (36, 0.196), 1 (19, 0.103), 2 (52, 0.283), 3 (55, 0.299), 4 (22, 0.120)
## ---------------------------------------------------------------------------
## charlsontotal 
##        n  missing distinct     Info     Mean      Gmd 
##      184       59        9     0.97    2.761    2.054 
## 
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8 
## 
## 0 (24, 0.130), 1 (23, 0.125), 2 (37, 0.201), 3 (40, 0.217), 4 (35, 0.190),
## 5 (10, 0.054), 6 (7, 0.038), 7 (5, 0.027), 8 (3, 0.016)
## ---------------------------------------------------------------------------
## status_at_dc 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      243        0        2    0.012      242   0.9959  0.00823 
## ---------------------------------------------------------------------------
## mortip 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      243        0        2    0.012        1 0.004115  0.00823 
## ---------------------------------------------------------------------------
## mort30 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      243        0        2    0.012        1 0.004115  0.00823 
## ---------------------------------------------------------------------------
## los 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0       15    0.676    9.152    3.977      7.0      7.0 
##      .25      .50      .75      .90      .95 
##      7.0      7.0      8.0      9.0     13.8 
## 
## lowest :   7   8   9  10  11, highest:  26  29  39 104 144 
##                                                                       
## Value          7     8     9    10    11    12    14    16    18    21
## Frequency    166    41    15     2     4     2     1     2     1     1
## Proportion 0.683 0.169 0.062 0.008 0.016 0.008 0.004 0.008 0.004 0.004
##                                         
## Value         26    29    39   104   144
## Frequency      2     3     1     1     1
## Proportion 0.008 0.012 0.004 0.004 0.004
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##      239        4        2    0.061        5  0.02092  0.04114 
## ---------------------------------------------------------------------------
## risk_score 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0       33    0.979    1.505    1.021    0.381    0.381 
##      .25      .50      .75      .90      .95 
##    0.712    1.158    1.972    2.857    3.281 
## 
## lowest : 0.000 0.381 0.712 1.048 1.093
## highest: 3.546 3.714 3.877 4.095 4.360 
## ---------------------------------------------------------------------------
## sort_mort 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##       243         0        33     0.979  0.004688  0.005073 0.0009248 
##       .10       .25       .50       .75       .90       .95 
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0109252 0.0165448 
## 
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0018003001 0.0018830080
## highest: 0.0214572897 0.0252833682 0.0296268398 0.0365795704 0.0471555481 
## ---------------------------------------------------------------------------
## morb_poss 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0       62    0.997   0.3417   0.1725   0.1171   0.1795 
##      .25      .50      .75      .90      .95 
##   0.2315   0.3122   0.4403   0.5597   0.6348 
## 
## lowest : 0.07378165 0.07585818 0.08548914 0.08786391 0.09279295
## highest: 0.76852478 0.80376594 0.81457258 0.82053848 0.82491373 
## ---------------------------------------------------------------------------
## p_possmort 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      243        0       62    0.997  0.02043  0.01721 0.005103 0.007109 
##      .25      .50      .75      .90      .95 
## 0.009183 0.014540 0.024941 0.039994 0.050874 
## 
## lowest : 0.003077977 0.003121859 0.003541702 0.003643356 0.003695268
## highest: 0.095989688 0.104968369 0.116006857 0.139553908 0.141267768 
## ---------------------------------------------------------------------------
## srs_mort 
##        n  missing distinct     Info     Mean      Gmd 
##      243        0        7    0.823  0.02182  0.01388 
## 
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0036477148 0.0084090681 0.0192652326 0.0435216350 0.0953494649 
##                                                               
## Value      0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency             3            5           12           38
## Proportion        0.012        0.021        0.049        0.156
##                                                  
## Value      0.0192652326 0.0435216350 0.0953494649
## Frequency           133           50            2
## Proportion        0.547        0.206        0.008
## ---------------------------------------------------------------------------

t.test(missing$age, join7$age)

## 
##  Welch Two Sample t-test
## 
## data:  missing$age and join7$age
## t = 2.9138, df = 313.75, p-value = 0.003827
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.112344 5.738259
## sample estimates:
## mean of x mean of y 
##  62.41368  58.98838

t.test(missing$morb_poss, join7$morb_poss)

## 
##  Welch Two Sample t-test
## 
## data:  missing$morb_poss and join7$morb_poss
## t = 1.9808, df = 335.66, p-value = 0.04843
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.0001503437 0.0431870073
## sample estimates:
## mean of x mean of y 
## 0.3416967 0.3200280

t.test(missing$p_possmort, join7$p_possmort)

## 
##  Welch Two Sample t-test
## 
## data:  missing$p_possmort and join7$p_possmort
## t = 0.25004, df = 406.46, p-value = 0.8027
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.002555180  0.003299924
## sample estimates:
##  mean of x  mean of y 
## 0.02043441 0.02006204

t.test(missing$srs_mort, join7$srs_mort)

## 
##  Welch Two Sample t-test
## 
## data:  missing$srs_mort and join7$srs_mort
## t = 1.3428, df = 339.15, p-value = 0.1802
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.0006233685  0.0033055852
## sample estimates:
## mean of x mean of y 
## 0.0218202 0.0204791

t.test(missing$sort_mort, join7$sort_mort)

## 
##  Welch Two Sample t-test
## 
## data:  missing$sort_mort and join7$sort_mort
## t = -0.2111, df = 425.91, p-value = 0.8329
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.0010063686  0.0008111643
## sample estimates:
##   mean of x   mean of y 
## 0.004687562 0.004785164

wilcox.test(missing$asa, join7$asa)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  missing$asa and join7$asa
## W = 199400, p-value = 0.2924
## alternative hypothesis: true location shift is not equal to 0

wilcox.test(missing$los, join7$los)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  missing$los and join7$los
## W = 245070, p-value = 4.64e-12
## alternative hypothesis: true location shift is not equal to 0

plotCalibration(data = join14, 
                cOutcome = match("poms", colnames(join14)), 
                predRisk = predict(lr_poms10, newdata = join14, type = "response"),
                plottitle = "Calibration plot for SORT-morbidity using Day 14 POMS")

## $Table_HLtest
##                 total meanpred meanobs predicted observed
## [0.0118,0.0769)   182    0.065   0.011     11.78        2
## [0.0769,0.1265)   169    0.107   0.024     18.06        4
## [0.1265,0.1864)   362    0.164   0.061     59.54       22
## [0.1864,0.1897)   261    0.188   0.031     49.17        8
## [0.1897,0.2660)   151    0.239   0.113     36.15       17
## [0.2660,0.2931)   137    0.270   0.066     37.05        9
## [0.2931,0.3734)   141    0.317   0.071     44.75       10
## [0.3734,0.5463)   238    0.449   0.206    106.80       49
## [0.5463,0.8951]   112    0.685   0.304     76.66       34
## 
## $Chi_square
## [1] 305.915
## 
## $df
## [1] 8
## 
## $p_value
## [1] 0

x <- predict(lr_poms10, newdata = join14, type = "response") #Temp variable
roc_join14model10 <- roc(join14$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_join14POSSUM <- roc(join14$poms ~ join14$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)

#Test the ROC objects against each other
roc.test(roc_join14model10, roc_join14POSSUM)

## 
##  DeLong's test for two correlated ROC curves
## 
## data:  roc_join14model10 and roc_join14POSSUM
## Z = 4.4183, p-value = 9.947e-06
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2 
##   0.7332411   0.6003452

plot(roc_join14model10)

## 
## Call:
## roc.formula(formula = join14$poms ~ x, plot = FALSE, auc = TRUE,     ci = TRUE)
## 
## Data: x in 1598 controls (join14$poms FALSE) < 155 cases (join14$poms TRUE).
## Area under the curve: 0.7332
## 95% CI: 0.6908-0.7757 (DeLong)

The sensitivity analysis also shows that the SORT-morbidity model has superior discrimination compared to POSSUM when used to predict the presence of POMS-defined morbidity at Day 14 or 15 postoperatively (AUROC = 0.7332411, 0.6907762, 0.7332411, 0.7757061 vs. 0.6003452, 0.5497408, 0.6003452, 0.6509496; z = 4.4183297, p = 9.946662410^{-6}).

The BJA Manuscript reviewers have returned with some comments about a sensitivity analysis. This report is an attempt to conduct this.

library(rms)

We will use Frank Harrell’s rms package to do easily plot calibration curves and obtain AUROCS. The model we will examine is:

summary(lr_poms9)

## 
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) + 
##     (op_severity > 7) + malignancy + (age >= 65 & age < 80) + 
##     (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9860  -0.7118  -0.5906   0.5962   2.8211  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -3.2281     0.3250  -9.932  < 2e-16 ***
## asa == 2TRUE               0.3323     0.2287   1.453  0.14626    
## asa == 3TRUE               1.1395     0.2574   4.426 9.58e-06 ***
## asa >= 4TRUE               1.2231     0.7431   1.646  0.09978 .  
## factor(opspecialty)2       1.6579     0.2213   7.491 6.83e-14 ***
## factor(opspecialty)3      -0.9294     0.4392  -2.116  0.03431 *  
## factor(opspecialty)4       0.2962     0.3163   0.937  0.34898    
## factor(opspecialty)5      -1.0647     0.4541  -2.345  0.01904 *  
## factor(opspecialty)6       0.1807     0.5303   0.341  0.73324    
## op_severity > 7TRUE        1.2378     0.2332   5.308 1.11e-07 ***
## malignancy                 0.8968     0.3279   2.735  0.00624 ** 
## age >= 65 & age < 80TRUE   0.1182     0.1826   0.647  0.51753    
## age >= 80TRUE              0.5500     0.2552   2.155  0.03115 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1042.3  on 1043  degrees of freedom
## AIC: 1068.3
## 
## Number of Fisher Scoring iterations: 5

POMS_lo

We have identified some low-grade complications for POMS (e.g. urinary catheter, pain).

val.prob(y = join7$poms_lo, logit = predict(lr_poms9, newdata = join7))

##           Dxy       C (ROC)            R2             D      D:Chi-sq 
##  4.950408e-01  7.475204e-01  1.710515e-01  1.143925e-01  1.820834e+02 
##           D:p             U      U:Chi-sq           U:p             Q 
##            NA  1.395459e-02  2.409012e+01  5.873509e-06  1.004379e-01 
##         Brier     Intercept         Slope          Emax           E90 
##  1.415803e-01 -3.158285e-01  1.007832e+00  1.779985e-01  7.696361e-02 
##          Eavg           S:z           S:p 
##  4.881226e-02 -3.523574e+00  4.257684e-04

HL_pomslo <- plotCalibration(join7, 
                       cOutcome = match("poms_lo", colnames(join7)),
                       predRisk = predict(lr_poms9, newdata = join7, "response"))

roc_pomslo <- roc(join7$poms_lo ~ predict(lr_poms9, newdata = join7, "response"), plot=FALSE, auc=TRUE, ci=TRUE)

POMS_hi

Conversely, there are some high-grade complications (e.g. cardiac ischaemia).

val.prob(y = join7$poms_hi, logit = predict(lr_poms9, newdata = join7))

##           Dxy       C (ROC)            R2             D      D:Chi-sq 
##  4.493923e-01  7.246962e-01  8.506971e-03  4.246795e-03  7.722676e+00 
##           D:p             U      U:Chi-sq           U:p             Q 
##            NA  7.106473e-02  1.144955e+02  0.000000e+00 -6.681794e-02 
##         Brier     Intercept         Slope          Emax           E90 
##  1.321966e-01 -8.742130e-01  8.272451e-01  4.544539e-01  1.909423e-01 
##          Eavg           S:z           S:p 
##  1.016634e-01 -5.301993e+00  1.145454e-07

HL_pomshi <- plotCalibration(join7, 
                       cOutcome = match("poms_hi", colnames(join7)),
                       predRisk = predict(lr_poms9, newdata = join7, "response"))

roc_pomshi <- roc(join7$poms_hi ~ predict(lr_poms9, newdata = join7, "response"), plot=FALSE, auc=TRUE, ci=TRUE)

POMS14

We can also see how well the model predicts POMS outcomes 2 weeks postoperatively.

val.prob(y = join14$poms, logit = predict(lr_poms9, newdata = join14))

##           Dxy       C (ROC)            R2             D      D:Chi-sq 
##  4.691388e-01  7.345694e-01 -3.320648e-01 -1.398194e-01 -2.441034e+02 
##           D:p             U      U:Chi-sq           U:p             Q 
##            NA  1.977038e-01  3.485748e+02  0.000000e+00 -3.375232e-01 
##         Brier     Intercept         Slope          Emax           E90 
##  1.085156e-01 -1.477501e+00  8.941401e-01  5.224268e-01  2.748685e-01 
##          Eavg           S:z           S:p 
##  1.609582e-01 -1.035485e+01  3.978216e-25

HL_poms14 <- plotCalibration(join14, 
                       cOutcome = match("poms", colnames(join14)),
                       predRisk = predict(lr_poms9, newdata = join14, "response"))

roc_poms14 <- roc(join14$poms ~ predict(lr_poms9, newdata = join14, "response"), plot=FALSE, auc=TRUE, ci=TRUE)

POMS21

Now we see how well the model predicts POMS outcomes 3 weeks postoperatively.

val.prob(y = join21$poms, logit = predict(lr_poms9, newdata = join21))

##           Dxy       C (ROC)            R2             D      D:Chi-sq 
##  5.848879e-01  7.924439e-01 -1.420562e+00 -3.349814e-01 -5.868924e+02 
##           D:p             U      U:Chi-sq           U:p             Q 
##            NA  3.774939e-01  6.645018e+02  0.000000e+00 -7.124753e-01 
##         Brier     Intercept         Slope          Emax           E90 
##  9.298428e-02 -2.326692e+00  1.081127e+00  6.603238e-01  3.795625e-01 
##          Eavg           S:z           S:p 
##  2.105892e-01 -1.342258e+01  4.459248e-41

HL_poms21 <- plotCalibration(join21, 
                       cOutcome = match("poms", colnames(join21)),
                       predRisk = predict(lr_poms9, newdata = join21, "response"))

roc_poms21 <- roc(join21$poms ~ predict(lr_poms9, newdata = join21, "response"), plot=FALSE, auc=TRUE, ci=TRUE)

POMS28

Lastly, we see how well the model predicts POMS outcomes 4 weeks postoperatively.

val.prob(y = join28$poms, logit = predict(lr_poms9, newdata = join28))

##           Dxy       C (ROC)            R2             D      D:Chi-sq 
##  6.223219e-01  8.111610e-01 -2.375358e+00 -4.164674e-01 -7.373967e+02 
##           D:p             U      U:Chi-sq           U:p             Q 
##            NA  4.449196e-01  7.908425e+02  0.000000e+00 -8.613870e-01 
##         Brier     Intercept         Slope          Emax           E90 
##  9.269795e-02 -2.770420e+00  1.048215e+00  7.618127e-01  4.151957e-01 
##          Eavg           S:z           S:p 
##  2.247681e-01 -1.363655e+01  2.427874e-42

HL_poms28 <- plotCalibration(join28, 
                       cOutcome = match("poms", colnames(join28)),
                       predRisk = predict(lr_poms9, newdata = join28, "response"))

roc_poms28 <- roc(join28$poms ~ predict(lr_poms9, newdata = join28, "response"), plot=FALSE, auc=TRUE, ci=TRUE)

Interpretation

It looks as though the model is well-calibrated for POMS_lo but mis-calibrated for all the outcomes. However it still remains good at discriminating all the outcomes. Therefore the sensible thing would be to fit another model for the high-grade morbidity outcomes:

#Split the data set into training and testing
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]

#Fit a model with the training dataset
lr_poms_hi <- glm(poms_hi ~ (asa == 2) +
                  (asa == 3) + 
                  (asa >= 4) + 
                  factor(opspecialty) + 
                  (op_severity > 7) + 
                  malignancy + 
                  (age >= 65 & age <80) + 
                  (age >= 80), 
                data = training, 
                family = binomial)

summary(lr_poms9)

## 
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) + 
##     (op_severity > 7) + malignancy + (age >= 65 & age < 80) + 
##     (age >= 80), family = binomial, data = training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9860  -0.7118  -0.5906   0.5962   2.8211  
## 
## Coefficients:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -3.2281     0.3250  -9.932  < 2e-16 ***
## asa == 2TRUE               0.3323     0.2287   1.453  0.14626    
## asa == 3TRUE               1.1395     0.2574   4.426 9.58e-06 ***
## asa >= 4TRUE               1.2231     0.7431   1.646  0.09978 .  
## factor(opspecialty)2       1.6579     0.2213   7.491 6.83e-14 ***
## factor(opspecialty)3      -0.9294     0.4392  -2.116  0.03431 *  
## factor(opspecialty)4       0.2962     0.3163   0.937  0.34898    
## factor(opspecialty)5      -1.0647     0.4541  -2.345  0.01904 *  
## factor(opspecialty)6       0.1807     0.5303   0.341  0.73324    
## op_severity > 7TRUE        1.2378     0.2332   5.308 1.11e-07 ***
## malignancy                 0.8968     0.3279   2.735  0.00624 ** 
## age >= 65 & age < 80TRUE   0.1182     0.1826   0.647  0.51753    
## age >= 80TRUE              0.5500     0.2552   2.155  0.03115 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1198.5  on 1055  degrees of freedom
## Residual deviance: 1042.3  on 1043  degrees of freedom
## AIC: 1068.3
## 
## Number of Fisher Scoring iterations: 5

val.prob(y = testing$poms_hi, logit = predict(lr_poms_hi, newdata = testing))

##          Dxy      C (ROC)           R2            D     D:Chi-sq 
##  0.399186257  0.699593128  0.119703958  0.068834868 37.275975326 
##          D:p            U     U:Chi-sq          U:p            Q 
##           NA -0.002641733  0.607806771  0.737932158  0.071476601 
##        Brier    Intercept        Slope         Emax          E90 
##  0.118276423 -0.169306583  0.883451908  0.090627914  0.035036177 
##         Eavg          S:z          S:p 
##  0.019355148  0.522449243  0.601357577