After cleaning the data, I fitted regression models with POMS (binary: TRUE or FALSE) as the outcome measure with predictor variables derived from the original SORT model.
Regression Model 1: POMS Day 7 or 8 outcome, Age continuous predictor, no internal validation.
Regression Model 2: POMS Day 7 or 8 outcome, predictor variables exactly as in original SORT model, no internal validation.
Regression Model 3: POMS Day 7 or 8 outcome, predictor variables exactly as in original SORT model, internal validation.
Regression Model 4: POMS Day 7 or 8 outcome, predictor variables pre-weighted with the coefficients used in the original SORT model before fitting model, internal validation.
Regression Model 5: Alternative outcome measure - Readmission within 30 days, predictor variables as in Model 3, internal validation.
Regression Model 6: POMS Day 14 or 15 outcome, predictor variables exactly as in Model 3, internal validation.
Regression Model 7: POMS Day 7 or 8 outcome, restricted predictor variable with only Colorectal surgery as the dummy variable for high-risk surgical specialties, otherwise same as Model 3, internal validation.
Regression Model 8: POMS Day 7 or 8 outcome, Surgical Specialty as factors (6 categories, 5 dummy variables), otherwise same as Model 7, internal validation.
Regression Model 9: POMS Day 7 or 8 outcome, ASA = 2 or 3 or >=4, otherwise same as Model 8, internal validation.
Regression Model 10: POMS Day 7 or 8 outcome, predictor variable for opspecialty is now a factor with 6 categories (5 dummy variables) instead of “high-risk” surgical specialties, and ASA is now a continuous variable, otherwise same as Model 7, internal validation.
Regression Model 11: POMS Day 7 or 8 outcome, predictor variable for opspecialty is now a factor with 6 categories (5 dummy variables) instead of “high-risk” surgical specialties, and ASA and Age are now continuous variables, and added interaction between age and specialty, otherwise same as Model 10, internal validation.
NB: Where the ROC curves were plotted, the dotted lines in red signify the ROC curve for p-POSSUM(morbidity), overlaid for comparison, the solid lines signify the fitted model being tested. For the sake of brevity, I do not show the source code after Model 4.
Matt Oliver has kindly provided some data that he has cleaned from the SOuRCe database of >1900 patients who underwent surgery. The purpose of our project is to calibrate and validate the risk calculation tools. He has already done this with mortality using SORT and with pPOSSUM. We are going to see if we can also do this for morbidity using POMS.
First we load the data that has been cleaned in the SOuRCeDataCleaning.Rmd file
library(knitr)
purl("SOuRCeDataCleaning.Rmd")
read_chunk("SOuRCeDataCleaning.R")
unlink("SOuRCeDataCleaning.R")
The first step is to externally validate the SORT model for mortality against the data
#Load the data, this is a workaround because there're problems with running
#plotCalibration with using the existing data_clean dataframe, must be something
#to do with how the data is stored in R (vector or matrix or otherwise)
data_clean <- read.csv("data/SOuRCedata_clean.csv")
#Test SORT with the dataset
library(PredictABEL)
HL1 <- plotCalibration(data_clean,
cOutcome = match("mort30", colnames(data_clean)),
predRisk = data_clean$sort_mort,
rangeaxis = c(0,0.1),
plottitle = "Calibration plot for SORT")
HL2 <- plotCalibration(data_clean,
cOutcome = match("mort30", colnames(data_clean)),
predRisk = data_clean$p_possmort,
rangeaxis = c(0,0.1),
plottitle = "Calibration plot for p-POSSUM")
HL3 <- plotCalibration(data_clean,
cOutcome = match("mort30", colnames(data_clean)),
predRisk = data_clean$srs_mort,
rangeaxis = c(0,0.1),
plottitle = "Calibration plot for SRS")
#Let's overlay the ROC curve over the ROC curve for p-POSSUM
plotROC(data = data_clean,
cOutcome = match("mort30", colnames(data_clean)),
predrisk = cbind(data_clean$sort_mort,
data_clean$p_possmort,
data_clean$srs_mort),
plottitle = "SORT vs p-POSSUM & SRS"
#labels = c("SORT", "p-POSSUM", "SRS")
)
## AUC [95% CI] for the model 1 : 0.793 [ 0.554 - 1.031 ]
## AUC [95% CI] for the model 2 : 0.791 [ 0.509 - 1.073 ]
## AUC [95% CI] for the model 3 : 0.639 [ 0.491 - 0.787 ]
legend("bottomright", inset = c(0.02,0.02), legend = c("SORT","p-POSSUM","SRS"),
col = c(17,18,19), lty = c(1,2,3),
horiz = TRUE, cex = 0.5)
#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)
#Create the ROC objects
roc_SORT <- roc(data_clean$mort30 ~ data_clean$sort_mort, plot=FALSE, auc=TRUE, ci=TRUE)
roc_pPOSSUM <- roc(data_clean$mort30 ~ data_clean$p_possmort, plot=FALSE, auc=TRUE, ci=TRUE)
roc_SRS <- roc(data_clean$mort30 ~ data_clean$srs_mort, plot=FALSE, auc=TRUE, ci=TRUE)
#Test the ROC objects against each other
roc.test(roc_SORT, roc_pPOSSUM)
##
## DeLong's test for two correlated ROC curves
##
## data: roc_SORT and roc_pPOSSUM
## Z = 0.028709, p-value = 0.9771
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7925357 0.7910263
roc.test(roc_SORT, roc_SRS)
##
## DeLong's test for two correlated ROC curves
##
## data: roc_SORT and roc_SRS
## Z = 1.2121, p-value = 0.2255
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7925357 0.6389956
Now it looks like we are ready to do some analysis looking at Day 7 or 8 POMS as outcome measures.
#Let's get an overview of the data
describe(join7)
## join7
##
## 53 Variables 1583 Observations
## ---------------------------------------------------------------------------
## poms_id
## n missing distinct
## 632 951 632
##
## lowest : 004e6778-08cc-4985-9e1b-fc85622de277 0055f57f-bda3-4f77-b8fa-cf126fdc814a 01028b80-1d58-45ed-b4ea-6fea7ac5d935 0248aa0b-38ba-4eef-bbe8-7e79c1bb7e1a 024be601-de89-488b-b35f-2a85fdb630dc
## highest: fec3f147-5f60-4c4b-8a71-a2358ea2f35b fee6a4a2-6ce1-4995-8fa6-1df682442a53 ff4c36ef-a972-45bb-a334-80431cba81c8 ff8c6d7c-1924-42e5-a9b3-ceeffb6366ff ff9890a7-bec1-499d-afa1-00e5f189bca0
## ---------------------------------------------------------------------------
## adm_id
## n missing distinct
## 1583 0 1583
##
## lowest : 00422cff-ddbe-49e6-a2d9-f4680c0945c2 00bea38f-af7b-4f5f-9655-1ab134c95571 00cea437-fd5a-45d4-859d-ddd7f48f3199 01125a5f-814c-4120-a298-ae6f27d43ef5 012d095b-6cf9-4b06-b7de-50b6d72277de
## highest: ff7964aa-a0c5-49c3-99dc-b39df7d021f2 ff8455e0-44e3-470d-8e86-0893cc101bc0 ff962ad5-22e6-42e4-b35b-3e6cc03ea022 ff9d4b2f-08b0-4a74-8c24-af45b0472a18 ffd1beee-09b8-4203-836b-2bd08ac472f5
## ---------------------------------------------------------------------------
## poms_day
## n missing distinct Info Mean Gmd
## 632 951 2 0.44 7.179 0.2941
##
## 7 (519, 0.821), 8 (113, 0.179)
## ---------------------------------------------------------------------------
## pulm
## n missing distinct
## 1583 0 2
##
## FALSE (1509, 0.953), TRUE (74, 0.047)
## ---------------------------------------------------------------------------
## inf
## n missing distinct
## 1583 0 2
##
## FALSE (1455, 0.919), TRUE (128, 0.081)
## ---------------------------------------------------------------------------
## renal
## n missing distinct
## 1583 0 2
##
## FALSE (1387, 0.876), TRUE (196, 0.124)
## ---------------------------------------------------------------------------
## gastro
## n missing distinct
## 1583 0 2
##
## FALSE (1361, 0.86), TRUE (222, 0.14)
## ---------------------------------------------------------------------------
## cardio
## n missing distinct
## 1583 0 2
##
## FALSE (1547, 0.977), TRUE (36, 0.023)
## ---------------------------------------------------------------------------
## neuro
## n missing distinct
## 1583 0 2
##
## FALSE (1562, 0.987), TRUE (21, 0.013)
## ---------------------------------------------------------------------------
## wound
## n missing distinct
## 1583 0 2
##
## FALSE (1502, 0.949), TRUE (81, 0.051)
## ---------------------------------------------------------------------------
## haem
## n missing distinct
## 1583 0 2
##
## FALSE (1572, 0.993), TRUE (11, 0.007)
## ---------------------------------------------------------------------------
## pain
## n missing distinct
## 1583 0 2
##
## FALSE (1499, 0.947), TRUE (84, 0.053)
## ---------------------------------------------------------------------------
## poms
## n missing distinct
## 1583 0 2
##
## FALSE (1180, 0.745), TRUE (403, 0.255)
## ---------------------------------------------------------------------------
## poms_lo
## n missing distinct
## 1583 0 2
##
## FALSE (1262, 0.797), TRUE (321, 0.203)
## ---------------------------------------------------------------------------
## poms_hi
## n missing distinct
## 1583 0 2
##
## FALSE (1344, 0.849), TRUE (239, 0.151)
## ---------------------------------------------------------------------------
## sex
## n missing distinct
## 1580 3 2
##
## Female (917, 0.58), Male (663, 0.42)
## ---------------------------------------------------------------------------
## age
## n missing distinct Info Mean Gmd .05 .10
## 1583 0 1516 1 58.99 18.63 28.75 34.79
## .25 .50 .75 .90 .95
## 47.63 61.64 71.23 79.10 82.70
##
## lowest : 17.01027 17.24572 18.54620 18.59822 18.74880
## highest: 90.96235 91.05270 91.26626 92.24914 95.44695
## ---------------------------------------------------------------------------
## asa
## n missing distinct Info Mean Gmd
## 1583 0 5 0.772 2.078 0.67
##
## lowest : 1 2 3 4 5, highest: 1 2 3 4 5
##
## 1 (269, 0.170), 2 (945, 0.597), 3 (346, 0.219), 4 (22, 0.014), 5 (1,
## 0.001)
## ---------------------------------------------------------------------------
## opspecialty
## n missing distinct Info Mean Gmd
## 1583 0 6 0.877 2.102 1.401
##
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6
##
## 1 (713, 0.450), 2 (493, 0.311), 3 (102, 0.064), 4 (103, 0.065), 5 (122,
## 0.077), 6 (50, 0.032)
## ---------------------------------------------------------------------------
## op_severity
## n missing distinct Info Mean Gmd
## 1583 0 4 0.521 6.838 1.884
##
## 1 (97, 0.061), 2 (82, 0.052), 4 (167, 0.105), 8 (1237, 0.781)
## ---------------------------------------------------------------------------
## ncepod
## n missing distinct
## 1580 3 2
##
## E (1579, 0.999), U (1, 0.001)
## ---------------------------------------------------------------------------
## malignancy
## n missing distinct Info Sum Mean Gmd
## 1583 0 2 0.146 81 0.05117 0.09716
## ---------------------------------------------------------------------------
## smoking
## n missing distinct Info Mean Gmd
## 1581 2 3 0.807 0.6192 0.7763
##
## 0 (869, 0.550), 1 (445, 0.281), 2 (267, 0.169)
## ---------------------------------------------------------------------------
## alcohol
## n missing distinct Info Mean Gmd
## 1579 4 3 0.762 0.9854 1
##
## 0 (788, 0.499), 1 (26, 0.016), 2 (765, 0.484)
## ---------------------------------------------------------------------------
## ecgposs
## n missing distinct Info Mean Gmd
## 1576 7 3 0.143 1.254 0.4887
##
## 1 (1497, 0.950), 4 (38, 0.024), 8 (41, 0.026)
## ---------------------------------------------------------------------------
## micharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.138 66 0.04825 0.0919
## ---------------------------------------------------------------------------
## ccfcharls
## n missing distinct Info Sum Mean Gmd
## 1367 216 2 0.035 16 0.0117 0.02315
## ---------------------------------------------------------------------------
## pvdcharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.161 78 0.05702 0.1076
## ---------------------------------------------------------------------------
## cvdcharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.144 69 0.05044 0.09586
## ---------------------------------------------------------------------------
## dementiacharls
## n missing distinct Info Mean Gmd
## 1368 215 1 0 0 0
##
## 0 (1368, 1)
## ---------------------------------------------------------------------------
## copdcharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.333 174 0.1272 0.2222
## ---------------------------------------------------------------------------
## ctdcharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.062 29 0.0212 0.04153
## ---------------------------------------------------------------------------
## pudcharls
## n missing distinct Info Sum Mean Gmd
## 1368 215 2 0.069 32 0.02339 0.04572
## ---------------------------------------------------------------------------
## hepdxcharls
## n missing distinct Info Mean Gmd
## 1368 215 3 0.054 0.03143 0.06207
##
## 0 (1343, 0.982), 1 (16, 0.012), 3 (9, 0.007)
## ---------------------------------------------------------------------------
## dmcharls
## n missing distinct Info Sum Mean Gmd
## 1367 216 2 0.308 159 0.1163 0.2057
## ---------------------------------------------------------------------------
## hemiplegiacharls
## n missing distinct Info Mean Gmd
## 1367 216 2 0.011 0.007315 0.01459
##
## 0 (1362, 0.996), 2 (5, 0.004)
## ---------------------------------------------------------------------------
## renalcharls
## n missing distinct Info Mean Gmd
## 1368 215 2 0.026 0.01754 0.03481
##
## 0 (1356, 0.991), 2 (12, 0.009)
## ---------------------------------------------------------------------------
## dmendorgancharls
## n missing distinct Info Mean Gmd
## 1368 215 1 0 0 0
##
## 0 (1368, 1)
## ---------------------------------------------------------------------------
## tumourcharls
## n missing distinct Info Mean Gmd
## 1174 409 3 0.157 0.1789 0.3435
##
## 0 (1109, 0.945), 2 (45, 0.038), 6 (20, 0.017)
## ---------------------------------------------------------------------------
## aidscharls
## n missing distinct Info Mean Gmd
## 1368 215 2 0.004 0.008772 0.01753
##
## 0 (1366, 0.999), 6 (2, 0.001)
## ---------------------------------------------------------------------------
## charlsoncomorb
## n missing distinct Info Mean Gmd .05 .10
## 1368 215 11 0.753 0.674 0.9978 0 0
## .25 .50 .75 .90 .95
## 0 0 1 2 3
##
## lowest : 0 1 2 3 4, highest: 6 7 8 9 10
##
## Value 0 1 2 3 4 5 6 7 8 9
## Frequency 843 315 136 34 12 4 12 7 2 2
## Proportion 0.616 0.230 0.099 0.025 0.009 0.003 0.009 0.005 0.001 0.001
##
## Value 10
## Frequency 1
## Proportion 0.001
## ---------------------------------------------------------------------------
## charlsonage
## n missing distinct Info Mean Gmd
## 1368 215 6 0.947 1.662 1.477
##
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##
## 0 (378, 0.276), 1 (232, 0.170), 2 (357, 0.261), 3 (283, 0.207), 4 (111,
## 0.081), 5 (7, 0.005)
## ---------------------------------------------------------------------------
## charlsontotal
## n missing distinct Info Mean Gmd .05 .10
## 1368 215 14 0.969 2.336 2.068 0 0
## .25 .50 .75 .90 .95
## 1 2 3 5 5
##
## lowest : 0 1 2 3 4, highest: 9 10 11 12 13
##
## Value 0 1 2 3 4 5 6 7 8 9
## Frequency 278 210 300 253 168 91 27 19 7 7
## Proportion 0.203 0.154 0.219 0.185 0.123 0.067 0.020 0.014 0.005 0.005
##
## Value 10 11 12 13
## Frequency 5 1 1 1
## Proportion 0.004 0.001 0.001 0.001
## ---------------------------------------------------------------------------
## status_at_dc
## n missing distinct Info Sum Mean Gmd
## 1580 3 2 0.017 1571 0.9943 0.01133
## ---------------------------------------------------------------------------
## mortip
## n missing distinct Info Sum Mean Gmd
## 1583 0 2 0.023 12 0.007581 0.01506
## ---------------------------------------------------------------------------
## mort30
## n missing distinct Info Sum Mean Gmd
## 1583 0 2 0.006 3 0.001895 0.003785
## ---------------------------------------------------------------------------
## los
## n missing distinct Info Mean Gmd .05 .10
## 1583 0 78 0.99 10.52 11.15 2.0 2.0
## .25 .50 .75 .90 .95
## 4.0 6.0 11.0 19.0 33.9
##
## lowest : 0 1 2 3 4, highest: 158 165 167 185 201
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days
## n missing distinct Info Sum Mean Gmd
## 1503 80 2 0.059 30 0.01996 0.03915
## ---------------------------------------------------------------------------
## risk_score
## n missing distinct Info Mean Gmd .05 .10
## 1583 0 44 0.985 1.43 1.024 0.381 0.381
## .25 .50 .75 .90 .95
## 0.712 1.158 1.972 2.684 3.380
##
## lowest : 0.000 0.381 0.712 0.777 1.048
## highest: 4.360 4.381 4.762 5.072 5.951
## ---------------------------------------------------------------------------
## sort_mort
## n missing distinct Info Mean Gmd .05
## 1583 0 44 0.985 0.004785 0.005607 0.0009248
## .10 .25 .50 .75 .90 .95
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0091755 0.0182324
##
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0013735255 0.0018003001
## highest: 0.0471555481 0.0481081419 0.0688814304 0.0916210990 0.1954466227
## ---------------------------------------------------------------------------
## morb_poss
## n missing distinct Info Mean Gmd .05 .10
## 1583 0 147 0.997 0.32 0.1839 0.08549 0.11712
## .25 .50 .75 .90 .95
## 0.20424 0.29318 0.40131 0.57200 0.67261
##
## lowest : 0.05468132 0.06356602 0.06537533 0.07378165 0.07585818
## highest: 0.88079708 0.90114393 0.93819653 0.96373628 0.96544377
## ---------------------------------------------------------------------------
## p_possmort
## n missing distinct Info Mean Gmd .05 .10
## 1583 0 147 0.997 0.02006 0.01938 0.003643 0.004961
## .25 .50 .75 .90 .95
## 0.007875 0.012833 0.021140 0.040762 0.065940
##
## lowest : 0.002227601 0.002600105 0.002637192 0.003077977 0.003121859
## highest: 0.198356194 0.214636630 0.288393341 0.447148173 0.514096264
## ---------------------------------------------------------------------------
## srs_mort
## n missing distinct Info Mean Gmd
## 1583 0 8 0.852 0.02048 0.01469
##
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0084090681 0.0192652326 0.0435216350 0.0953494649 0.1962340564
##
## Value 0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency 38 60 103 274
## Proportion 0.024 0.038 0.065 0.173
##
## Value 0.0192652326 0.0435216350 0.0953494649 0.1962340564
## Frequency 816 271 20 1
## Proportion 0.515 0.171 0.013 0.001
## ---------------------------------------------------------------------------
#Let's plot POMS vs the individual predictors add smoothing lines to get a general feel of the data
scatter.smooth(join7$poms ~ join7$age, family = "gaussian",
lpars = list(col = "red"))
scatter.smooth(join7$poms ~ join7$asa, family = "gaussian",
lpars = list(col = "red"))
scatter.smooth(join7$poms ~ join7$opspecialty, family = "gaussian",
lpars = list(col = "red"))
scatter.smooth(join7$poms ~ join7$malignancy, family = "gaussian",
lpars = list(col = "red"))
#Might be easier to understand some of the plots with the points jittered
scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$asa),
family = "gaussian", lpars = list(col = "red"))
scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$opspecialty),
family = "gaussian", lpars = list(col = "red"))
scatter.smooth(jitter(as.numeric(join7$poms)) ~ jitter(join7$malignancy),
family = "gaussian", lpars = list(col = "red"))
We will fit some Regression Models. Firstly we will look at the model with age as a continuous variable. In the SORT tool, the risk score binned age into a categorical variable.
#Fit a logistic regression model with the same predictor variables as in SORT, but with any POMS outcome the dependent variable
lr_poms <- glm(poms ~ age +
as.factor(asa) +
as.factor(opspecialty) +
as.factor(op_severity) +
as.factor(malignancy),
data = join7, family = binomial)
summary(lr_poms)
##
## Call:
## glm(formula = poms ~ age + as.factor(asa) + as.factor(opspecialty) +
## as.factor(op_severity) + as.factor(malignancy), family = binomial,
## data = join7)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9548 -0.6554 -0.5603 0.6924 2.9745
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.983043 0.555480 -8.971 < 2e-16 ***
## age 0.005609 0.004520 1.241 0.214611
## as.factor(asa)2 0.277849 0.191197 1.453 0.146167
## as.factor(asa)3 1.074435 0.218332 4.921 8.61e-07 ***
## as.factor(asa)4 1.427331 0.504454 2.829 0.004663 **
## as.factor(asa)5 12.654213 324.743789 0.039 0.968917
## as.factor(opspecialty)2 1.641679 0.186901 8.784 < 2e-16 ***
## as.factor(opspecialty)3 -0.568368 0.322841 -1.761 0.078320 .
## as.factor(opspecialty)4 0.180380 0.258682 0.697 0.485612
## as.factor(opspecialty)5 -1.612067 0.442105 -3.646 0.000266 ***
## as.factor(opspecialty)6 -0.270881 0.463594 -0.584 0.559013
## as.factor(op_severity)2 1.298928 0.514030 2.527 0.011506 *
## as.factor(op_severity)4 2.390115 0.459783 5.198 2.01e-07 ***
## as.factor(op_severity)8 2.836190 0.451474 6.282 3.34e-10 ***
## as.factor(malignancy)1 0.750509 0.281624 2.665 0.007700 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1796.1 on 1582 degrees of freedom
## Residual deviance: 1532.9 on 1568 degrees of freedom
## AIC: 1562.9
##
## Number of Fisher Scoring iterations: 11
#Fit another logistic regression model, but this time with the predictor variables in SORT as the discrete variables, eg specific ASA-PS == 3, High risk specialty TRUE, and so on
lr_poms2 <- glm(poms ~ (asa == 3) +
(asa == 4) +
(asa == 5) +
(opspecialty >= 2 & opspecialty < 6) +
(op_severity > 7) +
malignancy +
(age >= 65 & age <80) +
(age >= 80),
data = join7, family = binomial)
summary(lr_poms2)
##
## Call:
## glm(formula = poms ~ (asa == 3) + (asa == 4) + (asa == 5) + (opspecialty >=
## 2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >=
## 65 & age < 80) + (age >= 80), family = binomial, data = join7)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5899 -0.7378 -0.6327 0.8149 2.0457
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -1.9608 0.1948 -10.066
## asa == 3TRUE 0.6488 0.1371 4.734
## asa == 4TRUE 0.8416 0.4467 1.884
## asa == 5TRUE 13.5054 324.7437 0.042
## opspecialty >= 2 & opspecialty < 6TRUE 0.5676 0.1419 4.001
## op_severity > 7TRUE 0.2312 0.1581 1.462
## malignancy 1.2223 0.2439 5.011
## age >= 65 & age < 80TRUE 0.2227 0.1387 1.606
## age >= 80TRUE 0.6880 0.2050 3.355
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## asa == 3TRUE 2.21e-06 ***
## asa == 4TRUE 0.059575 .
## asa == 5TRUE 0.966827
## opspecialty >= 2 & opspecialty < 6TRUE 6.31e-05 ***
## op_severity > 7TRUE 0.143653
## malignancy 5.40e-07 ***
## age >= 65 & age < 80TRUE 0.108381
## age >= 80TRUE 0.000793 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1796.1 on 1582 degrees of freedom
## Residual deviance: 1698.1 on 1574 degrees of freedom
## AIC: 1716.1
##
## Number of Fisher Scoring iterations: 11
Theoretically we can do a Logistic Regression with POMS as the outcome variable, however all the cases in the SOuRCe dataset are Elective cases, so we lose one of the dependent variables from the original SORT regression equation.
Also, we need to have a way of testing how well our model performs, so we need to refit the model with a training dataset and then test it against a testing dataset. In the Protopapa et al paper describing the method for developing SORT, the dataset was split into approx. 2/3 for training and 1/3 for testing.
Whilst we want to stay as close as possible to the original SORT predictor variables, we need to adjust our regression equation to account for the fact that we only have 1 level of Operative urgency (this is a purely elective cohort), furthermore there are insufficient numbers in some of the categories for there to be useful information to add to the model adjustments, e.g. there is only 1 patient ASA == 5.
#Split the data set into training and testing
library(caret)
#join7$poms <- as.integer(join7$poms) %>% as.factor()
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]
nrow(training)
## [1] 1056
nrow(testing)
## [1] 527
#Fit a model with the training dataset
lr_poms3 <- glm(poms ~ (asa == 3) +
(asa >= 4) +
(opspecialty >= 2 & opspecialty < 6) +
(op_severity > 7) +
malignancy +
(age >= 65 & age <80) +
(age >= 80),
data = training,
family = binomial)
summary(lr_poms3)
##
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty >=
## 2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >=
## 65 & age < 80) + (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6413 -0.7476 -0.6329 0.7761 1.9626
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -2.1426 0.2437 -8.794
## asa == 3TRUE 0.6597 0.1673 3.943
## asa >= 4TRUE 0.4849 0.6722 0.721
## opspecialty >= 2 & opspecialty < 6TRUE 0.6364 0.1729 3.681
## op_severity > 7TRUE 0.3742 0.1960 1.909
## malignancy 1.2545 0.2913 4.307
## age >= 65 & age < 80TRUE 0.2635 0.1718 1.534
## age >= 80TRUE 0.7390 0.2451 3.015
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## asa == 3TRUE 8.04e-05 ***
## asa >= 4TRUE 0.470641
## opspecialty >= 2 & opspecialty < 6TRUE 0.000232 ***
## op_severity > 7TRUE 0.056290 .
## malignancy 1.65e-05 ***
## age >= 65 & age < 80TRUE 0.125131
## age >= 80TRUE 0.002572 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1126.3 on 1048 degrees of freedom
## AIC: 1142.3
##
## Number of Fisher Scoring iterations: 4
#I've had to remove NA values from the testing dataset in order for the Hosmer-Lemeshow test to run
testing <- filter(testing, !is.na(poms))
#Test the model with the testing dataset
library(PredictABEL)
(HL_lrpoms3 <- plotCalibration(data = testing,
cOutcome = match("poms", colnames(testing)),
predRisk = predict(lr_poms3, newdata = testing, type = "response")))
## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.105,0.181) 96 0.143 0.125 13.73 12
## 0.181 68 0.181 0.265 12.34 18
## 0.182 92 0.182 0.185 16.72 17
## [0.217,0.228) 17 0.224 0.235 3.80 4
## [0.228,0.248) 78 0.244 0.282 19.00 22
## [0.248,0.265) 34 0.257 0.265 8.74 9
## [0.265,0.344) 53 0.298 0.264 15.79 14
## [0.344,0.405) 41 0.378 0.268 15.50 11
## [0.405,0.740] 48 0.506 0.562 24.29 27
##
## $Chi_square
## [1] 7.089
##
## $df
## [1] 8
##
## $p_value
## [1] 0.527
#Calculate AUROC and plot the ROC curve
plotROC(data = testing,
cOutcome = match("poms", colnames(testing)),
predrisk = predict(lr_poms3, newdata = testing, type = "response"))
## AUC [95% CI] for the model 1 : 0.628 [ 0.573 - 0.683 ]
#Let's overlay the ROC curve over the ROC curve for POSSUM-morbidity
plotROC(data = testing,
cOutcome = match("poms", colnames(testing)),
predrisk = cbind(predict(lr_poms3, newdata = testing, type = "response"),
testing$morb_poss))
## AUC [95% CI] for the model 1 : 0.628 [ 0.573 - 0.683 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)
#Create the ROC objects
x <- predict(lr_poms3, newdata = testing, type = "response") #Temp variable
roc_model3 <- roc(testing$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_POSSUM <- roc(testing$poms ~ testing$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)
#Test the ROC objects against each other
roc.test(roc_model3, roc_POSSUM)
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model3 and roc_POSSUM
## Z = -0.97032, p-value = 0.3319
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.6276917 0.6562322
#library(rms) #This package contains many functions for regression modelling
#
#val.prob(p = predict(lr_poms3, newdata = testing, type = "response"), y = testing$poms, smooth = TRUE)
The AUROC for the Model 3 was 0.628 which implies the model doesn’t perform very well.
We can attempt another model by calculating the SORT risk score as a single regression variable.
#We will use the same training and testing partitions as the previous model but this time create a new column called risk score
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]
#Fit a model with the training dataset
lr_poms4 <- glm(poms ~ risk_score,
data = training,
family = binomial)
summary(lr_poms4)
##
## Call:
## glm(formula = poms ~ risk_score, family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5186 -0.7025 -0.6327 1.0336 1.9240
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.87920 0.13870 -13.55 < 2e-16 ***
## risk_score 0.52308 0.07254 7.21 5.58e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1145.3 on 1054 degrees of freedom
## AIC: 1149.3
##
## Number of Fisher Scoring iterations: 4
#I've had to remove NA values from the testing dataset in order for the Hosmer-Lemeshow test to run
testing <- filter(testing, !is.na(poms))
#Test the model with the testing dataset
plotCalibration(data = testing,
cOutcome = match("poms", colnames(testing)),
predRisk = predict(lr_poms4, newdata = testing, type = "response"))
## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.132,0.181) 95 0.156 0.126 14.78 12
## 0.181 68 0.181 0.265 12.34 18
## [0.187,0.219) 78 0.213 0.269 16.58 21
## 0.219 92 0.219 0.185 20.12 17
## [0.239,0.284) 39 0.265 0.333 10.34 13
## [0.284,0.361) 50 0.302 0.200 15.11 10
## [0.361,0.383) 54 0.365 0.278 19.74 15
## [0.383,0.774] 51 0.477 0.549 24.35 28
##
## $Chi_square
## [1] 12.16
##
## $df
## [1] 8
##
## $p_value
## [1] 0.1442
#Calculate AUROC and plot the ROC curve
plotROC(data = testing,
cOutcome = match("poms", colnames(testing)),
predrisk = predict(lr_poms4, newdata = testing, type = "response"))
## AUC [95% CI] for the model 1 : 0.616 [ 0.56 - 0.671 ]
#Let's compare whether the new model is as good as POSSUM in discrimination
library(pROC)
#Create the ROC objects
x <- predict(lr_poms4, newdata = testing, type = "response") #Temp variable
roc_model <- roc(testing$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_POSSUM <- roc(testing$poms ~ testing$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)
#Test the ROC objects against each other
roc.test(roc_model, roc_POSSUM)
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model and roc_POSSUM
## Z = -1.4241, p-value = 0.1544
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.6158995 0.6562322
The AUC doesn’t seem much better for this 4th model
We have thus far been looking at POMS as an outcome measure and seeing how well the SORT variables predict morbidity as defined by that. What if we pick an alternative measure of morbidity. Readmission rates could be one such example
##
## Call:
## glm(formula = Followup_Readmission_30Days ~ (asa == 3) + (asa >=
## 4) + (opspecialty >= 2 & opspecialty < 6) + (op_severity >
## 7) + malignancy + (age >= 65 & age < 80) + (age >= 80), family = binomial,
## data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3686 -0.2432 -0.1859 -0.1401 3.0424
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -3.1125 0.6908 -4.506
## asa == 3TRUE 0.4574 0.4970 0.920
## asa >= 4TRUE -14.1722 2014.9983 -0.007
## opspecialty >= 2 & opspecialty < 6TRUE -1.1125 0.5705 -1.950
## op_severity > 7TRUE -0.3933 0.6485 -0.607
## malignancy -14.2487 868.7154 -0.016
## age >= 65 & age < 80TRUE 0.1781 0.4608 0.386
## age >= 80TRUE -1.0008 1.0673 -0.938
## Pr(>|z|)
## (Intercept) 6.62e-06 ***
## asa == 3TRUE 0.3574
## asa >= 4TRUE 0.9944
## opspecialty >= 2 & opspecialty < 6TRUE 0.0512 .
## op_severity > 7TRUE 0.5442
## malignancy 0.9869
## age >= 65 & age < 80TRUE 0.6992
## age >= 80TRUE 0.3484
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 211.89 on 1009 degrees of freedom
## Residual deviance: 202.45 on 1002 degrees of freedom
## (46 observations deleted due to missingness)
## AIC: 218.45
##
## Number of Fisher Scoring iterations: 17
## $Table_HLtest
## total meanpred meanobs predicted observed
## [2.35e-09,0.0109) 112 0.007 0.018 0.73 2
## [1.09e-02,0.0154) 92 0.013 0.022 1.23 2
## [1.54e-02,0.0226) 52 0.017 0.038 0.87 2
## [2.26e-02,0.0346) 104 0.028 0.010 2.94 1
## 3.46e-02 89 0.035 0.000 3.08 0
## [4.26e-02,0.0775] 44 0.050 0.023 2.21 1
##
## $Chi_square
## [1] 9.23
##
## $df
## [1] 8
##
## $p_value
## [1] 0.3233
## AUC [95% CI] for the model 1 : 0.42 [ 0.222 - 0.619 ]
## AUC [95% CI] for the model 1 : 0.42 [ 0.222 - 0.619 ]
## AUC [95% CI] for the model 2 : 0.666 [ 0.492 - 0.841 ]
##
## Bootstrap test for two correlated ROC curves
##
## data: roc_model and roc_POSSUM
## D = -2.1118, boot.n = 2000, boot.stratified = 1, p-value = 0.0347
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.5737062 0.6604708
It may be that POMS at Day 7 or 8 may be too sensitive an outcome measure, as suggested by Ramani. We may therefore need to look at Day 14 or 15 POMS instead.
We will repeat what we did for Regression Model 3 above, but this time with Day 14 or 15 POMS as an outcome measure. This time I will not print the code that went into processing the data, to save space.
## join14
##
## 53 Variables 1753 Observations
## ---------------------------------------------------------------------------
## poms_id
## n missing distinct
## 215 1538 215
##
## lowest : 004178a3-755b-4653-9d06-f12cbcf77459 00709167-8651-4c95-be07-e6fe29734d26 00b165e3-c415-465c-97d6-2ee6f144d104 00eafe16-478e-4b49-8914-6ab567d52bad 0105ed9c-b450-4b7f-8698-c9568e9dcc02
## highest: fc934cb1-42ca-40f6-8a0b-cbf501eae1c4 fd41f555-f83f-4645-b7a7-471f368e7154 fd8a2ce9-ed7a-413b-861d-fdb87255b645 ff761e92-fa68-4bba-abc9-1eb0bf713146 ffe4409b-8bb1-4919-b4c4-a86a7d9b2411
## ---------------------------------------------------------------------------
## adm_id
## n missing distinct
## 1753 0 1753
##
## lowest : 00422cff-ddbe-49e6-a2d9-f4680c0945c2 00bea38f-af7b-4f5f-9655-1ab134c95571 00cea437-fd5a-45d4-859d-ddd7f48f3199 01125a5f-814c-4120-a298-ae6f27d43ef5 012d095b-6cf9-4b06-b7de-50b6d72277de
## highest: ff8455e0-44e3-470d-8e86-0893cc101bc0 ff962ad5-22e6-42e4-b35b-3e6cc03ea022 ff9d4b2f-08b0-4a74-8c24-af45b0472a18 ffb0f43c-c710-43b2-8f7b-620d68522a57 ffd1beee-09b8-4203-836b-2bd08ac472f5
## ---------------------------------------------------------------------------
## poms_day
## n missing distinct Info Mean Gmd
## 215 1538 2 0.505 14.21 0.3379
##
## 14 (169, 0.786), 15 (46, 0.214)
## ---------------------------------------------------------------------------
## pulm
## n missing distinct
## 1753 0 2
##
## FALSE (1724, 0.983), TRUE (29, 0.017)
## ---------------------------------------------------------------------------
## inf
## n missing distinct
## 1753 0 2
##
## FALSE (1695, 0.967), TRUE (58, 0.033)
## ---------------------------------------------------------------------------
## renal
## n missing distinct
## 1753 0 2
##
## FALSE (1687, 0.962), TRUE (66, 0.038)
## ---------------------------------------------------------------------------
## gastro
## n missing distinct
## 1753 0 2
##
## FALSE (1669, 0.952), TRUE (84, 0.048)
## ---------------------------------------------------------------------------
## cardio
## n missing distinct
## 1753 0 2
##
## FALSE (1738, 0.991), TRUE (15, 0.009)
## ---------------------------------------------------------------------------
## neuro
## n missing distinct
## 1753 0 2
##
## FALSE (1746, 0.996), TRUE (7, 0.004)
## ---------------------------------------------------------------------------
## wound
## n missing distinct
## 1753 0 2
##
## FALSE (1698, 0.969), TRUE (55, 0.031)
## ---------------------------------------------------------------------------
## haem
## n missing distinct
## 1753 0 2
##
## FALSE (1749, 0.998), TRUE (4, 0.002)
## ---------------------------------------------------------------------------
## pain
## n missing distinct
## 1753 0 2
##
## FALSE (1720, 0.981), TRUE (33, 0.019)
## ---------------------------------------------------------------------------
## poms
## n missing distinct
## 1753 0 2
##
## FALSE (1598, 0.912), TRUE (155, 0.088)
## ---------------------------------------------------------------------------
## poms_lo
## n missing distinct
## 215 1538 2
##
## FALSE (99, 0.46), TRUE (116, 0.54)
## ---------------------------------------------------------------------------
## poms_hi
## n missing distinct
## 215 1538 2
##
## FALSE (101, 0.47), TRUE (114, 0.53)
## ---------------------------------------------------------------------------
## sex
## n missing distinct
## 1751 2 2
##
## Female (1030, 0.588), Male (721, 0.412)
## ---------------------------------------------------------------------------
## age
## n missing distinct Info Mean Gmd .05 .10
## 1753 0 1670 1 59.32 18.6 28.43 34.70
## .25 .50 .75 .90 .95
## 48.23 62.47 71.70 79.10 82.47
##
## lowest : 17.01027 17.24572 17.34702 18.54620 18.59822
## highest: 90.96235 91.05270 91.26626 92.24914 95.44695
## ---------------------------------------------------------------------------
## asa
## n missing distinct Info Mean Gmd
## 1753 0 5 0.767 2.08 0.6632
##
## lowest : 1 2 3 4 5, highest: 1 2 3 4 5
##
## 1 (292, 0.167), 2 (1055, 0.602), 3 (381, 0.217), 4 (24, 0.014), 5 (1,
## 0.001)
## ---------------------------------------------------------------------------
## opspecialty
## n missing distinct Info Mean Gmd
## 1753 0 6 0.861 2.03 1.351
##
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6
##
## 1 (843, 0.481), 2 (523, 0.298), 3 (102, 0.058), 4 (111, 0.063), 5 (124,
## 0.071), 6 (50, 0.029)
## ---------------------------------------------------------------------------
## op_severity
## n missing distinct Info Mean Gmd
## 1753 0 4 0.509 6.871 1.844
##
## 1 (102, 0.058), 2 (95, 0.054), 4 (174, 0.099), 8 (1382, 0.788)
## ---------------------------------------------------------------------------
## ncepod
## n missing distinct
## 1749 4 2
##
## E (1748, 0.999), U (1, 0.001)
## ---------------------------------------------------------------------------
## malignancy
## n missing distinct Info Sum Mean Gmd
## 1753 0 2 0.151 93 0.05305 0.1005
## ---------------------------------------------------------------------------
## smoking
## n missing distinct Info Mean Gmd
## 1751 2 3 0.812 0.6311 0.7844
##
## 0 (951, 0.543), 1 (495, 0.283), 2 (305, 0.174)
## ---------------------------------------------------------------------------
## alcohol
## n missing distinct Info Mean Gmd
## 1748 5 3 0.762 0.9874 1
##
## 0 (871, 0.498), 1 (28, 0.016), 2 (849, 0.486)
## ---------------------------------------------------------------------------
## ecgposs
## n missing distinct Info Mean Gmd
## 1746 7 3 0.148 1.253 0.4845
##
## 1 (1655, 0.948), 4 (49, 0.028), 8 (42, 0.024)
## ---------------------------------------------------------------------------
## micharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.132 69 0.04625 0.08827
## ---------------------------------------------------------------------------
## ccfcharls
## n missing distinct Info Sum Mean Gmd
## 1491 262 2 0.036 18 0.01207 0.02387
## ---------------------------------------------------------------------------
## pvdcharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.163 86 0.05764 0.1087
## ---------------------------------------------------------------------------
## cvdcharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.156 82 0.05496 0.1039
## ---------------------------------------------------------------------------
## dementiacharls
## n missing distinct Info Mean Gmd
## 1492 261 1 0 0 0
##
## 0 (1492, 1)
## ---------------------------------------------------------------------------
## copdcharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.332 189 0.1267 0.2214
## ---------------------------------------------------------------------------
## ctdcharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.076 39 0.02614 0.05095
## ---------------------------------------------------------------------------
## pudcharls
## n missing distinct Info Sum Mean Gmd
## 1492 261 2 0.065 33 0.02212 0.04329
## ---------------------------------------------------------------------------
## hepdxcharls
## n missing distinct Info Mean Gmd
## 1492 261 3 0.053 0.03016 0.05956
##
## 0 (1465, 0.982), 1 (18, 0.012), 3 (9, 0.006)
## ---------------------------------------------------------------------------
## dmcharls
## n missing distinct Info Sum Mean Gmd
## 1491 262 2 0.298 167 0.112 0.1991
## ---------------------------------------------------------------------------
## hemiplegiacharls
## n missing distinct Info Mean Gmd
## 1491 262 2 0.012 0.008048 0.01604
##
## 0 (1485, 0.996), 2 (6, 0.004)
## ---------------------------------------------------------------------------
## renalcharls
## n missing distinct Info Mean Gmd
## 1492 261 2 0.024 0.01609 0.03193
##
## 0 (1480, 0.992), 2 (12, 0.008)
## ---------------------------------------------------------------------------
## dmendorgancharls
## n missing distinct Info Mean Gmd
## 1492 261 2 0.002 0.00134 0.002681
##
## 0 (1491, 0.999), 2 (1, 0.001)
## ---------------------------------------------------------------------------
## tumourcharls
## n missing distinct Info Mean Gmd
## 1291 462 3 0.156 0.1596 0.306
##
## 0 (1220, 0.945), 2 (55, 0.043), 6 (16, 0.012)
## ---------------------------------------------------------------------------
## aidscharls
## n missing distinct Info Mean Gmd
## 1492 261 2 0.006 0.01206 0.0241
##
## 0 (1489, 0.998), 6 (3, 0.002)
## ---------------------------------------------------------------------------
## charlsoncomorb
## n missing distinct Info Mean Gmd .05 .10
## 1492 261 11 0.754 0.6635 0.9762 0 0
## .25 .50 .75 .90 .95
## 0 0 1 2 3
##
## lowest : 0 1 2 3 4, highest: 6 7 8 9 10
##
## Value 0 1 2 3 4 5 6 7 8 9
## Frequency 917 349 145 40 15 5 10 6 2 2
## Proportion 0.615 0.234 0.097 0.027 0.010 0.003 0.007 0.004 0.001 0.001
##
## Value 10
## Frequency 1
## Proportion 0.001
## ---------------------------------------------------------------------------
## charlsonage
## n missing distinct Info Mean Gmd
## 1492 261 6 0.947 1.692 1.471
##
## lowest : 0 1 2 3 4, highest: 1 2 3 4 5
##
## 0 (399, 0.267), 1 (247, 0.166), 2 (396, 0.265), 3 (321, 0.215), 4 (123,
## 0.082), 5 (6, 0.004)
## ---------------------------------------------------------------------------
## charlsontotal
## n missing distinct Info Mean Gmd .05 .10
## 1492 261 14 0.969 2.355 2.046 0 0
## .25 .50 .75 .90 .95
## 1 2 3 5 5
##
## lowest : 0 1 2 3 4, highest: 9 10 11 12 13
##
## Value 0 1 2 3 4 5 6 7 8 9
## Frequency 291 229 330 280 193 95 30 22 8 6
## Proportion 0.195 0.153 0.221 0.188 0.129 0.064 0.020 0.015 0.005 0.004
##
## Value 10 11 12 13
## Frequency 5 1 1 1
## Proportion 0.003 0.001 0.001 0.001
## ---------------------------------------------------------------------------
## status_at_dc
## n missing distinct Info Sum Mean Gmd
## 1750 3 2 0.014 1742 0.9954 0.009106
## ---------------------------------------------------------------------------
## mortip
## n missing distinct Info Sum Mean Gmd
## 1753 0 2 0.019 11 0.006275 0.01248
## ---------------------------------------------------------------------------
## mort30
## n missing distinct Info Sum Mean Gmd
## 1753 0 2 0.003 2 0.001141 0.002281
## ---------------------------------------------------------------------------
## los
## n missing distinct Info Mean Gmd .05 .10
## 1753 0 77 0.992 9.876 9.902 2 3
## .25 .50 .75 .90 .95
## 4 6 9 16 30
##
## lowest : 0 1 2 3 4, highest: 158 165 167 185 201
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days
## n missing distinct Info Sum Mean Gmd
## 1673 80 2 0.058 33 0.01973 0.0387
## ---------------------------------------------------------------------------
## risk_score
## n missing distinct Info Mean Gmd .05 .10
## 1753 0 45 0.984 1.424 1.017 0.381 0.381
## .25 .50 .75 .90 .95
## 0.712 1.158 1.972 2.675 3.309
##
## lowest : 0.000 0.381 0.712 0.777 1.048
## highest: 4.360 4.381 4.762 5.072 5.951
## ---------------------------------------------------------------------------
## sort_mort
## n missing distinct Info Mean Gmd .05
## 1753 0 45 0.984 0.004691 0.005438 0.0009248
## .10 .25 .50 .75 .90 .95
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0090955 0.0170162
##
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0013735255 0.0018003001
## highest: 0.0471555481 0.0481081419 0.0688814304 0.0916210990 0.1954466227
## ---------------------------------------------------------------------------
## morb_poss
## n missing distinct Info Mean Gmd .05 .10
## 1753 0 150 0.997 0.3202 0.1814 0.08786 0.12347
## .25 .50 .75 .90 .95
## 0.20424 0.29318 0.40131 0.55971 0.67261
##
## lowest : 0.05468132 0.06356602 0.06537533 0.07378165 0.07585818
## highest: 0.88079708 0.90114393 0.93819653 0.96373628 0.96544377
## ---------------------------------------------------------------------------
## p_possmort
## n missing distinct Info Mean Gmd .05 .10
## 1753 0 150 0.997 0.01981 0.01878 0.003643 0.004961
## .25 .50 .75 .90 .95
## 0.008215 0.012833 0.021140 0.039128 0.060426
##
## lowest : 0.002227601 0.002600105 0.002637192 0.003077977 0.003121859
## highest: 0.198356194 0.214636630 0.288393341 0.447148173 0.514096264
## ---------------------------------------------------------------------------
## srs_mort
## n missing distinct Info Mean Gmd
## 1753 0 8 0.85 0.02056 0.01459
##
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0084090681 0.0192652326 0.0435216350 0.0953494649 0.1962340564
##
## Value 0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency 41 65 110 301
## Proportion 0.023 0.037 0.063 0.172
##
## Value 0.0192652326 0.0435216350 0.0953494649 0.1962340564
## Frequency 909 305 21 1
## Proportion 0.519 0.174 0.012 0.001
## ---------------------------------------------------------------------------
We now want to fit the model used in Regression Model 3, but to the Day 14 or 15 POMS outcomes.
## [1] 1056
## [1] 697
##
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty >=
## 2 & opspecialty < 6) + (op_severity > 7) + malignancy + (age >=
## 65 & age < 80) + (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7676 -0.4610 -0.2896 -0.2714 2.8951
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value
## (Intercept) -3.4096 0.3997 -8.531
## asa == 3TRUE 0.2609 0.3282 0.795
## asa >= 4TRUE NA NA NA
## opspecialty >= 2 & opspecialty < 6TRUE 0.9842 0.3061 3.215
## op_severity > 7TRUE 0.2591 0.3038 0.853
## malignancy 1.0949 0.3911 2.800
## age >= 65 & age < 80TRUE -0.1327 0.3035 -0.437
## age >= 80TRUE -1.0251 1.0324 -0.993
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## asa == 3TRUE 0.42665
## asa >= 4TRUE NA
## opspecialty >= 2 & opspecialty < 6TRUE 0.00130 **
## op_severity > 7TRUE 0.39362
## malignancy 0.00511 **
## age >= 65 & age < 80TRUE 0.66189
## age >= 80TRUE 0.32073
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 541.26 on 1055 degrees of freedom
## Residual deviance: 512.17 on 1049 degrees of freedom
## AIC: 526.17
##
## Number of Fisher Scoring iterations: 6
## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.0117,0.0362) 72 0.020 0.069 1.42 5
## 0.0362 107 0.036 0.037 3.87 4
## [0.0395,0.0411) 125 0.041 0.064 5.10 8
## [0.0411,0.0719) 54 0.049 0.167 2.67 9
## [0.0719,0.0912) 84 0.079 0.071 6.66 6
## [0.0912,0.1028) 46 0.092 0.261 4.21 12
## 0.1028 88 0.103 0.159 9.05 14
## [0.1030,0.1270) 62 0.111 0.210 6.89 13
## [0.1270,0.3078] 59 0.196 0.153 11.56 9
##
## $Chi_square
## [1] 52.405
##
## $df
## [1] 8
##
## $p_value
## [1] 0
## AUC [95% CI] for the model 1 : 0.639 [ 0.579 - 0.7 ]
## AUC [95% CI] for the model 1 : 0.639 [ 0.579 - 0.7 ]
## AUC [95% CI] for the model 2 : 0.571 [ 0.497 - 0.645 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model and roc_POSSUM
## Z = 1.3028, p-value = 0.1926
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.6393841 0.5709481
If we look at the scatter plots in the earlier parts of the analysis, particularly the one of POMS vs Surgical Specialty, the relationship between POMS and the surgical specialty doesn’t seem very strong for the patients who underwent Upper GI, Vascular, Bariatric or Other surgery (opspecialty 3-6
). The strongest predictor appears to be Colorectal surgery out of those specialties. Therefore it may be better to fit a more restricted model.
We will fit a more restricted model with Day 7 or 8 POMS as the outcome, and the only surgical specialty that functions as a predictor is Colorectal surgery (opspecialty == 2
).
## [1] 1056
## [1] 527
##
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + (opspecialty ==
## 2 | opspecialty == 3) + (op_severity > 7) + malignancy +
## (age >= 65 & age < 80) + (age >= 80), family = binomial,
## data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7757 -0.7483 -0.6270 0.6805 2.3167
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -2.6129 0.2536 -10.302
## asa == 3TRUE 0.7520 0.1693 4.442
## asa >= 4TRUE 1.1225 0.6668 1.684
## opspecialty == 2 | opspecialty == 3TRUE 1.1705 0.1852 6.320
## op_severity > 7TRUE 0.7310 0.2091 3.496
## malignancy 0.9492 0.3033 3.129
## age >= 65 & age < 80TRUE 0.3550 0.1738 2.043
## age >= 80TRUE 0.8318 0.2471 3.366
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## asa == 3TRUE 8.93e-06 ***
## asa >= 4TRUE 0.092267 .
## opspecialty == 2 | opspecialty == 3TRUE 2.61e-10 ***
## op_severity > 7TRUE 0.000472 ***
## malignancy 0.001754 **
## age >= 65 & age < 80TRUE 0.041045 *
## age >= 80TRUE 0.000763 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1099.4 on 1048 degrees of freedom
## AIC: 1115.4
##
## Number of Fisher Scoring iterations: 4
## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.0683,0.135) 131 0.127 0.099 16.67 13
## [0.1346,0.182) 95 0.178 0.179 16.91 17
## [0.1816,0.244) 68 0.191 0.309 12.97 21
## 0.2442 28 0.244 0.179 6.84 5
## [0.2521,0.319) 70 0.282 0.271 19.73 19
## [0.3188,0.334) 49 0.329 0.429 16.10 21
## [0.3340,0.421) 35 0.378 0.200 13.22 7
## [0.4207,0.793] 51 0.550 0.608 28.04 31
##
## $Chi_square
## [1] 15.373
##
## $df
## [1] 8
##
## $p_value
## [1] 0.0523
## AUC [95% CI] for the model 1 : 0.689 [ 0.638 - 0.74 ]
## AUC [95% CI] for the model 1 : 0.689 [ 0.638 - 0.74 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model7 and roc_POSSUM
## Z = 1.0148, p-value = 0.3102
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.6888933 0.6562322
It seems that the AUROC has improved with this more restricted model.
We will now do the same as Regression Model 7, but changing surgical specialty into factor variable
## [1] 1056
## [1] 527
##
## Call:
## glm(formula = poms ~ (asa == 3) + (asa >= 4) + factor(opspecialty) +
## (op_severity > 7) + malignancy + (age >= 65 & age < 80) +
## (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9873 -0.6763 -0.5706 0.6051 2.8527
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.0150 0.2858 -10.551 < 2e-16 ***
## asa == 3TRUE 0.8756 0.1794 4.882 1.05e-06 ***
## asa >= 4TRUE 0.9391 0.7171 1.310 0.19034
## factor(opspecialty)2 1.6562 0.2212 7.489 6.95e-14 ***
## factor(opspecialty)3 -0.9048 0.4390 -2.061 0.03928 *
## factor(opspecialty)4 0.3053 0.3170 0.963 0.33545
## factor(opspecialty)5 -1.0367 0.4537 -2.285 0.02232 *
## factor(opspecialty)6 0.1778 0.5294 0.336 0.73702
## op_severity > 7TRUE 1.2822 0.2314 5.541 3.01e-08 ***
## malignancy 0.8847 0.3267 2.708 0.00677 **
## age >= 65 & age < 80TRUE 0.1415 0.1818 0.778 0.43637
## age >= 80TRUE 0.5735 0.2547 2.252 0.02433 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1044.5 on 1044 degrees of freedom
## AIC: 1068.5
##
## Number of Fisher Scoring iterations: 5
## AUC [95% CI] for the model 1 : 0.717 [ 0.667 - 0.766 ]
## AUC [95% CI] for the model 1 : 0.717 [ 0.667 - 0.766 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model8 and roc_POSSUM
## Z = 1.8385, p-value = 0.06599
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7165129 0.6562322
Regression Model 8, but change the ASA to dummy variables, ASA == 2, ASA == 3 or ASA >= 4.
## [1] 1056
## [1] 527
##
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) +
## (op_severity > 7) + malignancy + (age >= 65 & age < 80) +
## (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9860 -0.7118 -0.5906 0.5962 2.8211
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2281 0.3250 -9.932 < 2e-16 ***
## asa == 2TRUE 0.3323 0.2287 1.453 0.14626
## asa == 3TRUE 1.1395 0.2574 4.426 9.58e-06 ***
## asa >= 4TRUE 1.2231 0.7431 1.646 0.09978 .
## factor(opspecialty)2 1.6579 0.2213 7.491 6.83e-14 ***
## factor(opspecialty)3 -0.9294 0.4392 -2.116 0.03431 *
## factor(opspecialty)4 0.2962 0.3163 0.937 0.34898
## factor(opspecialty)5 -1.0647 0.4541 -2.345 0.01904 *
## factor(opspecialty)6 0.1807 0.5303 0.341 0.73324
## op_severity > 7TRUE 1.2378 0.2332 5.308 1.11e-07 ***
## malignancy 0.8968 0.3279 2.735 0.00624 **
## age >= 65 & age < 80TRUE 0.1182 0.1826 0.647 0.51753
## age >= 80TRUE 0.5500 0.2552 2.155 0.03115 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1042.3 on 1043 degrees of freedom
## AIC: 1068.3
##
## Number of Fisher Scoring iterations: 5
## AUC [95% CI] for the model 1 : 0.717 [ 0.668 - 0.766 ]
## AUC [95% CI] for the model 1 : 0.717 [ 0.668 - 0.766 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model9 and roc_POSSUM
## Z = 1.8521, p-value = 0.06401
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7169971 0.6562322
## [1] 1056
## [1] 527
##
## Call:
## glm(formula = poms ~ asa + factor(opspecialty) + (op_severity >
## 7) + malignancy + (age >= 65 & age < 80) + (age >= 80), family = binomial,
## data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9263 -0.7321 -0.5780 0.6158 2.7757
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.9524 0.3792 -10.424 < 2e-16 ***
## asa 0.5936 0.1245 4.767 1.87e-06 ***
## factor(opspecialty)2 1.6539 0.2210 7.484 7.23e-14 ***
## factor(opspecialty)3 -0.9114 0.4372 -2.084 0.03713 *
## factor(opspecialty)4 0.2805 0.3036 0.924 0.35545
## factor(opspecialty)5 -1.0656 0.4531 -2.352 0.01869 *
## factor(opspecialty)6 0.1732 0.5299 0.327 0.74380
## op_severity > 7TRUE 1.1917 0.2306 5.169 2.36e-07 ***
## malignancy 0.8980 0.3280 2.737 0.00619 **
## age >= 65 & age < 80TRUE 0.1132 0.1823 0.621 0.53477
## age >= 80TRUE 0.5719 0.2542 2.250 0.02447 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1044.8 on 1045 degrees of freedom
## AIC: 1066.8
##
## Number of Fisher Scoring iterations: 5
## AUC [95% CI] for the model 1 : 0.715 [ 0.665 - 0.765 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model10 and roc_POSSUM
## Z = 1.7727, p-value = 0.07628
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7145665 0.6562322
library(car)
#Check for multicollinearity
vif(lr_poms10)
## GVIF Df GVIF^(1/(2*Df))
## asa 1.166086 1 1.079855
## factor(opspecialty) 2.107648 5 1.077407
## op_severity > 7 1.556094 1 1.247435
## malignancy 1.109619 1 1.053384
## age >= 65 & age < 80 1.244317 1 1.115489
## age >= 80 1.200709 1 1.095769
# Cook's D plot
# identify D values > 4/(n-k-1)
cutoff <- 4/((nrow(training)-length(lr_poms10$coefficients)-2))
plot(lr_poms10, which=4, cook.levels=cutoff)
# Influence Plot
influencePlot(lr_poms10, id.method="identify", main="Influence Plot", sub="Circle size is proportial to Cook's Distance" )
Let’s introduce some interaction terms.
## [1] 1056
## [1] 527
##
## Call:
## glm(formula = poms ~ asa + factor(opspecialty) + (op_severity >
## 7) + malignancy + (age >= 65 & age < 80) + (age >= 80) +
## (age >= 65 & age < 80):factor(opspecialty) + (age >= 80):factor(opspecialty),
## family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9083 -0.7207 -0.5921 0.6530 2.8344
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value
## (Intercept) -4.04600 0.39428 -10.262
## asa 0.58728 0.12539 4.684
## factor(opspecialty)2 1.82386 0.26691 6.833
## factor(opspecialty)3 -1.52219 0.62589 -2.432
## factor(opspecialty)4 0.78533 0.54858 1.432
## factor(opspecialty)5 -1.12719 0.50173 -2.247
## factor(opspecialty)6 0.71612 0.60089 1.192
## op_severity > 7TRUE 1.21916 0.23006 5.299
## malignancy 0.76600 0.33553 2.283
## age >= 65 & age < 80TRUE 0.10168 0.26442 0.385
## age >= 80TRUE 1.03659 0.33007 3.141
## factor(opspecialty)2:age >= 65 & age < 80TRUE 0.01765 0.39677 0.044
## factor(opspecialty)3:age >= 65 & age < 80TRUE 2.83845 1.14021 2.489
## factor(opspecialty)4:age >= 65 & age < 80TRUE -0.45035 0.68520 -0.657
## factor(opspecialty)5:age >= 65 & age < 80TRUE 1.88693 1.38483 1.363
## factor(opspecialty)6:age >= 65 & age < 80TRUE -1.01787 1.28099 -0.795
## factor(opspecialty)2:age >= 80TRUE -1.37305 0.59846 -2.294
## factor(opspecialty)3:age >= 80TRUE 15.35067 882.74369 0.017
## factor(opspecialty)4:age >= 80TRUE -1.23980 0.83168 -1.491
## factor(opspecialty)5:age >= 80TRUE NA NA NA
## factor(opspecialty)6:age >= 80TRUE -14.71122 591.29701 -0.025
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## asa 2.82e-06 ***
## factor(opspecialty)2 8.30e-12 ***
## factor(opspecialty)3 0.01501 *
## factor(opspecialty)4 0.15227
## factor(opspecialty)5 0.02466 *
## factor(opspecialty)6 0.23335
## op_severity > 7TRUE 1.16e-07 ***
## malignancy 0.02243 *
## age >= 65 & age < 80TRUE 0.70058
## age >= 80TRUE 0.00169 **
## factor(opspecialty)2:age >= 65 & age < 80TRUE 0.96452
## factor(opspecialty)3:age >= 65 & age < 80TRUE 0.01280 *
## factor(opspecialty)4:age >= 65 & age < 80TRUE 0.51102
## factor(opspecialty)5:age >= 65 & age < 80TRUE 0.17302
## factor(opspecialty)6:age >= 65 & age < 80TRUE 0.42685
## factor(opspecialty)2:age >= 80TRUE 0.02177 *
## factor(opspecialty)3:age >= 80TRUE 0.98613
## factor(opspecialty)4:age >= 80TRUE 0.13603
## factor(opspecialty)5:age >= 80TRUE NA
## factor(opspecialty)6:age >= 80TRUE 0.98015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1025.1 on 1036 degrees of freedom
## AIC: 1065.1
##
## Number of Fisher Scoring iterations: 13
## AUC [95% CI] for the model 1 : 0.72 [ 0.67 - 0.77 ]
## AUC [95% CI] for the model 1 : 0.72 [ 0.67 - 0.77 ]
## AUC [95% CI] for the model 2 : 0.656 [ 0.601 - 0.712 ]
##
## DeLong's test for two correlated ROC curves
##
## data: roc_model11 and roc_POSSUM
## Z = 1.9328, p-value = 0.05325
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7203391 0.6562322
We may need to consider that POMS as a binary outcome may not be the best measure for “morbidity”. Instead the number of POMS domains may provide a better way of scoring morbidity. We can create a scale 0-9 where 0 is no POMS domains positive on the day it is tested, and each domain that is positive increases the score by 1 to a maximum of 9, since POMS has 9 organ-system domains.
join7 <- join7 %>% mutate(poms_score = pulm + inf + renal + gastro + cardio + neuro + wound + haem + pain) %>% mutate(poms_score = replace(poms_score, which(los < 7 & is.na(poms_score)), 0))
describe(join7$poms_score)
## join7$poms_score
## n missing distinct Info Mean Gmd
## 1583 0 9 0.584 0.5389 0.8962
##
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8
##
## 0 (1180, 0.745), 1 (185, 0.117), 2 (105, 0.066), 3 (51, 0.032), 4 (24,
## 0.015), 5 (25, 0.016), 6 (9, 0.006), 7 (2, 0.001), 8 (2, 0.001)
join14 <- join14 %>% mutate(poms_score = pulm + inf + renal + gastro + cardio + neuro + wound + haem + pain) %>% mutate(poms_score = replace(poms_score, which(los < 14 & is.na(poms_score)), 0))
describe(join14$poms_score)
## join14$poms_score
## n missing distinct Info Mean Gmd
## 1753 0 9 0.242 0.2002 0.3775
##
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8
##
## 0 (1598, 0.912), 1 (64, 0.037), 2 (44, 0.025), 3 (20, 0.011), 4 (10,
## 0.006), 5 (9, 0.005), 6 (4, 0.002), 7 (2, 0.001), 8 (2, 0.001)
library(dplyr)
library(Hmisc)
#Examine the groups excluded because missing day 7 POMS data
missing_index <- which(!(data_clean$adm_id %in% join7$adm_id))
missing <- data_clean[missing_index,]
describe(missing)
## missing
##
## 40 Variables 243 Observations
## ---------------------------------------------------------------------------
## X
## n missing distinct Info Mean Gmd .05 .10
## 243 0 243 1 976.6 603.2 89.0 183.0
## .25 .50 .75 .90 .95
## 526.5 1050.0 1394.5 1652.2 1718.5
##
## lowest : 21 26 27 30 34, highest: 1786 1788 1799 1808 1817
## ---------------------------------------------------------------------------
## adm_id
## n missing distinct
## 243 0 243
##
## lowest : 019a8341-e77b-4724-9a5c-b89a4cfa7c65 0230c088-b321-40c7-b3c6-40318a817e14 0426657e-faf1-4c03-b160-cf55c0c61b6b 046d5abd-2e8a-4285-b178-04c3d6c09f37 04772ade-a7af-4e3c-b682-2e3389240a35
## highest: faeae1bf-a225-4b63-b1be-1e7ae98e3403 faeb4542-f64c-4590-b285-dddcc28d3a86 fd0cd879-eb76-4f2b-b121-367801294323 fddeb7e1-fdf4-409c-a2f3-ca77293933bf ffb0f43c-c710-43b2-8f7b-620d68522a57
## ---------------------------------------------------------------------------
## sex
## n missing distinct
## 243 0 2
##
## Female (153, 0.63), Male (90, 0.37)
## ---------------------------------------------------------------------------
## age
## n missing distinct Info Mean Gmd .05 .10
## 243 0 241 1 62.41 18.66 26.55 31.47
## .25 .50 .75 .90 .95
## 55.28 66.94 74.50 80.87 82.83
##
## lowest : 17.34702 18.68036 19.38672 19.86585 21.77687
## highest: 87.14853 87.99179 88.36413 89.34976 89.93840
## ---------------------------------------------------------------------------
## asa
## n missing distinct Info Mean Gmd
## 243 0 4 0.741 2.123 0.6233
##
## 1 (32, 0.132), 2 (152, 0.626), 3 (56, 0.230), 4 (3, 0.012)
## ---------------------------------------------------------------------------
## opspecialty
## n missing distinct Info Mean Gmd
## 243 0 6 0.712 1.519 0.7627
##
## lowest : 1 2 3 4 5, highest: 2 3 4 5 6
##
## 1 (156, 0.642), 2 (69, 0.284), 3 (1, 0.004), 4 (14, 0.058), 5 (2, 0.008),
## 6 (1, 0.004)
## ---------------------------------------------------------------------------
## op_severity
## n missing distinct Info Mean Gmd
## 243 0 4 0.416 7.16 1.443
##
## 1 (6, 0.025), 2 (13, 0.053), 4 (21, 0.086), 8 (203, 0.835)
## ---------------------------------------------------------------------------
## ncepod
## n missing distinct value
## 242 1 1 E
##
## E (242, 1)
## ---------------------------------------------------------------------------
## malignancy
## n missing distinct Info Sum Mean Gmd
## 243 0 2 0.227 20 0.0823 0.1517
## ---------------------------------------------------------------------------
## smoking
## n missing distinct Info Mean Gmd
## 243 0 3 0.823 0.6708 0.8235
##
## 0 (129, 0.531), 1 (65, 0.267), 2 (49, 0.202)
## ---------------------------------------------------------------------------
## alcohol
## n missing distinct Info Mean Gmd
## 242 1 3 0.756 0.9091 0.9956
##
## 0 (130, 0.537), 1 (4, 0.017), 2 (108, 0.446)
## ---------------------------------------------------------------------------
## ecgposs
## n missing distinct Info Mean Gmd
## 243 0 3 0.174 1.235 0.4469
##
## 1 (228, 0.938), 4 (12, 0.049), 8 (3, 0.012)
## ---------------------------------------------------------------------------
## micharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.14 9 0.04891 0.09355
## ---------------------------------------------------------------------------
## ccfcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.032 2 0.01087 0.02162
## ---------------------------------------------------------------------------
## pvdcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.183 12 0.06522 0.1226
## ---------------------------------------------------------------------------
## cvdcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.225 15 0.08152 0.1506
## ---------------------------------------------------------------------------
## dementiacharls
## n missing distinct Info Mean Gmd
## 184 59 1 0 0 0
##
## 0 (184, 1)
## ---------------------------------------------------------------------------
## copdcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.303 21 0.1141 0.2033
## ---------------------------------------------------------------------------
## ctdcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.183 12 0.06522 0.1226
## ---------------------------------------------------------------------------
## pudcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.048 3 0.0163 0.03225
## ---------------------------------------------------------------------------
## hepdxcharls
## n missing distinct Info Mean Gmd
## 184 59 3 0.048 0.02717 0.05399
##
## 0 (181, 0.984), 1 (2, 0.011), 3 (1, 0.005)
## ---------------------------------------------------------------------------
## dmcharls
## n missing distinct Info Sum Mean Gmd
## 184 59 2 0.225 15 0.08152 0.1506
## ---------------------------------------------------------------------------
## hemiplegiacharls
## n missing distinct Info Mean Gmd
## 184 59 2 0.016 0.01087 0.02174
##
## 0 (183, 0.995), 2 (1, 0.005)
## ---------------------------------------------------------------------------
## renalcharls
## n missing distinct Info Mean Gmd
## 184 59 1 0 0 0
##
## 0 (184, 1)
## ---------------------------------------------------------------------------
## dmendorgancharls
## n missing distinct Info Mean Gmd
## 184 59 2 0.016 0.01087 0.02174
##
## 0 (183, 0.995), 2 (1, 0.005)
## ---------------------------------------------------------------------------
## tumourcharls
## n missing distinct Info Mean Gmd
## 165 78 3 0.202 0.1697 0.3199
##
## 0 (153, 0.927), 2 (11, 0.067), 6 (1, 0.006)
## ---------------------------------------------------------------------------
## aidscharls
## n missing distinct Info Mean Gmd
## 184 59 2 0.016 0.03261 0.06522
##
## 0 (183, 0.995), 6 (1, 0.005)
## ---------------------------------------------------------------------------
## charlsoncomorb
## n missing distinct Info Mean Gmd
## 184 59 7 0.777 0.7174 1.038
##
## lowest : 0 1 2 3 4, highest: 2 3 4 5 6
##
## 0 (109, 0.592), 1 (45, 0.245), 2 (15, 0.082), 3 (8, 0.043), 4 (4, 0.022),
## 5 (1, 0.005), 6 (2, 0.011)
## ---------------------------------------------------------------------------
## charlsonage
## n missing distinct Info Mean Gmd
## 184 59 5 0.94 2.043 1.439
##
## lowest : 0 1 2 3 4, highest: 0 1 2 3 4
##
## 0 (36, 0.196), 1 (19, 0.103), 2 (52, 0.283), 3 (55, 0.299), 4 (22, 0.120)
## ---------------------------------------------------------------------------
## charlsontotal
## n missing distinct Info Mean Gmd
## 184 59 9 0.97 2.761 2.054
##
## lowest : 0 1 2 3 4, highest: 4 5 6 7 8
##
## 0 (24, 0.130), 1 (23, 0.125), 2 (37, 0.201), 3 (40, 0.217), 4 (35, 0.190),
## 5 (10, 0.054), 6 (7, 0.038), 7 (5, 0.027), 8 (3, 0.016)
## ---------------------------------------------------------------------------
## status_at_dc
## n missing distinct Info Sum Mean Gmd
## 243 0 2 0.012 242 0.9959 0.00823
## ---------------------------------------------------------------------------
## mortip
## n missing distinct Info Sum Mean Gmd
## 243 0 2 0.012 1 0.004115 0.00823
## ---------------------------------------------------------------------------
## mort30
## n missing distinct Info Sum Mean Gmd
## 243 0 2 0.012 1 0.004115 0.00823
## ---------------------------------------------------------------------------
## los
## n missing distinct Info Mean Gmd .05 .10
## 243 0 15 0.676 9.152 3.977 7.0 7.0
## .25 .50 .75 .90 .95
## 7.0 7.0 8.0 9.0 13.8
##
## lowest : 7 8 9 10 11, highest: 26 29 39 104 144
##
## Value 7 8 9 10 11 12 14 16 18 21
## Frequency 166 41 15 2 4 2 1 2 1 1
## Proportion 0.683 0.169 0.062 0.008 0.016 0.008 0.004 0.008 0.004 0.004
##
## Value 26 29 39 104 144
## Frequency 2 3 1 1 1
## Proportion 0.008 0.012 0.004 0.004 0.004
## ---------------------------------------------------------------------------
## Followup_Readmission_30Days
## n missing distinct Info Sum Mean Gmd
## 239 4 2 0.061 5 0.02092 0.04114
## ---------------------------------------------------------------------------
## risk_score
## n missing distinct Info Mean Gmd .05 .10
## 243 0 33 0.979 1.505 1.021 0.381 0.381
## .25 .50 .75 .90 .95
## 0.712 1.158 1.972 2.857 3.281
##
## lowest : 0.000 0.381 0.712 1.048 1.093
## highest: 3.546 3.714 3.877 4.095 4.360
## ---------------------------------------------------------------------------
## sort_mort
## n missing distinct Info Mean Gmd .05
## 243 0 33 0.979 0.004688 0.005073 0.0009248
## .10 .25 .50 .75 .90 .95
## 0.0009248 0.0012872 0.0020092 0.0045232 0.0109252 0.0165448
##
## lowest : 0.0006319930 0.0009248072 0.0012871973 0.0018003001 0.0018830080
## highest: 0.0214572897 0.0252833682 0.0296268398 0.0365795704 0.0471555481
## ---------------------------------------------------------------------------
## morb_poss
## n missing distinct Info Mean Gmd .05 .10
## 243 0 62 0.997 0.3417 0.1725 0.1171 0.1795
## .25 .50 .75 .90 .95
## 0.2315 0.3122 0.4403 0.5597 0.6348
##
## lowest : 0.07378165 0.07585818 0.08548914 0.08786391 0.09279295
## highest: 0.76852478 0.80376594 0.81457258 0.82053848 0.82491373
## ---------------------------------------------------------------------------
## p_possmort
## n missing distinct Info Mean Gmd .05 .10
## 243 0 62 0.997 0.02043 0.01721 0.005103 0.007109
## .25 .50 .75 .90 .95
## 0.009183 0.014540 0.024941 0.039994 0.050874
##
## lowest : 0.003077977 0.003121859 0.003541702 0.003643356 0.003695268
## highest: 0.095989688 0.104968369 0.116006857 0.139553908 0.141267768
## ---------------------------------------------------------------------------
## srs_mort
## n missing distinct Info Mean Gmd
## 243 0 7 0.823 0.02182 0.01388
##
## lowest : 0.0006818628 0.0015780281 0.0036477148 0.0084090681 0.0192652326
## highest: 0.0036477148 0.0084090681 0.0192652326 0.0435216350 0.0953494649
##
## Value 0.0006818628 0.0015780281 0.0036477148 0.0084090681
## Frequency 3 5 12 38
## Proportion 0.012 0.021 0.049 0.156
##
## Value 0.0192652326 0.0435216350 0.0953494649
## Frequency 133 50 2
## Proportion 0.547 0.206 0.008
## ---------------------------------------------------------------------------
t.test(missing$age, join7$age)
##
## Welch Two Sample t-test
##
## data: missing$age and join7$age
## t = 2.9138, df = 313.75, p-value = 0.003827
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.112344 5.738259
## sample estimates:
## mean of x mean of y
## 62.41368 58.98838
t.test(missing$morb_poss, join7$morb_poss)
##
## Welch Two Sample t-test
##
## data: missing$morb_poss and join7$morb_poss
## t = 1.9808, df = 335.66, p-value = 0.04843
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0001503437 0.0431870073
## sample estimates:
## mean of x mean of y
## 0.3416967 0.3200280
t.test(missing$p_possmort, join7$p_possmort)
##
## Welch Two Sample t-test
##
## data: missing$p_possmort and join7$p_possmort
## t = 0.25004, df = 406.46, p-value = 0.8027
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.002555180 0.003299924
## sample estimates:
## mean of x mean of y
## 0.02043441 0.02006204
t.test(missing$srs_mort, join7$srs_mort)
##
## Welch Two Sample t-test
##
## data: missing$srs_mort and join7$srs_mort
## t = 1.3428, df = 339.15, p-value = 0.1802
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.0006233685 0.0033055852
## sample estimates:
## mean of x mean of y
## 0.0218202 0.0204791
t.test(missing$sort_mort, join7$sort_mort)
##
## Welch Two Sample t-test
##
## data: missing$sort_mort and join7$sort_mort
## t = -0.2111, df = 425.91, p-value = 0.8329
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.0010063686 0.0008111643
## sample estimates:
## mean of x mean of y
## 0.004687562 0.004785164
wilcox.test(missing$asa, join7$asa)
##
## Wilcoxon rank sum test with continuity correction
##
## data: missing$asa and join7$asa
## W = 199400, p-value = 0.2924
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(missing$los, join7$los)
##
## Wilcoxon rank sum test with continuity correction
##
## data: missing$los and join7$los
## W = 245070, p-value = 4.64e-12
## alternative hypothesis: true location shift is not equal to 0
plotCalibration(data = join14,
cOutcome = match("poms", colnames(join14)),
predRisk = predict(lr_poms10, newdata = join14, type = "response"),
plottitle = "Calibration plot for SORT-morbidity using Day 14 POMS")
## $Table_HLtest
## total meanpred meanobs predicted observed
## [0.0118,0.0769) 182 0.065 0.011 11.78 2
## [0.0769,0.1265) 169 0.107 0.024 18.06 4
## [0.1265,0.1864) 362 0.164 0.061 59.54 22
## [0.1864,0.1897) 261 0.188 0.031 49.17 8
## [0.1897,0.2660) 151 0.239 0.113 36.15 17
## [0.2660,0.2931) 137 0.270 0.066 37.05 9
## [0.2931,0.3734) 141 0.317 0.071 44.75 10
## [0.3734,0.5463) 238 0.449 0.206 106.80 49
## [0.5463,0.8951] 112 0.685 0.304 76.66 34
##
## $Chi_square
## [1] 305.915
##
## $df
## [1] 8
##
## $p_value
## [1] 0
x <- predict(lr_poms10, newdata = join14, type = "response") #Temp variable
roc_join14model10 <- roc(join14$poms ~ x, plot=FALSE, auc=TRUE, ci=TRUE)
roc_join14POSSUM <- roc(join14$poms ~ join14$morb_poss, plot=FALSE, auc=TRUE, ci=TRUE)
#Test the ROC objects against each other
roc.test(roc_join14model10, roc_join14POSSUM)
##
## DeLong's test for two correlated ROC curves
##
## data: roc_join14model10 and roc_join14POSSUM
## Z = 4.4183, p-value = 9.947e-06
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7332411 0.6003452
plot(roc_join14model10)
##
## Call:
## roc.formula(formula = join14$poms ~ x, plot = FALSE, auc = TRUE, ci = TRUE)
##
## Data: x in 1598 controls (join14$poms FALSE) < 155 cases (join14$poms TRUE).
## Area under the curve: 0.7332
## 95% CI: 0.6908-0.7757 (DeLong)
The sensitivity analysis also shows that the SORT-morbidity model has superior discrimination compared to POSSUM when used to predict the presence of POMS-defined morbidity at Day 14 or 15 postoperatively (AUROC = 0.7332411, 0.6907762, 0.7332411, 0.7757061 vs. 0.6003452, 0.5497408, 0.6003452, 0.6509496; z = 4.4183297, p = 9.946662410^{-6}).
The BJA Manuscript reviewers have returned with some comments about a sensitivity analysis. This report is an attempt to conduct this.
library(rms)
We will use Frank Harrell’s rms
package to do easily plot calibration curves and obtain AUROCS. The model we will examine is:
summary(lr_poms9)
##
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) +
## (op_severity > 7) + malignancy + (age >= 65 & age < 80) +
## (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9860 -0.7118 -0.5906 0.5962 2.8211
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2281 0.3250 -9.932 < 2e-16 ***
## asa == 2TRUE 0.3323 0.2287 1.453 0.14626
## asa == 3TRUE 1.1395 0.2574 4.426 9.58e-06 ***
## asa >= 4TRUE 1.2231 0.7431 1.646 0.09978 .
## factor(opspecialty)2 1.6579 0.2213 7.491 6.83e-14 ***
## factor(opspecialty)3 -0.9294 0.4392 -2.116 0.03431 *
## factor(opspecialty)4 0.2962 0.3163 0.937 0.34898
## factor(opspecialty)5 -1.0647 0.4541 -2.345 0.01904 *
## factor(opspecialty)6 0.1807 0.5303 0.341 0.73324
## op_severity > 7TRUE 1.2378 0.2332 5.308 1.11e-07 ***
## malignancy 0.8968 0.3279 2.735 0.00624 **
## age >= 65 & age < 80TRUE 0.1182 0.1826 0.647 0.51753
## age >= 80TRUE 0.5500 0.2552 2.155 0.03115 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1042.3 on 1043 degrees of freedom
## AIC: 1068.3
##
## Number of Fisher Scoring iterations: 5
We have identified some low-grade complications for POMS (e.g. urinary catheter, pain).
val.prob(y = join7$poms_lo, logit = predict(lr_poms9, newdata = join7))
## Dxy C (ROC) R2 D D:Chi-sq
## 4.950408e-01 7.475204e-01 1.710515e-01 1.143925e-01 1.820834e+02
## D:p U U:Chi-sq U:p Q
## NA 1.395459e-02 2.409012e+01 5.873509e-06 1.004379e-01
## Brier Intercept Slope Emax E90
## 1.415803e-01 -3.158285e-01 1.007832e+00 1.779985e-01 7.696361e-02
## Eavg S:z S:p
## 4.881226e-02 -3.523574e+00 4.257684e-04
HL_pomslo <- plotCalibration(join7,
cOutcome = match("poms_lo", colnames(join7)),
predRisk = predict(lr_poms9, newdata = join7, "response"))
roc_pomslo <- roc(join7$poms_lo ~ predict(lr_poms9, newdata = join7, "response"), plot=FALSE, auc=TRUE, ci=TRUE)
Conversely, there are some high-grade complications (e.g. cardiac ischaemia).
val.prob(y = join7$poms_hi, logit = predict(lr_poms9, newdata = join7))
## Dxy C (ROC) R2 D D:Chi-sq
## 4.493923e-01 7.246962e-01 8.506971e-03 4.246795e-03 7.722676e+00
## D:p U U:Chi-sq U:p Q
## NA 7.106473e-02 1.144955e+02 0.000000e+00 -6.681794e-02
## Brier Intercept Slope Emax E90
## 1.321966e-01 -8.742130e-01 8.272451e-01 4.544539e-01 1.909423e-01
## Eavg S:z S:p
## 1.016634e-01 -5.301993e+00 1.145454e-07
HL_pomshi <- plotCalibration(join7,
cOutcome = match("poms_hi", colnames(join7)),
predRisk = predict(lr_poms9, newdata = join7, "response"))
roc_pomshi <- roc(join7$poms_hi ~ predict(lr_poms9, newdata = join7, "response"), plot=FALSE, auc=TRUE, ci=TRUE)
We can also see how well the model predicts POMS outcomes 2 weeks postoperatively.
val.prob(y = join14$poms, logit = predict(lr_poms9, newdata = join14))
## Dxy C (ROC) R2 D D:Chi-sq
## 4.691388e-01 7.345694e-01 -3.320648e-01 -1.398194e-01 -2.441034e+02
## D:p U U:Chi-sq U:p Q
## NA 1.977038e-01 3.485748e+02 0.000000e+00 -3.375232e-01
## Brier Intercept Slope Emax E90
## 1.085156e-01 -1.477501e+00 8.941401e-01 5.224268e-01 2.748685e-01
## Eavg S:z S:p
## 1.609582e-01 -1.035485e+01 3.978216e-25
HL_poms14 <- plotCalibration(join14,
cOutcome = match("poms", colnames(join14)),
predRisk = predict(lr_poms9, newdata = join14, "response"))
roc_poms14 <- roc(join14$poms ~ predict(lr_poms9, newdata = join14, "response"), plot=FALSE, auc=TRUE, ci=TRUE)
Now we see how well the model predicts POMS outcomes 3 weeks postoperatively.
val.prob(y = join21$poms, logit = predict(lr_poms9, newdata = join21))
## Dxy C (ROC) R2 D D:Chi-sq
## 5.848879e-01 7.924439e-01 -1.420562e+00 -3.349814e-01 -5.868924e+02
## D:p U U:Chi-sq U:p Q
## NA 3.774939e-01 6.645018e+02 0.000000e+00 -7.124753e-01
## Brier Intercept Slope Emax E90
## 9.298428e-02 -2.326692e+00 1.081127e+00 6.603238e-01 3.795625e-01
## Eavg S:z S:p
## 2.105892e-01 -1.342258e+01 4.459248e-41
HL_poms21 <- plotCalibration(join21,
cOutcome = match("poms", colnames(join21)),
predRisk = predict(lr_poms9, newdata = join21, "response"))
roc_poms21 <- roc(join21$poms ~ predict(lr_poms9, newdata = join21, "response"), plot=FALSE, auc=TRUE, ci=TRUE)
Lastly, we see how well the model predicts POMS outcomes 4 weeks postoperatively.
val.prob(y = join28$poms, logit = predict(lr_poms9, newdata = join28))
## Dxy C (ROC) R2 D D:Chi-sq
## 6.223219e-01 8.111610e-01 -2.375358e+00 -4.164674e-01 -7.373967e+02
## D:p U U:Chi-sq U:p Q
## NA 4.449196e-01 7.908425e+02 0.000000e+00 -8.613870e-01
## Brier Intercept Slope Emax E90
## 9.269795e-02 -2.770420e+00 1.048215e+00 7.618127e-01 4.151957e-01
## Eavg S:z S:p
## 2.247681e-01 -1.363655e+01 2.427874e-42
HL_poms28 <- plotCalibration(join28,
cOutcome = match("poms", colnames(join28)),
predRisk = predict(lr_poms9, newdata = join28, "response"))
roc_poms28 <- roc(join28$poms ~ predict(lr_poms9, newdata = join28, "response"), plot=FALSE, auc=TRUE, ci=TRUE)
It looks as though the model is well-calibrated for POMS_lo but mis-calibrated for all the outcomes. However it still remains good at discriminating all the outcomes. Therefore the sensible thing would be to fit another model for the high-grade morbidity outcomes:
#Split the data set into training and testing
set.seed(123)
x <- createDataPartition(join7$poms, p = (2/3), list = FALSE, times = 1)
training <- join7[x, ]
testing <- join7[-x, ]
#Fit a model with the training dataset
lr_poms_hi <- glm(poms_hi ~ (asa == 2) +
(asa == 3) +
(asa >= 4) +
factor(opspecialty) +
(op_severity > 7) +
malignancy +
(age >= 65 & age <80) +
(age >= 80),
data = training,
family = binomial)
summary(lr_poms9)
##
## Call:
## glm(formula = poms ~ (asa == 2) + (asa == 3) + (asa >= 4) + factor(opspecialty) +
## (op_severity > 7) + malignancy + (age >= 65 & age < 80) +
## (age >= 80), family = binomial, data = training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9860 -0.7118 -0.5906 0.5962 2.8211
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.2281 0.3250 -9.932 < 2e-16 ***
## asa == 2TRUE 0.3323 0.2287 1.453 0.14626
## asa == 3TRUE 1.1395 0.2574 4.426 9.58e-06 ***
## asa >= 4TRUE 1.2231 0.7431 1.646 0.09978 .
## factor(opspecialty)2 1.6579 0.2213 7.491 6.83e-14 ***
## factor(opspecialty)3 -0.9294 0.4392 -2.116 0.03431 *
## factor(opspecialty)4 0.2962 0.3163 0.937 0.34898
## factor(opspecialty)5 -1.0647 0.4541 -2.345 0.01904 *
## factor(opspecialty)6 0.1807 0.5303 0.341 0.73324
## op_severity > 7TRUE 1.2378 0.2332 5.308 1.11e-07 ***
## malignancy 0.8968 0.3279 2.735 0.00624 **
## age >= 65 & age < 80TRUE 0.1182 0.1826 0.647 0.51753
## age >= 80TRUE 0.5500 0.2552 2.155 0.03115 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1198.5 on 1055 degrees of freedom
## Residual deviance: 1042.3 on 1043 degrees of freedom
## AIC: 1068.3
##
## Number of Fisher Scoring iterations: 5
val.prob(y = testing$poms_hi, logit = predict(lr_poms_hi, newdata = testing))
## Dxy C (ROC) R2 D D:Chi-sq
## 0.399186257 0.699593128 0.119703958 0.068834868 37.275975326
## D:p U U:Chi-sq U:p Q
## NA -0.002641733 0.607806771 0.737932158 0.071476601
## Brier Intercept Slope Emax E90
## 0.118276423 -0.169306583 0.883451908 0.090627914 0.035036177
## Eavg S:z S:p
## 0.019355148 0.522449243 0.601357577