We can’t teach all of statistics in one hour.
Many of the examples here are not strictly correct, but are used to demonstrate a point.
library(tidyverse)
RCT <- read.csv("http://bit.ly/2XzvW47")
We aspire to discern some truth from the universe.
We take measurements, but cannot measure all patients in a population.
Therefore we have to take a sample, and take measurements from this sample.
We then apply mathematical functions on these raw measurements to try and discern truth.
f(raw measurements) -> Statistics
Our RCT attempts to answer a research question by collecting data from a sample of a population.
Data collection consists in measuring the values of several variables for each member of the sample.
hist(RCT$age)
hist(RCT$ps12)
Have a go with these variables: age, ps12, move24h
mean()
and median()
, for example:mean(RCT$ps12, na.rm = TRUE)
## [1] 1.40625
median(RCT$ps12, na.rm = TRUE)
## [1] 1
IQR(RCT$age, na.rm = TRUE)
## [1] 22
quantile(RCT$age, na.rm = TRUE, probs = c(0.25, 0.75))
## 25% 75%
## 55 77
sd(RCT$age, na.rm = TRUE)
## [1] 14.55191
summary(RCT)
## X id recruit age gender
## Min. : 0.00 Min. : 1.00 1/1/2014 : 1 Min. :29.00 f:25
## 1st Qu.:15.75 1st Qu.:16.75 1/10/2014: 1 1st Qu.:55.00 F:38
## Median :31.50 Median :32.50 1/11/2014: 1 Median :70.50 M: 1
## Mean :31.50 Mean :32.50 1/12/2014: 1 Mean :66.27
## 3rd Qu.:47.25 3rd Qu.:48.25 1/13/2014: 1 3rd Qu.:77.00
## Max. :63.00 Max. :64.00 1/14/2014: 1 Max. :95.00
## (Other) :58 NA's :4
## random ps0h ps3h ps12
## drain:35 Min. :0.0000 Min. :0.000 Min. :0.000
## skin :29 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.0000 Median :1.000 Median :1.000
## Mean :0.1905 Mean :1.125 Mean :1.406
## 3rd Qu.:0.0000 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :3.0000 Max. :5.000 Max. :5.000
## NA's :1
## ps24h ps168h move12h move24h
## Min. :0.00 Min. :0.0000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.00 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
## Median :1.00 Median :0.0000 Median :1.000 Median :1.0000
## Mean :1.25 Mean :0.6275 Mean :1.078 Mean :0.9375
## 3rd Qu.:2.00 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:1.2500
## Max. :5.00 Max. :3.0000 Max. :5.000 Max. :5.0000
## NA's :13
## move168h tylenol codeine oramorph
## Min. :0.0000 8g :31 0 :56 0 :47
## 1st Qu.:0.0000 4g :15 10mg : 1 10mg : 7
## Median :0.0000 0 : 5 120mg: 1 20mg : 4
## Mean :0.6078 2g : 4 210mg: 1 50mg : 2
## 3rd Qu.:1.0000 3g : 3 30mg : 1 30mg : 1
## Max. :4.0000 1g : 2 60mg : 4 5mg : 1
## NA's :13 (Other): 4 (Other): 2
## other los
## 0 :34 Min. : 0.000
## Diclofenac 225mg:10 1st Qu.: 2.000
## Diclofenac 300mg: 2 Median : 3.000
## Diclofenac 75mg : 2 Mean : 3.431
## Ibuprofen 800mg : 2 3rd Qu.: 4.000
## Morphine 20mg : 2 Max. :10.000
## (Other) :12 NA's :13
## los_reason satisfaction
## :12 :12
## Comorbidities- recurrent PE : 2 Excellent :13
## Continued fluid through drain :47 Good :29
## Continued fluid through drain and bowel stasis: 1 Poor : 3
## Pain due to spinal mets : 1 satisfactory: 6
## Urinary incontinence : 1 Satisfactory: 1
##
## fake_age binary_satisfaction
## Min. :43.98 Min. :0.0000
## 1st Qu.:63.13 1st Qu.:0.0000
## Median :76.99 Median :1.0000
## Mean :73.49 Mean :0.6562
## 3rd Qu.:83.01 3rd Qu.:1.0000
## Max. :92.50 Max. :1.0000
##
‘Mean age differs between the two treatment arms’
hist(RCT[RCT$random == "drain", ]$age, , xlim=c(10, 100), ylim=c(0, 20), col = "#FF000080")
hist(RCT[RCT$random == "skin", ]$age, col = "#0000FF80", add = TRUE)
hist(RCT[RCT$random == "drain", ]$fake_age, xlim=c(30, 100), ylim=c(0, 20), col = "#FF000080")
hist(RCT[RCT$random == "skin", ]$fake_age, col = "#0000FF80", add = TRUE)
t.test(RCT$fake_age ~ RCT$random)
##
## Welch Two Sample t-test
##
## data: RCT$fake_age by RCT$random
## t = -9.6394, df = 48.021, p-value = 8.275e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.19970 -13.88222
## sample estimates:
## mean in group drain mean in group skin
## 65.54566 83.08662
‘Patient satisfaction after axillary node dissection depends on treatment arm.’
‘Patient satisfaction after axillary node dissection depends on treatment arm.’
tbl <- table(RCT$random, RCT$binary_satisfaction)
tbl
##
## 0 1
## drain 8 27
## skin 14 15
chisq.test(tbl)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tbl
## X-squared = 3.4854, df = 1, p-value = 0.06191
‘Postoperative pain scores at 12hrs depends on treatment arm.’
‘Postoperative pain scores at 12hrs depends on treatment arm.’
wilcox.test(RCT$ps12 ~ RCT$random, conf.int = TRUE)
## Warning in wilcox.test.default(x = c(0L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = c(0L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, :
## cannot compute exact confidence intervals with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: RCT$ps12 by RCT$random
## W = 259, p-value = 0.0005341
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## -1.9999351 -0.9999614
## sample estimates:
## difference in location
## -1.000061
install.packages(tableone)
library(tableone)
CreateTableOne(data = RCT,
vars = c("age", "ps0h", "los", "satisfaction"),
strata = "random")
## Stratified by random
## drain skin p test
## n 35 29
## age (mean (sd)) 67.50 (14.04) 64.65 (15.32) 0.458
## ps0h (mean (sd)) 0.09 (0.29) 0.31 (0.71) 0.101
## los (mean (sd)) 3.23 (1.55) 3.71 (2.15) 0.357
## satisfaction (%) 0.266
## 4 (11.4) 8 (27.6)
## Excellent 9 (25.7) 4 (13.8)
## Good 18 (51.4) 11 (37.9)
## Poor 1 ( 2.9) 2 ( 6.9)
## satisfactory 2 ( 5.7) 4 (13.8)
## Satisfactory 1 ( 2.9) 0 ( 0.0)
plot(RCT$los ~ RCT$ps3h)
plot(jitter(RCT$los) ~ jitter(RCT$ps3h))
linear_model <- lm(los ~ ps3h, data = RCT)
summary(linear_model)
##
## Call:
## lm(formula = los ~ ps3h, data = RCT)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7703 -1.0648 -0.0648 0.7297 6.2297
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0648 0.3473 8.825 1.08e-11 ***
## ps3h 0.3527 0.2313 1.525 0.134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.79 on 49 degrees of freedom
## (13 observations deleted due to missingness)
## Multiple R-squared: 0.04532, Adjusted R-squared: 0.02584
## F-statistic: 2.326 on 1 and 49 DF, p-value: 0.1336
plot(RCT$binary_satisfaction ~ RCT$age)
logit_model <- glm(binary_satisfaction ~ age,
data = RCT,
family = binomial)
summary(logit_model)
##
## Call:
## glm(formula = binary_satisfaction ~ age, family = binomial, data = RCT)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7452 -1.3640 0.8001 0.9520 1.0715
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.25316 1.41651 1.591 0.112
## age -0.02324 0.02043 -1.138 0.255
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 76.382 on 59 degrees of freedom
## Residual deviance: 75.006 on 58 degrees of freedom
## (4 observations deleted due to missingness)
## AIC: 79.006
##
## Number of Fisher Scoring iterations: 4