install.packages()
:
dplyr
lubridate
RCT
install.packages("tidyverse")
library(tidyverse)
RCT <- read.csv("../data/course_exemplar2.csv")
library(dplyr)
filter(RCT, age >= 65)
## id recruit age gender random ps0h ps3h ps12 ps24h ps168h move12h
## 1 1 1 Jan 2014 80 F drain 0 0 0 2 2 2
## 2 2 2 Jan 2014 72 f drain 1 2 2 1 2 2
## 3 3 3 Jan 2014 72 F drain 0 1 0 0 0 0
## 4 5 5 Jan 2014 84 f skin 0 0 2 3 NA 1
## 5 6 6 Jan 2014 72 F skin 1 2 2 2 NA 1
## 6 7 7 Jan 2014 65 F skin 0 1 0 0 0 0
## 7 8 8 Jan 2014 75 F drain 0 0 0 0 0 0
## 8 12 12 Jan 2014 72 F skin 0 1 2 2 1 2
## 9 13 13 Jan 2014 75 f skin 0 2 1 0 1 2
## 10 15 15 Jan 2014 95 f skin 0 1 1 1 NA 1
## 11 18 18 Jan 2014 73 F drain 0 0 0 0 0 0
## 12 19 19 Jan 2014 73 f drain 0 0 0 0 0 0
## 13 21 21 Jan 2014 86 M drain 0 2 2 2 1 3
## 14 23 23 Jan 2014 66 f skin 0 1 1 1 NA 1
## 15 24 24 Jan 2014 73 f drain 0 0 0 0 NA 0
## 16 26 26 Jan 2014 80 f drain 0 1 0 1 0 1
## 17 27 27 Jan 2014 85 f drain 0 1 1 1 NA 1
## 18 31 31 Jan 2014 73 f drain 0 0 0 0 NA 0
## 19 33 2 Feb 2014 79 f drain 0 1 4 1 1 2
## 20 34 3 Feb 2014 77 f drain 0 0 0 0 0 0
## 21 35 4 Feb 2014 81 F skin 0 0 5 5 NA 5
## 22 37 6 Feb 2014 67 f drain 1 0 0 1 NA 2
## 23 38 7 Feb 2014 80 F drain 0 0 0 1 0 3
## 24 39 8 Feb 2014 66 F skin 0 3 4 2 3 1
## 25 40 9 Feb 2014 66 F drain 0 1 3 3 0 2
## 26 43 12 Feb 2014 79 F skin 0 0 3 0 0 0
## 27 44 13 Feb 2014 70 F skin 1 1 1 1 0 1
## 28 46 15 Feb 2014 71 F skin 0 0 0 0 0 0
## 29 49 18 Feb 2014 70 F skin 0 1 3 0 1 0
## 30 50 19 Feb 2014 65 F skin 0 2 3 3 2 3
## 31 52 21 Feb 2014 66 F skin 0 1 1 0 0 1
## 32 54 23 Feb 2014 76 F drain 0 3 2 2 0 0
## 33 55 24 Feb 2014 74 F skin 1 2 5 1 1 0
## 34 57 26 Feb 2014 75 F drain 0 2 2 0 0 2
## 35 58 27 Feb 2014 77 F drain 0 2 2 1 0 2
## 36 59 28 Feb 2014 77 F skin 0 0 1 0 0 1
## 37 60 1 Mar 2014 85 F drain 0 3 3 4 0 3
## 38 62 3 Mar 2014 79 f drain 0 1 1 1 1 2
## 39 64 5 Mar 2014 80 f drain 0 0 0 2 1 0
## move24h move168h tylenol codeine oramorph
## 1 0 0 8g 0 0
## 2 1 2 8g 210mg 0
## 3 0 0 6g 0 0
## 4 1 NA 8g 0 20mg
## 5 1 NA 8g 0 20mg
## 6 0 0 4g 0 0
## 7 0 0 0 0 0
## 8 2 2 8g 0 0
## 9 0 1 4g 0 10mg
## 10 1 NA 8g 0 0
## 11 0 0 8g 0 5mg
## 12 0 1 0 0 0
## 13 2 1 8g 0 0
## 14 1 NA 6g 0 0
## 15 0 NA 4g 60mg 0
## 16 1 0 4g 0 0
## 17 1 NA 4g 0 0
## 18 0 NA 0 0 0
## 19 4 1 4g 0 50mg
## 20 0 0 8g 0 0
## 21 5 NA 8g 0 0
## 22 2 NA 4g 0 0
## 23 3 0 4g 0 0
## 24 1 3 8g 0 30mg
## 25 2 0 3g 0 0
## 26 0 0 1g 0 0
## 27 1 0 8g 0 0
## 28 0 0 2g 0 0
## 29 0 0 4g 0 20mg
## 30 3 1 4g 0 0
## 31 1 0 8g 0 0
## 32 0 0 4g 0 60mg
## 33 0 0 8g 60mg 0
## 34 0 0 8g 30mg 0
## 35 1 0 8g 0 0
## 36 1 0 4g 0 0
## 37 4 0 4g 60mg 0
## 38 2 0 2g 0 20mg
## 39 0 0 0 0 0
## other los
## 1 0 3
## 2 0 7
## 3 Diclofenac 225mg 4
## 4 0 3
## 5 Amitryptaline 10mg, pregabalin 300mg 10
## 6 0 4
## 7 0 3
## 8 Morphine 20mg 4
## 9 Diclofenac 225mg 3
## 10 0 5
## 11 0 5
## 12 0 2
## 13 0 4
## 14 0 5
## 15 0 2
## 16 0 4
## 17 0 1
## 18 0 0
## 19 Morphing 2.5mg, tramadol 100mg 5
## 20 0 4
## 21 Oxycontin 240mg, pregabalin 200mg, oxynorm 25mg 8
## 22 0 NA
## 23 0 NA
## 24 Morphine Sulphate MR 40mg 4
## 25 Ibuprofen 800mg NA
## 26 0 2
## 27 Ibuprofen 800mg NA
## 28 0 NA
## 29 0 2
## 30 0 0
## 31 Diclofenac 225mg NA
## 32 0 5
## 33 Ibuprofen 1200mg NA
## 34 Morphine Sulphate MR 60mg 4
## 35 Morphine sulphate MR 50mg 4
## 36 0 NA
## 37 0 NA
## 38 0 NA
## 39 0 3
## los_reason satisfaction
## 1 Continued fluid through drain Good
## 2 Continued fluid through drain and bowel stasis Poor
## 3 Urinary incontinence Good
## 4 Continued fluid through drain Good
## 5 Comorbidities- recurrent PE Poor
## 6 Continued fluid through drain Excellent
## 7 Continued fluid through drain Excellent
## 8 Continued fluid through drain Good
## 9 Continued fluid through drain Good
## 10 Continued fluid through drain satisfactory
## 11 Continued fluid through drain Excellent
## 12 Continued fluid through drain Good
## 13 Continued fluid through drain Good
## 14 Continued fluid through drain Good
## 15 Continued fluid through drain Excellent
## 16 Continued fluid through drain Excellent
## 17 Continued fluid through drain Good
## 18 Continued fluid through drain Excellent
## 19 Continued fluid through drain Good
## 20 Continued fluid through drain Good
## 21 Pain due to spinal mets Poor
## 22
## 23
## 24 Continued fluid through drain satisfactory
## 25
## 26 Continued fluid through drain Excellent
## 27
## 28
## 29 Continued fluid through drain Excellent
## 30 Continued fluid through drain Good
## 31
## 32 Continued fluid through drain Good
## 33
## 34 Continued fluid through drain Good
## 35 Continued fluid through drain Good
## 36
## 37
## 38 Continued fluid through drain Satisfactory
## 39 Continued fluid through drain Good
Done! It filters rows from the RCT
data frame where age
is >= 65
.
>
(greater than), >=
(greater than or equal to), and similarly for <
and <=
.!=
means Not Equal to…==
compares equality. Notice that there are two equal signs. This is because in R =
means an assigment, you’re making somethign equal to something else:x = 6
means make the variable called x
equal 6. If you then do x == 8
is a question, is x equal to 8? Here, the answer is a FALSE
RCT_1 <- filter(RCT, age >= 65)
RCT_2 <- select(RCT_1, gender)
table(RCT_2)
## RCT_2
## f F M
## 15 23 1
#The same as:
table(select(filter(RCT, age >= 65), gender))
##
## f F M
## 15 23 1
>= 65
)RCT_1
RCT_2
, and create a table from thatfilter(RCT, age >= 65)
RCT %>% filter(age >= 65)
The %>%
operator (created by the dplyr library) is called a pipe, and it (surprise, surprise) pipes data from one command to the next. So in plain English, the above line filters the data where the age
is >= 65, then selects the Gender.
Now that we have our data’s subset, we can pass it onto other functions in R:
RCT %>% filter(age >= 65) %>%
select(gender) %>% table()
## .
## f F M
## 15 23 1
This says, grab my data labeled RCT
, filter the rows so that we only find patients who have an Age and are 65 and older, select the column called gender
. With that column, make me a summary.
arrange
–> sorts rowsdistinct
–> finds unique valuesmutate
–> creates a new column based on some parameters. Hint: you can use other column names here, useful for finding out the Age of patients, length of time they were in hospital…etcrecode
–> helps you recode values which might have been mislabelledgroup_by
–> This one is great!!RCT %>% arrange(age)
RCT %>% arrange(desc(age))
RCT %>% select(gender) %>% distinct()
RCT %>% filter(age >= 65) %>%
select(gender) %>%
mutate(gender = recode(gender, f = "F"))
It breaks down a dataset into specified groups of rows. When you then apply the functions above on the resulting object they’ll be automatically applied by group. Most importantly, all this is achieved by using the same exact syntax you’d use with an ungrouped object.
RCT %>% group_by(gender) %>% summarise(age_avg = mean(age))
## # A tibble: 3 x 2
## gender age_avg
## <fct> <dbl>
## 1 f NA
## 2 F NA
## 3 M 86
Sadly this won’t work because mean
has a little hissy fit if there are NA’s in the data.
To fix this we add the argument na.rm = TRUE
:
RCT %>% group_by(gender) %>%
summarise(age_avg = mean(age, na.rm = TRUE))
## # A tibble: 3 x 2
## gender age_avg
## <fct> <dbl>
## 1 f 66.2
## 2 F 65.8
## 3 M 86
RCT %>% mutate(gender = recode(gender, f = "F")) %>%
group_by(gender) %>%
summarise(age_avg = mean(age, na.rm = TRUE))
## # A tibble: 2 x 2
## gender age_avg
## <fct> <dbl>
## 1 F 65.9
## 2 M 86
How do we combine all these steps to get insight into the data?
Since this RCT looking at the effects of having a drain vs having skin infiltration on postop pain, let’s see what the mean change in pain scores is at 24h vs baseline:
RCT %>% mutate(ps_change = ps24h - ps0h) %>%
group_by(random) %>%
summarise(mean_ps_change = mean(ps_change, na.rm = TRUE))
# If not using pipes:
RCT$ps_change <- RCT$ps24h - RCT$ps0h
drain <- filter(RCT, random == "drain")
mean(drain$ps_change, na.rm = TRUE)
skin <- filter(RCT, random == "skin")
mean(skin$ps_change, na.rm = TRUE)
RCT %>% group_by(random) %>%
summarise(mean_ps0h = mean(ps0h, na.rm = TRUE),
mean_ps3h = mean(ps3h, na.rm = TRUE),
mean_ps12h = mean(ps12, na.rm = TRUE),
mean_ps24h = mean(ps24h, na.rm = TRUE))
## # A tibble: 2 x 5
## random mean_ps0h mean_ps3h mean_ps12h mean_ps24h
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 drain 0.0882 0.886 0.886 0.971
## 2 skin 0.310 1.41 2.03 1.59
Select Some of our favourite (data wrangling) things and do some practice