Learning ‘coding’ is a pain, and for a ‘one off’ process there will nearly always be some tempting ‘point-and-click’ alternative.
Greatest benefits from learning to code:
Arguably one of the oldest and most ubiquitous file formats is the ‘comma separated values’ (.csv) file. This is easily exportable from Microsoft Excel, Apple Numbers, Open Office, Google Sheets, etc.
Its strength is in its simplicity.
To create the .csv file:
1. Open the .xlsx file that was produced in the Excel Hell lecture.
2. Export it as a .csv file. If all else fails then we have a copy here.
3. Save the .csv file into a folder called data/ nested within your project root folder.

Once a sheet has been exported and saved, it can be imported into R.
There are many ways to do this. These include point-and-click import and the read_csv() function.
Point and click is lovely and easy but sadly not reproducible.
Much better to write down where your data comes from. We are going to do this in 3 steps here:
1. One way to find the path to your file from within R is to use the file.choose() function.
2. Store the path it returns in myfile.
3. Use read_csv() to import the data into a data frame (which we in turn name RCT for convenience).

Try right clicking a file in Finder (on a Mac) or Windows Explorer (on a PC). You’ll normally see an option for ‘info’ or ‘properties’ that will show you the path to your file.
install.packages("readr") # install only needed the 1st time
library(readr) # load the readr family of
# functions including read_csv
myfile <- file.choose()
myfile
dataframename <- read_csv(myfile)
You could have done this in one step, but it would have made things harder to read. Hard to read means difficult to remember, and we are doing our best to avoid that!
There is a function read.csv() provided by R, but read_csv() is better. The built-in function has a couple of annoying ‘habits’. If you wish to use it then don’t forget to specify: (1) header = TRUE, which tells the function to expect column names in row 1 instead of data, and (2) stringsAsFactors = FALSE, which is a necessary but annoying reminder to R that you want it to leave ‘strings’ alone and not convert them to ‘labelled’ factors. (header = TRUE is already the default for read.csv(), and since R 4.0.0 so is stringsAsFactors = FALSE, but stating both does no harm.)
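As a rough illustration of those two arguments, the sketch below writes a tiny made-up CSV to a temporary file and reads it back with the built-in read.csv(). The file contents and the name toy are assumptions for the example only, not part of the course data.

```r
# Toy CSV written to a temporary file (made-up values, for illustration only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,gender,age",
             "1,female,34",
             "2,male,51",
             "3,female,47"), csv_path)

# Base R import with both arguments spelled out.
toy <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

names(toy)         # column names taken from row 1 of the file
class(toy$gender)  # strings were left alone rather than turned into factors
```

With stringsAsFactors = TRUE instead, the gender column would come back as a factor.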
Now import the file from your data/ folder by writing down its path, and name the resulting data frame RCT.
RCT <- read_csv("data/file.csv")
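A relative path such as data/file.csv is resolved against R's current working directory, so an import by path can fail simply because R is looking in the wrong place. The following is a minimal sketch of a check you might run first; it uses only base R, and the messages are illustrative.

```r
# Build the path used in the text; file.path() joins the parts with "/".
csv_path <- file.path("data", "file.csv")

getwd()  # relative paths start from this folder

# Check whether the file is where we expect before trying to import it.
if (file.exists(csv_path)) {
  message("Found ", csv_path)
} else {
  message("No file at ", csv_path, "; check getwd() and your data/ folder")
}
```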
OK, now you’ve managed to get your data into R! What next?
Spreadsheets in R are called ‘dataframes’.
A dataframe has columns, each identified by a name, and rows for observations.
This means that however you import your data into R within your pipeline, those data will end up as a dataframe.
Let’s do a quick tour of dataframes in R.
We successfully brought in the RCT data and named it RCT. How do we look at it?
Individual items can be picked out using square brackets, in the order [row, column] (i.e. which row, then which column).
Most of the time the rows are ‘observations’ and we want to pick out ‘characteristics’ of those observations (i.e. the columns). Rather than having to remember the column number, we can just ask for a column by name using the $ operator: e.g. dataframename$some_column
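The sketch below tries both styles of access on a small stand-in dataframe. The name toy and its values are made up for illustration. One caveat: read_csv() returns a ‘tibble’, and single square brackets on a tibble always give back a tibble rather than a bare value, so the output for RCT will be formatted a little differently from this base R example.

```r
# Hypothetical stand-in for an imported dataframe (made-up values).
toy <- data.frame(
  id     = c(1, 2, 3),
  age    = c(34, 51, 47),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

toy[2, 3]     # row 2, column 3: a single item
toy[2, ]      # leave the column blank to get the whole of row 2
toy[, "age"]  # leave the row blank to get a whole column
toy$age       # the same column, requested by name with $
```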
1. What are the names of the columns? Hint: try names()
2. Display in the console the items in the column ps0h
3. Can you do the same for all the other columns in RCT?
4. Can you find out how many patients were randomised to each arm? Hint: you can use the function table()
5. Can you find out how many subjects there are?
1. What are the names of the columns? names(RCT)
2. Display in the console the items in the column ps0h: RCT$ps0h
3. Can you do the same for all the other columns in RCT? RCT$gender, RCT$age, etc.
4. Can you find out how many patients were randomised to each arm? table(RCT$random)
5. Can you find out how many subjects there are? nrow(RCT)
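If you do not have the file to hand, the same calls can be tried on a stand-in. In this sketch the column names follow the ones used above, but the values and the arm labels ("drug", "placebo") are invented for illustration and are not the real trial data.

```r
# Hypothetical stand-in for RCT: real column names, made-up values.
rct_toy <- data.frame(
  ps0h   = c(7, 5, 8, 6),
  gender = c("female", "male", "male", "female"),
  age    = c(34, 51, 47, 29),
  random = c("drug", "placebo", "drug", "drug"),
  stringsAsFactors = FALSE
)

names(rct_toy)                       # the column names
arm_counts <- table(rct_toy$random)  # how many rows fall in each arm
arm_counts
nrow(rct_toy)                        # number of subjects (one row each)
```

nrow() answers the ‘how many subjects’ question only if each subject occupies exactly one row.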
We have learned:
Exporting an .xlsx file into a .csv file.
Importing data into R using read_csv().
Navigating around a dataframe to start getting at some answers from data.
Can you import the outreach.csv dataset? What is the average heart rate? How many patients were accepted to ICU (the column is called icu_accept), and what was the mortality?
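As a hint for the kind of calls involved, here is a sketch on a stand-in dataframe. Only icu_accept is named in the text; hrate and dead are assumed column names, the 0/1 coding is an assumption, and every value is made up, so check names() on the real data first.

```r
# Hypothetical stand-in for outreach.csv (assumed names, made-up values).
outreach_toy <- data.frame(
  hrate      = c(88, 110, NA, 95),
  icu_accept = c(1, 0, 0, 1),
  dead       = c(0, 1, 0, 0)
)

mean(outreach_toy$hrate, na.rm = TRUE)  # average, ignoring missing values
table(outreach_toy$icu_accept)          # how many rows have each value
mean(outreach_toy$dead)                 # proportion of 1s, if coded 0/1
```

Without na.rm = TRUE, a single missing heart rate would make the mean come back as NA.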