I recently had my first paper written completely in R Markdown accepted for publication by a journal. I thought it would be a good opportunity to talk about my current workflow, since it’s been a while since I last blogged. In some ways the paper itself is irrelevant, but if you’re interested in seeing the paper in its accepted manuscript format, click here.
The first thing about R Markdown is that it has a YAML configuration header. In my manuscript it looks like this:
title: 'Short Report: Evaluating Services at a Major Trauma Centre Before Helipad Construction'
author: "Danny Wong, James Bedford, Simon Luck & Roger Bloomer"
date: "29 July 2016"
The title, author and date fields are self-explanatory. The header also sets output to word_document, chosen because the journal wanted submissions in that format and because many of my collaborators find Word's track changes easier when we work together. The csl field refers to a Citation Style Language file, which pandoc uses to decide how citations and the reference list are formatted. I keep my references organised in Zotero, which can export them as a BibTeX or BibLaTeX file that is also named in the YAML header. Together, the .bib file tells pandoc where to find my references and the .csl file tells it how to display them. Zotero maintains a CSL style repository with a wide variety of styles to choose from, and most of the major journals are supported.
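As a rough sketch of how those fields sit together, the R snippet below writes a minimal header to a temporary .Rmd file and prints it back. The title and the bibliography and csl file names are placeholders made up for illustration, not the ones used in the manuscript.

```r
# Lines of a minimal YAML header; all values here are placeholders
header <- c(
  "---",
  "title: \"My manuscript\"",
  "output: word_document",          # knit to a Word document
  "bibliography: references.bib",   # hypothetical BibTeX export from Zotero
  "csl: journal-style.csl",         # hypothetical citation style file
  "---"
)

# Write the header to a temporary .Rmd file and show what it contains
rmd_file <- file.path(tempdir(), "manuscript.Rmd")
writeLines(header, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

The body of the manuscript would follow the closing `---` line.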
In the next chunk we set up the document with data and with the required packages.
We can see that I load the packages dplyr, lubridate and readxl, and set options(digits = 1) to control the number of significant digits R prints. This is still not the most elegant solution, as there are nagging problems with how R Markdown output displays numbers after the decimal point, which have been well described elsewhere.
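A minimal sketch of the idea, using base R only so it runs without the packages installed. The scores are made up for illustration, and the sprintf() call is just one common way to pin the number of decimal places explicitly, not necessarily what the manuscript used.

```r
# Setup as described in the post (uncomment if the packages are installed):
# library(dplyr); library(lubridate); library(readxl)

old_opts <- options(digits = 1)   # options() returns the previous settings

iss <- c(9, 16.5, 25)             # made-up scores, for illustration only
print(median(iss))                # printed using the global digits setting

# Explicit formatting does not depend on the global option
fixed <- sprintf("%.1f", median(iss))   # always one decimal place
fixed

options(old_opts)                 # restore the previous digits setting
```

Setting the option once is less typing, while explicit formatting gives finer control over individual numbers.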
The chunk then sets us up to write text with R code inline.
We now get into the meat of the manuscript: the abstract is what the above chunk of code would produce. Writing an inline expression such as 'r median(...)' (replacing the quotation marks with backticks) produces a number within the text corresponding to the function call, in this case the median Injury Severity Score (ISS) for the patients.
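To see inline code in action outside a full document, here is a small sketch that knits a single sentence from within R. It assumes the knitr package is installed; the scores and the sentence are made up and are not taken from the manuscript.

```r
library(knitr)  # assumes knitr is installed

iss <- c(9, 16, 25, 34, 41)  # made-up Injury Severity Scores

# One line of R Markdown source containing an inline R expression
src <- "The median ISS was `r median(iss)`."

# knit() evaluates the inline code and returns the resulting text
out <- knit(text = src, quiet = TRUE)
cat(out, "\n")
```

In a real manuscript the same sentence would simply sit in the body of the .Rmd file and be evaluated when the document is knitted.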
I won’t reproduce the entire source code for the manuscript, but will include a few further chunks to talk about referencing, and then figures and tables:
The development of Major Trauma Networks in England was a National requirement set out within the revised 2010/11 NHS England Operating Framework.[@department_of_health_revision_2010; @imison_reconfiguration_2014] King's College Hospital (KCH) began functioning as a Major Trauma Centre (MTC) in April 2010 as part of the South East London Trauma Network, subsequently expanding coverage to also service Kent and Medway in April 2013 as the MTC for the South East London, Kent and Medway Trauma Network (SELKaM).
SELKaM serves a population of approximately 4.5 million, operating a "hub-and-spoke" model, with KCH as the MTC supported by seven trauma units and three local emergency hospitals. Prehospital emergency care services within SELKaM are provided by London Ambulance Service and South East Coast Ambulance Service, with enhanced prehospital medical teams (HEMS) provided by Kent, Surrey and Sussex Air Ambulance Trust and London's Air Ambulance.
Patients transported to KCH by helicopter land at a nearby park necessitating secondary land ambulance transfer to the hospital, with time-critical patients potentially "overflying" KCH to another MTC with an operational helipad. Of the 4 MTCs in London, 2 currently have on-site helicopter landing pads – The Royal London Hospital in Whitechapel, and St. George's Hospital in Tooting. KCH expects to commence operations of a newly-built elevated helipad within the hospital footprint in the second half of 2016. We therefore evaluate the current trauma services at KCH, as part of a service evaluation to assess the future impact of the helipad.
The chunk above demonstrates how references are inserted using square brackets and “@”. The key [@department_of_health_revision_2010] pulls the reference with that identifier from the .bib file specified in the YAML header, and pandoc inserts it into the text in the appropriate style when the paper is knitted. Several keys can be cited together by separating them with semicolons, as in the first sentence.
The following code chunk shows how figures are drawn when you knit the paper. High-quality figures can be output by calling pdf() before the plotting code and dev.off() after it. The resulting .pdf file can then be manipulated in GIMP or Photoshop to meet the journal's specifications. Unfortunately this is still a necessary step, because journals can be particular about how image files are uploaded. Presumably in the future I could script a call to Ghostscript or another scriptable graphics tool to code the entire process for even better reproducibility, but for now manipulating images in a GUI still yields the best results.
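The pdf() and dev.off() pattern can be sketched as follows. The file name, dimensions and data are placeholders chosen for illustration, not the figure from the paper.

```r
fig_file <- file.path(tempdir(), "figure1.pdf")  # hypothetical output path

pdf(fig_file, width = 7, height = 5)   # open a PDF device; size in inches
hist(c(9, 16, 16, 25, 29, 34, 41),     # made-up scores, for illustration
     main = "", xlab = "Injury Severity Score")
dev.off()                              # close the device so the file is written

file.exists(fig_file)                  # check that the PDF was created
```

Everything plotted between the two calls goes to the file rather than the screen, and forgetting dev.off() is a common reason for an empty or unreadable PDF.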
Lastly, the following chunk shows how I formatted a table for the publication:
With this method, the cells are populated by numbers generated from R code, so any change to the data upstream cascades downstream and the table reflects it. It looks like a wall of gibberish, because it is, but once you get used to what you are typing it makes sense.
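Here is a small sketch of the general idea, not the table from the paper: a Markdown pipe table whose cells are inline R expressions, knitted from within R. It assumes knitr is installed, and the variable names and values are invented for illustration.

```r
library(knitr)  # assumes knitr is installed

iss <- c(9, 16, 25, 34, 41)   # made-up Injury Severity Scores
los <- c(2, 5, 8, 13, 30)     # made-up lengths of stay, in days

# Markdown table source; each cell value is computed by inline R code
tbl_src <- c(
  "| Measure | Median | Range |",
  "|:--------|-------:|------:|",
  "| ISS | `r median(iss)` | `r min(iss)` to `r max(iss)` |",
  "| Length of stay (days) | `r median(los)` | `r min(los)` to `r max(los)` |"
)

# Knit the table source and print the resulting Markdown
tbl_out <- knit(text = tbl_src, quiet = TRUE)
cat(tbl_out, sep = "\n")
```

If the vectors change, knitting again refreshes every cell without retyping any numbers.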