Introduction to R and RStudio


Figure 1

RStudio layout

Figure 2

RStudio layout with .R file open

Project Management With RStudio


Figure 1

Screenshot of file manager demonstrating bad project organisation

Seeking Help


Data Structures


Exploring Data Frames


Subsetting Data


Figure 1

Inequality testing

Figure 2

Inequality testing: results of recycling

Creating Publication-Quality Graphics with ggplot2


Figure 1

Blank plot, before adding any mapping aesthetics to ggplot().

Figure 2

Plotting area with axes for a scatter plot of life expectancy vs GDP, with no data points visible.

Figure 3

Scatter plot of life expectancy vs GDP per capita, now showing the data points.

Figure 4

Binned scatterplot of life expectancy versus year showing how life expectancy has increased over time

Figure 5

Binned scatterplot of life expectancy vs year with color-coded continents showing value of 'aes' function

Figure 6


Figure 7


Figure 8


Figure 9


Figure 10

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The plot illustrates the possibilities for styling visualisations in ggplot2 with data points enlarged, coloured orange, and displayed without transparency.

Figure 11


Figure 12

Scatterplot of GDP vs life expectancy showing logarithmic x-axis data spread

Figure 13

Scatter plot of life expectancy vs GDP per capita with a blue trend line summarising the relationship between variables, and gray shaded area indicating 95% confidence intervals for that trend line.

Figure 14

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The blue trend line is slightly thicker than in the previous figure.

Figure 15

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The plot illustrates the possibilities for styling visualisations in ggplot2 with data points enlarged, coloured orange, and displayed without transparency.

Figure 16


Figure 17


Figure 18


Figure 19


Writing Data


Data Frame Manipulation with dplyr


Figure 1

Diagram illustrating use of the select function to select two columns of a data frame

We may want to remove only one column from the gapminder data, for example the continent column.
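
Removing a single column with dplyr can be sketched as follows. This is a minimal illustration, not the lesson's own code: `gap_demo` is a hypothetical stand-in for the gapminder data with made-up values, and only the column name `continent` comes from the text above.

```r
library(dplyr)

# Hypothetical stand-in for the gapminder data (values are illustrative only)
gap_demo <- data.frame(
  country   = c("A", "B", "C"),
  continent = c("X", "Y", "X"),
  year      = c(2007, 2007, 2007),
  lifeExp   = c(60, 70, 80)
)

# A minus sign in front of a column name drops that column and keeps the rest
gap_no_continent <- select(gap_demo, -continent)

names(gap_no_continent)
```

The same call should work on the real gapminder data frame, assuming it has a column named `continent`.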


Figure 2

Diagram illustrating how the group by function organizes a data frame into groups

Figure 3

Diagram illustrating the use of group by and summarize together to create a new variable

Figure 4


Figure 5


Figure 6


Data Frame Manipulation with tidyr


Figure 1

Diagram illustrating the difference between a wide versus long layout of a data frame

Figure 2

Diagram illustrating the wide format of the gapminder data frame

Figure 3

Diagram illustrating how pivot longer reorganizes a data frame from a wide to long format

Figure 4

Diagram illustrating the long format of the gapminder data

Basic Statistics: describing, modelling and reporting
Describing data, Inferential statistics, Regression Modelling


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Logistic Regression


Figure 1

We can also look at where the specimens were processed:


Figure 2


Broom


Producing Reports With Quarto


Figure 1

Screenshot of the New Quarto Document dialogue box in RStudio

Figure 2

Schematic of the Quarto rendering process

Figure 3

RStudio versions 1.4 and later include a visual markdown editing mode. In visual editing mode, markdown expressions (like **bold words**) are transformed to their formatted appearance (bold words) as you type. This mode also includes a toolbar at the top with basic formatting buttons, similar to what you might see in common word processing programs. You can turn visual editing on and off by pressing the button in the top right corner of your R Markdown document; its icon looks like a pair of compasses.


SQL and R


Figure 1

Labelled image of a database table identifying Table, Field, Record and Value

These tables can be linked to each other when a field in one table can be matched to a field in another table. To enable this, one column in each table is identified as a primary key. A primary key, often designated as PK, is one attribute of an entity that distinguishes it from the other entities (or records) in your table. The primary key must be unique for each row for this to work. A common way to create a primary key in a table is to make an ‘id’ field that contains an auto-generated integer that increases by 1 for each new record. This will ensure that your primary key is unique.
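
The idea of an ‘id’ primary key can be sketched in base R. This is an illustration under assumptions, not how a database generates keys: the `records` table and its `name` column are hypothetical, and `seq_len()` merely imitates an auto-generated integer that increases by 1 per record.

```r
# Hypothetical table without a key column (names and values are made up)
records <- data.frame(name = c("a", "b", "a"))

# Add an 'id' column that increases by 1 for each record
records$id <- seq_len(nrow(records))

# A primary key must be unique for each row:
# anyDuplicated() returns 0 when no value is repeated
id_is_unique   <- anyDuplicated(records$id) == 0
name_is_unique <- anyDuplicated(records$name) == 0
```

Here `id` qualifies as a key while `name` does not, because the value "a" appears twice.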


Figure 2

E-R diagram showing the three tables of the database and the relationship between them

Relationships between entities and their attributes are represented by lines linking them together. For example, the line linking amr and trust is interpreted as follows: The ‘amr’ entity is related to the ‘trust’ entity through the attributes ‘trst_cd’ and ‘nhs_trust_code’ respectively.
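
The link between ‘amr’ and ‘trust’ can be sketched as a join in base R. Only the table names and the key columns `trst_cd` and `nhs_trust_code` come from the text; the miniature tables, the `id` and `trust_name` columns, and all values are hypothetical.

```r
# Hypothetical miniature versions of the 'amr' and 'trust' tables;
# only the two key column names are taken from the text
amr <- data.frame(
  id      = 1:4,
  trst_cd = c("T1", "T2", "T1", "T3")
)
trust <- data.frame(
  nhs_trust_code = c("T1", "T2", "T3"),
  trust_name     = c("Trust one", "Trust two", "Trust three")
)

# Match each amr record to its trust:
# trst_cd in amr is paired with nhs_trust_code in trust
amr_trust <- merge(amr, trust, by.x = "trst_cd", by.y = "nhs_trust_code")
```

Each row of `amr_trust` carries the amr record together with the matching trust's details. `merge()` keeps the key under the name given in `by.x`.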


Figure 3

Visualisation of different types of SQL join

Figure 4

E-R diagram showing relationship between police files


Best Practices for Writing R Code


Figure 1


Introduction to Reproducibility


Figure 1

They illustrated the differences between the terms with the following diagram:
Reproducible: the same data and the same analysis produce the same answer.
Replicable: different data with the same analysis produce qualitatively similar answers.
Robust: the same data with a different analysis give results showing that the work does not depend on the specifics of the programming language chosen to perform the analysis.
Generalisable: combining replicable and robust findings allows us to form generalisable results.


Figure 2

Five practices for clinical epidemiology: (1) study registration; (2) open data, code and materials; (3) use of reporting guidelines; (4) preprints; (5) open access. Image source: Key challenges in epidemiology: embracing open science


Figure 3

A number of components need to be shared for work to be reproducible, including data, code, tools and results.

Writing Good Software


Supplemental - Assumption Diagnostics and Regression Trouble Shooting


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Figure 6