Introduction to R and RStudio


Figure 1

RStudio layout

Figure 2

RStudio layout with .R file open

Project Management With RStudio


Figure 1

Screenshot of file manager demonstrating bad project organisation

Seeking Help


Data Structures


Exploring Data Frames


Subsetting Data


Figure 1

Inequality testing

Figure 2

Inequality testing: results of recycling

Creating Publication-Quality Graphics with ggplot2


Figure 1

Blank plot, before adding any mapping aesthetics to ggplot().

Figure 2

Plotting area with axes for a scatter plot of life expectancy vs GDP, with no data points visible.

Figure 3

Scatter plot of life expectancy vs GDP per capita, now showing the data points.

Figure 4

Binned scatterplot of life expectancy versus year showing how life expectancy has increased over time

Figure 5

Binned scatterplot of life expectancy vs year with color-coded continents showing value of 'aes' function

Figure 6


Figure 7


Figure 8


Figure 9


Figure 10

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The plot illustrates the possibilities for styling visualisations in ggplot2 with data points enlarged, coloured orange, and displayed without transparency.

Figure 11


Figure 12

Scatterplot of GDP vs life expectancy showing logarithmic x-axis data spread

Figure 13

Scatter plot of life expectancy vs GDP per capita with a blue trend line summarising the relationship between variables, and gray shaded area indicating 95% confidence intervals for that trend line.

Figure 14

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The blue trend line is slightly thicker than in the previous figure.

Figure 15

Scatter plot of life expectancy vs GDP per capita with a trend line summarising the relationship between variables. The plot illustrates the possibilities for styling visualisations in ggplot2 with data points enlarged, coloured orange, and displayed without transparency.

Figure 16


Figure 17


Figure 18


Figure 19


Writing Data


Data Frame Manipulation with dplyr


Figure 1

Diagram illustrating use of select function to select two columns of a data frame

If we want to remove one column only from the gapminder data, for example, removing the continent column.


Figure 2

Diagram illustrating how the group by function organizes a data frame into groups

Figure 3

Diagram illustrating the use of group by and summarize together to create a new variable

Figure 4


Figure 5


Figure 6


Data Frame Manipulation with tidyr


Figure 1

Diagram illustrating the difference between a wide versus long layout of a data frame

Figure 2

Diagram illustrating the wide format of the gapminder data frame

Figure 3

Diagram illustrating how pivot longer reorganizes a data frame from a wide to long format

Figure 4

Diagram illustrating the long format of the gapminder data

Basic Statistics: describing, modelling and reporting
Describing data; Inferential statistics; Regression Modelling


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Logistic Regression


Figure 1

We can also look at where the specimens were processed:


Figure 2


Broom


Producing Reports With Quarto


Figure 1

Screenshot of the New Quarto Document dialogue box in RStudio

Figure 2

Schematic of the Quarto rendering process

Figure 3

RStudio versions 1.4 and later include a visual markdown editing mode. In visual editing mode, markdown expressions (like **bold words**) are transformed to their formatted appearance (bold words) as you type. This mode also includes a toolbar at the top with basic formatting buttons, similar to those in common word processing programs. You can turn visual editing on and off by pressing the visual editing button, whose icon looks like a pair of compasses, in the top right corner of your R Markdown document.


Best Practices for Writing R Code


Figure 1


Introduction to Reproducibility


Figure 1

Five practices for clinical epidemiology: 1. Study registration; 2. Open data, code and materials; 3. Use of reporting guidelines; 4. Preprints; 5. Open access.
Image source: Key challenges in epidemiology: embracing open science


Automated Version Control


Figure 1

Comic: a PhD student sends "FINAL.doc" to their supervisor, but after several increasingly intense and frustrating rounds of comments and revisions they end up with a file named "FINAL_rev.22.comments49.corrections.10.#@$%WHYDIDICOMETOGRADSCHOOL????.doc"
“notFinal.doc” by Jorge Cham, https://www.phdcomics.com

Figure 2

A diagram demonstrating how a single document grows as the result of sequential changes

Figure 3

A diagram with one source document that has been modified in two different ways to produce two different versions of the document

Figure 4

A diagram that shows the merging of two different document versions into one document that contains all of the changes from both versions

Setting Up Git


Creating a Repository


Figure 1

RStudio screenshot showing the file menu dropdown with "New Project..." selected

Figure 2

RStudio screenshot showing New Project dialog window with "New Directory" selected

Figure 3

RStudio screenshot showing New Project dialog window with "Quarto Website" selected

Figure 4

RStudio screenshot showing New Project wizard dialog window with the Directory name added and the Create a git repository box checked

Figure 5

RStudio screenshot showing the files automatically added to the amr-data-dictionary directory

Figure 6

RStudio screenshot showing updated interface with Git features

Figure 7

RStudio screenshot showing updated interface with Git features

Figure 8

RStudio screenshot showing terminal in lower left pane

Tracking Changes


Figure 1

RStudio screenshot showing the current working directory in the Files tab

Figure 2

RStudio screenshot showing the default content of index.qmd automatically created.

Figure 3

RStudio Git Diff icon, Git tab

Figure 4

RStudio screenshot showing changes made to index.qmd and that these are unstaged.

Figure 5

RStudio screenshot showing staging of index.qmd and change of ? to A under status column.

Figure 6

RStudio screenshot showing initial commit message for index.qmd.

Figure 7

RStudio screenshot showing selection of Diff index.qmd on Git dropdown menu.

Figure 8

RStudio screenshot showing a dialogue box with the text “There are no changes to the file "index.qmd" to diff.”.

Figure 9

RStudio screenshot showing selection of History on Git dropdown menu.

Figure 10

RStudio screenshot showing details of first commit in Git history.

Figure 11

RStudio screenshot showing unstaged changes to index.qmd and the M flag to show that the file has been modified.

Figure 12

RStudio screenshot showing dialogue box when attempting to commit unstaged changes.

Figure 13

RStudio screenshot showing dialogue box when attempting to commit unstaged changes.

Figure 14

RStudio screenshot showing dialogue box when attempting to commit unstaged changes.

Figure 15

RStudio screenshot showing dialogue box confirming commit was successful.

Figure 16

A diagram showing how git add registers changes in the staging area, while git commit moves changes from the staging area to the repository

Figure 17

A screenshot of RStudio highlighting changes to index.qmd

Figure 18

A screenshot of RStudio showing commit message for change

Figure 19

Git History icon, used to look at the history of what we’ve done so far.


Figure 20

A screenshot of RStudio history of the 3 commits

Figure 21

A diagram showing two documents being separately staged using git add, before being combined into one commit using git commit

Exploring History


Figure 1

A screenshot showing the modified text of index.qmd

Figure 2

Combined screenshots showing changes to index.qmd and that HEAD is referring to last commit

Figure 3

Combined screenshots showing changes to index.qmd and that HEAD is referring to last commit

Figure 4

RStudio screenshot of the Revert icon

Figure 5

A diagram showing how git restore can be used to restore the previous version of two files

Figure 6

A diagram showing the entire git workflow: local changes are staged using git add, applied to the local repository using git commit, and can be restored from the repository using git checkout

Ignoring Things


Figure 1

RStudio screenshot showing .gitignore open in the editor pane with the files .Rproj.user, .Rhistory, .RData, and *.Rproj added to the end

Figure 2

RStudio screenshot showing .gitignore commit text

Figure 3

RStudio screenshot showing the expansion of .gitignore to ignore .png files.

Figure 4

RStudio screenshot showing that the .png file has been ignored.

Remotes in GitHub


Figure 1

The first step in creating a repository on GitHub: clicking the "create new" button

Figure 2

The second step in creating a repository on GitHub: filling out the new repository form to provide the repository name, and specify that neither a readme nor a license should be created

Figure 3

The summary page displayed by GitHub after a new repository has been created. It contains instructions for configuring the new GitHub repository as a git remote

Figure 4

A diagram showing how "git add" registers changes in the staging area, while "git commit" moves changes from the staging area to the repository

Figure 5

A diagram illustrating how the GitHub "recipes" repository is also a git repository like our local repository, but that it is currently empty

Figure 6

HTTPS URL for repository

Figure 7

A screenshot of GitHub showing the local files mirrored in the remote repository

Figure 8

A diagram showing how "git push origin" will push changes from the local repository to the remote, making the remote repository an exact copy of the local repository.

Collaborating


Figure 1

A screenshot of the GitHub Collaborators settings page, which is accessed by clicking "Settings" then "Collaborators"

Figure 2

A screenshot of RStudio New Project wizard dialogue box showing the three options available

Figure 3

A screenshot of RStudio New Project wizard dialogue box showing Git and SVN as options

Figure 4

A screenshot of RStudio New Project wizard dialogue box showing Git repository information required

Figure 5

A screenshot of GitHub showing clone URL

Figure 6

A screenshot of RStudio dialogue box showing completed details of repo to be cloned.

Figure 7

A screenshot of RStudio dialogue box showing completed details of repo to be cloned.

Figure 8

A diagram showing that "git clone" can create a copy of a remote GitHub repository, allowing a second person to create their own local repository that they can make changes to.

Figure 9

A screenshot of RStudio showing amendments to index.qmd.

Figure 10

A screenshot of RStudio showing commit window and message for amended file.

Figure 11

A screenshot of RStudio showing commit window and message for amended file.

Figure 12

A screenshot of GitHub showing the commit history of the cloned repo, and showing latest `push`

Figure 13

A screenshot of RStudio showing owner's file and Pull option

Conflicts


Figure 1

A screenshot showing current mirrored state of index.qmd

Figure 2

A screenshot showing collaborator update to index.qmd

Figure 3

A screenshot showing owner update to index.qmd on a different line

Figure 4

A screenshot showing conflict when attempting to Push

Figure 5

A diagram showing a conflict that might occur when two sets of independent changes are merged

Figure 6

A screenshot of dialogue box explaining merge conflicts need to be resolved

Figure 7

A screenshot of dialogue box explaining merge conflicts need to be resolved

Figure 8

A screenshot showing modified index.qmd to resolve merge conflicts

Figure 9

A screenshot of dialogue box explaining that the merge hasn't been resolved and unable to commit

Figure 10

Screenshot showing commit of resolved merge

Figure 11

Screenshot showing updated collaborator file from owner merge edits and push

Branches


Figure 1

Diagram showing branch from main then re-merging

Figure 2

Diagram showing branch from main then re-merging

Figure 3

Creating a new Git branch called restructure

Figure 4

Creating a new Git branch called restructure

Figure 5

Screenshot in RStudio showing current working branch

Figure 6

GitHub branch icon, indicating number of current branches

Figure 7

GitHub screenshot showing which branches exist and the current active branch

Figure 8

GitHub screenshot showing active branch

Figure 9

GitHub screenshot showing pull requests, none currently open

Figure 10

GitHub screenshot showing comparison options for new pull request

Figure 11

GitHub screenshot showing comparison options for new pull request

Figure 12

GitHub screenshot showing the confirm merge dialogue box

Figure 13

GitHub screenshot showing PR merged and closed

Issues


Figure 1

GitHub screenshot showing Issues

Figure 2

GitHub screenshot showing definition of new issue

Figure 3

GitHub screenshot showing definition of new issue

Figure 4

GitHub screenshot showing detail of newly opened issue

Figure 5

GitHub screenshot showing list of open issues

Figure 6

GitHub screenshot showing issue in edit mode

Figure 7

GitHub screenshot showing issue in edit mode

Open Science


Licensing


Citation


Hosting


SQL and R


Figure 1

Labelled image of a database table identifying Table, Field, Record and Value

These tables can be linked to each other when a field in one table can be matched to a field in another table. To enable this, one column in each table is identified as a primary key. A primary key, often designated as PK, is one attribute of an entity that distinguishes it from the other entities (or records) in your table. The primary key must be unique for each row for this to work. A common way to create a primary key in a table is to make an ‘id’ field that contains an auto-generated integer that increases by 1 for each new record. This will ensure that your primary key is unique.


Figure 2

E-R diagram showing the three tables of the database and the relationship between them

Relationships between entities and their attributes are represented by lines linking them together. For example, the line linking amr and trust is interpreted as follows: The ‘amr’ entity is related to the ‘trust’ entity through the attributes ‘trst_cd’ and ‘nhs_trust_code’ respectively.


Figure 3

Visualisation of different types of SQL join

Figure 4

E-R diagram showing relationship between police files


Writing Good Software


Supplemental - Assumption Diagnostics and Regression Trouble Shooting


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Figure 6