Summary and Schedule

This material was commissioned by the EPSRC Digital Health Hub for AMR, for delivery to UKHSA in the week commencing 16th September 2024.

This lesson is an introduction to R for non-programmers using the Gapminder data. It also makes use of bespoke synthetic data provided by UKHSA.

The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is used in many scientific disciplines for its statistical analysis capabilities and its wide array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation.

This lesson has been expanded to incorporate additional content on using Git and GitHub for version control via RStudio, on creating and validating regression models, and on SQL.

This content has been developed to form a five-day course.
The instructor notes page has some suggested lesson plans suitable for shorter workshops.

A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful and have chosen primarily for their usability.

Prerequisites

Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.

Setup Instructions

Download files required for the lesson

00h 00m

1. Introduction to R and RStudio

How do I find my way around RStudio?
How do I interact with R?
How do I manage my environment?
How do I install packages?

00h 55m

2. Project Management With RStudio

How can I manage my projects in R?

01h 25m

3. Seeking Help

How can I get help in R?

01h 45m

4. Data Structures

How can I read data in R?
What are the basic data types in R?
How do I represent categorical information in R?

02h 40m

5. Exploring Data Frames

How can I manipulate a data frame?

03h 10m

6. Subsetting Data

How can I work with subsets of data in R?

04h 00m

7. Creating Publication-Quality Graphics with ggplot2

How can I create publication-quality graphics in R?

05h 20m

8. Writing Data

How can I save plots and data created in R?

05h 40m

9. Data Frame Manipulation with dplyr

How can I manipulate data frames without repeating myself?

06h 35m

10. Data Frame Manipulation with tidyr

How can I change the layout of a data frame?

07h 20m

11. Basic Statistics: describing, modelling and reporting

How can I detect the type of data I have?
How can I make meaningful summaries of my data?

08h 40m

12. Logistic Regression

How can I identify factors for antibiotic resistance?
How can I check the validity of my model?

10h 40m

13. Broom

How can I present model outputs in an easier to read way?

11h 30m

14. Producing Reports With Quarto

How can I integrate software and reports?

13h 30m

15. SQL and R

What is the relationship between a relational database and SQL?
How do you query databases using SQL?
How can I filter data?
Can SQL be used to make calculations?
How do I join two tables if they share a common point of information?

17h 30m

16. Best Practices for Writing R Code

How can I write R code that other people can understand and use?

18h 10m

17. Introduction to Reproducibility

What is reproducibility and why should I care?

20h 40m

18. Writing Good Software

How can I write software that other people can use?

20h 55m

19. Supplemental - Assumption Diagnostics and Regression Troubleshooting

How can I check that my data is suitable for use in a linear regression model?

22h 35m

Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

This lesson assumes you have R and RStudio installed on your computer.

Download and install the latest version of R.
Download and install RStudio. RStudio is an application (an integrated development environment, or IDE) that makes R easier to use and offers a number of useful additional features. You will need the free Desktop version for your computer.