Summary and Setup

This material was commissioned by the EPSRC Digital Health Hub for AMR for delivery to UKHSA in the week commencing 16 September 2024.

An introduction to R for non-programmers using the Gapminder data. This lesson also makes use of bespoke synthetic data provided by UKHSA.

The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and for its wide array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R and to teach best practices for scientific computing: breaking analyses down into modular units, automating tasks, and encapsulating code.

This lesson has been expanded to incorporate additional content on using Git and GitHub for version control via RStudio, creating and validating regression models, and SQL.

This content has been developed to form a five-day course.
The instructor notes page has some suggested lesson plans suitable for shorter workshops.

A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.

Prerequisites

Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.

This lesson assumes you have R and RStudio installed on your computer.

Download and install the latest version of R.
Download and install RStudio. RStudio is an application (an integrated development environment or IDE) that facilitates the use of R and offers a number of nice additional features. You will need the free Desktop version for your computer.
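Once both are installed, one quick way to confirm that R is working is to run the lines below in the R console (or the Console pane in RStudio). This is a minimal sketch: the "4.0.0" threshold is an illustrative assumption, not a minimum version stated by this lesson.

```r
# Print the version string of the R installation currently in use.
# It begins with "R version" followed by the version number and release date.
R.version.string

# Compare the installed version against an assumed minimum.
# "4.0.0" is illustrative only; this returns TRUE or FALSE.
getRversion() >= "4.0.0"
```

If the first line prints a version string, R is installed and RStudio is able to run it.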