Summary and Schedule
This material was commissioned by the EPSRC Digital Health Hub for AMR, for delivery to UKHSA in the week commencing 16th September 2024.
This lesson is an introduction to R for non-programmers using the Gapminder data. It also makes use of open data from the Centre for Consumer Research Data and bespoke synthetic data provided by UKHSA.
The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is commonly used in many scientific disciplines, both for statistical analysis and for its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. These materials aim to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation.
This lesson has been expanded to incorporate additional content on using Git for version control, navigating files and directories in a terminal, creating and validating regression models, and SQL.
This content has been developed to form a five-day course.
The instructor notes page has some suggested lesson plans suitable for shorter workshops.
A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.
Prerequisites
Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.
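As a minimal sketch of the last point, the R snippet below reads a file that sits outside the working directory by giving its path. The file name `example.csv` and the use of a temporary directory are illustrative assumptions, not files used by the lesson.

```r
# Create a small file in a temporary directory (a stand-in for any folder
# that is not the current working directory).
data_dir <- tempdir()
csv_path <- file.path(data_dir, "example.csv")  # hypothetical file name
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), csv_path, row.names = FALSE)

# The working directory is where R resolves relative paths by default.
getwd()

# Read the file by giving its full path instead of relying on the working directory.
example <- read.csv(csv_path)
nrow(example)
```

`file.path()` joins the pieces with the correct separator, which avoids hard-coding `/` or `\` in scripts that are shared between operating systems.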
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction to R and RStudio | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
Duration: 00h 55m | 2. Project Management With RStudio | How can I manage my projects in R? |
Duration: 01h 25m | 3. Seeking Help | How can I get help in R? |
Duration: 01h 45m | 4. Data Structures | How can I read data in R? What are the basic data types in R? How do I represent categorical information in R? |
Duration: 02h 40m | 5. Exploring Data Frames | How can I manipulate a data frame? |
Duration: 03h 10m | 6. Subsetting Data | How can I work with subsets of data in R? |
Duration: 04h 00m | 7. Creating Publication-Quality Graphics with ggplot2 | How can I create publication-quality graphics in R? |
Duration: 05h 20m | 8. Writing Data | How can I save plots and data created in R? |
Duration: 05h 40m | 9. Data Frame Manipulation with dplyr | How can I manipulate data frames without repeating myself? |
Duration: 06h 35m | 10. Data Frame Manipulation with tidyr | How can I change the layout of a data frame? |
Duration: 07h 20m | 11. Producing Reports With Quarto | How can I integrate software and reports? |
Duration: 08h 35m | 12. Basic Statistics: describing, modelling and reporting | How can I detect the type of data I have? How can I make meaningful summaries of my data? |
Duration: 09h 55m | 13. Logistic Regression | How can I identify factors for antibiotic resistance? How can I check the validity of a model? |
Duration: 11h 55m | 14. Linear regression and Broom | How can I explore relationships between variables in my data? How can I present model outputs in an easier-to-read way? |
Duration: 13h 35m | 15. SQL and R | What is the relationship between a relational database and SQL? How do you query databases using SQL? How can I filter data? Can SQL be used to make calculations? How do I join two tables if they share a common point of information? |
Duration: 16h 05m | 16. Writing Good Software | How can I write software that other people can use? |
Duration: 16h 20m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This lesson assumes you have R and RStudio installed on your computer.
- Download and install the latest version of R.
- Download and install RStudio. RStudio is an application (an integrated development environment or IDE) that facilitates the use of R and offers a number of nice additional features. You will need the free Desktop version for your computer.
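Once both are installed, a quick check from the R console can confirm that the setup works. The sketch below is an assumption about what is worth checking, not an official setup script: it prints the running R version and looks for the packages named in the schedule above (ggplot2, dplyr, tidyr and broom). Other episodes may need further packages.

```r
# Report the version of R that is currently running.
R.version.string

# Packages named in the schedule; this list is not exhaustive.
pkgs <- c("ggplot2", "dplyr", "tidyr", "broom")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```

The install line is left commented out so that running the check never changes your R library without you choosing to.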