Summary and Schedule
This material was commissioned by the EPSRC Digital Health Hub for AMR, for delivery to UKHSA in the week commencing 16 September 2024.
This lesson is an introduction to R for non-programmers using the Gapminder data. It also makes use of open data from the Centre for Consumer Research Data and bespoke synthetic data provided by UKHSA.
The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is widely used across scientific disciplines for statistical analysis, and it offers a large array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. These materials aim to give attendees a strong foundation in the fundamentals of R and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation.
This lesson has been expanded with additional content on using Git for version control, navigating files and directories in a terminal, creating and validating regression models, and querying data with SQL.
This content has been developed to form a five-day course.
The instructor notes page has some suggested lesson plans suitable for shorter workshops.
A variety of third-party packages are used throughout this workshop. They are not necessarily the best available, nor is the selection comprehensive, but they are packages we find useful, chosen primarily for their usability.
Prerequisites
Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.
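As a minimal sketch of the last prerequisite, the R snippet below builds paths to files outside the working directory and checks whether anything exists at those locations. The directory and file names (`data`, `gapminder_data.csv`, `notes.txt`) are hypothetical placeholders chosen for illustration; they are not files this lesson guarantees at those locations.

```r
# Show the current working directory (where R looks for files by default).
wd <- getwd()
print(wd)

# Build a relative path to a file in a subdirectory of the working directory.
# "data" and "gapminder_data.csv" are hypothetical names used for illustration.
relative_path <- file.path("data", "gapminder_data.csv")

# Build a relative path to a file in the parent directory using "..".
# "notes.txt" is also a hypothetical name.
parent_path <- file.path("..", "notes.txt")

# Build the equivalent absolute path by prefixing the working directory.
absolute_path <- file.path(wd, relative_path)
print(absolute_path)

# file.exists() reports whether a file is actually present at each path.
print(file.exists(c(relative_path, parent_path)))
```

`file.path()` joins the components with a forward slash, which R accepts on Windows as well as on macOS and Linux, so the same script can be shared between machines.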
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction to R and RStudio | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
Duration: 00h 55m | 2. Project Management With RStudio | How can I manage my projects in R? |
Duration: 01h 25m | 3. Seeking Help | How can I get help in R? |
Duration: 01h 45m | 4. Data Structures | How can I read data in R? What are the basic data types in R? How do I represent categorical information in R? |
Duration: 02h 40m | 5. Exploring Data Frames | How can I manipulate a data frame? |
Duration: 03h 10m | 6. Navigating Files and Directories | How can I move around on my computer? How can I see what files and directories I have? How can I specify the location of a file or directory on my computer? |
Duration: 03h 50m | 7. Automated Version Control | What is version control and why should I use it? |
Duration: 03h 55m | 8. Setting Up Git | How do I get set up to use Git? |
Duration: 04h 00m | 9. Creating a Repository | Where does Git store information? |
Duration: 04h 10m | 10. Tracking Changes | How do I record changes in Git? How do I check the status of my version control repository? How do I record notes about what changes I made and why? |
Duration: 04h 30m | 11. Exploring History | How can I identify old versions of files? How do I review my changes? How can I recover old versions of files? |
Duration: 04h 55m | 12. Ignoring Things | How can I tell Git to ignore files I don’t want to track? |
Duration: 05h 00m | 13. Supplemental: Using Git from RStudio | How can I use Git with RStudio? |
Duration: 05h 10m | 14. Subsetting Data | How can I work with subsets of data in R? |
Duration: 06h 00m | 15. Creating Publication-Quality Graphics with ggplot2 | How can I create publication-quality graphics in R? |
Duration: 07h 20m | 16. Writing Data | How can I save plots and data created in R? |
Duration: 07h 40m | 17. Remotes in GitHub | How do I share my changes with others on the web? |
Duration: 08h 25m | 18. Collaborating | How can I use version control to collaborate with other people? |
Duration: 08h 50m | 19. Conflicts | What do I do when my changes conflict with someone else’s? |
Duration: 09h 05m | 20. Data Frame Manipulation with dplyr | How can I manipulate data frames without repeating myself? |
Duration: 10h 00m | 21. Data Frame Manipulation with tidyr | How can I change the layout of a data frame? |
Duration: 10h 45m | 22. Producing Reports With Quarto | How can I integrate software and reports? |
Duration: 12h 00m | 23. Basic Statistics: describing, modelling and reporting | How can I detect the type of data I have? How can I make meaningful summaries of my data? |
Duration: 13h 20m | 24. Linear regression and Broom | How can I explore relationships between variables in my data? How can I present model outputs in an easier to read way? |
Duration: 15h 00m | 25. Assumption Diagnostics and Regression Troubleshooting | How can I check that my data is suitable for use in a linear regression model? |
Duration: 16h 40m | 26. Logistic Regression | How can I identify factors for antibiotic resistance? How can I check the validity of the model? |
Duration: 18h 40m | 27. SQL and R | What is the relationship between a relational database and SQL? How do you query databases using SQL? How can I filter data? Can SQL be used to make calculations? How do I join two tables if they share a common point of information? |
Duration: 21h 10m | 28. Writing Good Software | How can I write software that other people can use? |
Duration: 21h 25m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This lesson assumes you have R and RStudio installed on your computer.
- Download and install the latest version of R.
- Download and install RStudio. RStudio is an application (an integrated development environment or IDE) that facilitates the use of R and offers a number of nice additional features. You will need the free Desktop version for your computer.
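Once both are installed, a quick check from the R console can confirm the setup. The sketch below prints the installed R version and reports which of the packages named in the schedule (ggplot2, dplyr, tidyr, broom) are already available. This package list is an assumption drawn from the episode titles, not a complete list of what the lesson requires.

```r
# Print the installed R version string.
print(R.version.string)

# Packages named in the schedule; the lesson may use others too (assumption).
pkgs <- c("ggplot2", "dplyr", "tidyr", "broom")

# requireNamespace() returns TRUE when a package is installed and loadable,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# List anything missing; these could then be installed with
# install.packages(missing_pkgs), which needs an internet connection.
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
```

The check only reads the local library, so it is safe to run repeatedly; nothing is installed unless `install.packages()` is called explicitly.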