Summary and Schedule
This material was commissioned by the EPSRC Digital Health Hub for AMR, for delivery to UKHSA in the week commencing 16th September 2024.
This lesson is an introduction to R for non-programmers using the Gapminder data. It also makes use of bespoke synthetic data provided by UKHSA.
The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is widely used in many scientific disciplines for statistical analysis, and it has a large array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. These materials aim to give attendees a strong foundation in the fundamentals of R and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation.
This lesson has been expanded to incorporate additional content on using Git and GitHub for version control via RStudio, creating and validating regression models, and SQL.
This content has been developed to form a five-day course. The instructor notes page has some suggested lesson plans suitable for shorter workshops.
A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.
Prerequisites
Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.
Setup Instructions | Download files required for the lesson |
Duration: 00h 00m | 1. Introduction to R and RStudio | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
Duration: 00h 55m | 2. Project Management With RStudio | How can I manage my projects in R? |
Duration: 01h 25m | 3. Seeking Help | How can I get help in R? |
Duration: 01h 45m | 4. Data Structures | How can I read data in R? What are the basic data types in R? How do I represent categorical information in R? |
Duration: 02h 40m | 5. Exploring Data Frames | How can I manipulate a data frame? |
Duration: 03h 10m | 6. Subsetting Data | How can I work with subsets of data in R? |
Duration: 04h 00m | 7. Creating Publication-Quality Graphics with ggplot2 | How can I create publication-quality graphics in R? |
Duration: 05h 20m | 8. Writing Data | How can I save plots and data created in R? |
Duration: 05h 40m | 9. Data Frame Manipulation with dplyr | How can I manipulate data frames without repeating myself? |
Duration: 06h 35m | 10. Data Frame Manipulation with tidyr | How can I change the layout of a data frame? |
Duration: 07h 20m | 11. Basic Statistics: describing, modelling and reporting | How can I detect the type of data I have? How can I make meaningful summaries of my data? |
Duration: 08h 40m | 12. Logistic Regression | How can I identify factors for antibiotic resistance? How can I check the validity of a model? |
Duration: 10h 40m | 13. Broom | How can I present model outputs in an easier-to-read way? |
Duration: 11h 30m | 14. Producing Reports With Quarto | How can I integrate software and reports? |
Duration: 13h 30m | 15. Best Practices for Writing R Code | How can I write R code that other people can understand and use? |
Duration: 14h 10m | 16. Introduction to Reproducibility | What is reproducibility and why should I care? |
Duration: 16h 40m | 17. Automated Version Control | What is version control and why should I use it? |
Duration: 16h 45m | 18. Setting Up Git | How do I get set up to use Git? |
Duration: 16h 50m | 19. Creating a Repository | Where does Git store information? |
Duration: 17h 00m | 20. Tracking Changes | How do I record changes in Git? How do I check the status of my version control repository? How do I record notes about what changes I made and why? |
Duration: 17h 20m | 21. Exploring History | How can I identify old versions of files? How do I review my changes? How can I recover old versions of files? |
Duration: 17h 45m | 22. Ignoring Things | How can I tell Git to ignore files I don’t want to track? |
Duration: 17h 50m | 23. Remotes in GitHub | How do I share my changes with others on the web? |
Duration: 18h 35m | 24. Collaborating | How can I use version control to collaborate with other people? |
Duration: 19h 00m | 25. Conflicts | What do I do when my changes conflict with someone else’s? |
Duration: 19h 15m | 26. Branches | What do I do when I want to try something a little different? |
Duration: 19h 30m | 27. Issues | How do I specify a change that needs to be made to one or more files? |
Duration: 19h 45m | 28. Open Science | How can version control help me make my work more open? |
Duration: 19h 55m | 29. Licensing | What licensing information should I include with my work? |
Duration: 20h 00m | 30. Citation | How can I make my work easier to cite? |
Duration: 20h 02m | 31. Hosting | Where should I host my version control repositories? |
Duration: 20h 12m | 32. SQL and R | What is the relationship between a relational database and SQL? How do you query databases using SQL? How can I filter data? Can SQL be used to make calculations? How do I join two tables if they share a common point of information? |
Duration: 00h 12m | 33. Writing Good Software | How can I write software that other people can use? |
Duration: 00h 27m | 34. Supplemental - Assumption Diagnostics and Regression Trouble Shooting | How can I check that my data is suitable for use in a linear regression model? |
Duration: 02h 07m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This lesson assumes you have R and RStudio installed on your computer.
- Download and install the latest version of R.
- Download and install RStudio. RStudio is an application (an integrated development environment or IDE) that facilitates the use of R and offers a number of nice additional features. You will need the free Desktop version for your computer.