Instructor Notes

Timing


Leave about 30 minutes at the start of each workshop and another 15 minutes at the start of each session for technical difficulties such as WiFi problems and software installation (even if you asked students to install in advance; allow longer if you did not).

Lesson Plans


The lesson contains much more material than can be taught in a day. Instructors will need to pick an appropriate subset of episodes to use in a standard one-day course.

Some suggested paths through the material are:

(suggested by @liz-is)

  • 01 Introduction to R and RStudio
  • 04 Data Structures
  • 05 Exploring Data Frames (“Realistic example” section onwards)
  • 08 Creating Publication-Quality Graphics with ggplot2
  • 10 Functions Explained
  • 13 Dataframe Manipulation with dplyr
  • 15 Producing Reports With knitr

(suggested by @naupaka)

  • 01 Introduction to R and RStudio
  • 02 Project Management With RStudio
  • 03 Seeking Help
  • 04 Data Structures
  • 05 Exploring Data Frames
  • 06 Subsetting Data
  • 09 Vectorization
  • 08 Creating Publication-Quality Graphics with ggplot2 OR 13 Dataframe Manipulation with dplyr
  • 15 Producing Reports With knitr

A half-day course could consist of (suggested by @karawoo):

  • 01 Introduction to R and RStudio
  • 04 Data Structures (only creating vectors with c())
  • 05 Exploring Data Frames (“Realistic example” section onwards)
  • 06 Subsetting Data (excluding factor, matrix and list subsetting)
  • 08 Creating Publication-Quality Graphics with ggplot2

Setting up git in RStudio


There can be difficulties linking git to RStudio depending on the operating system and the version of the operating system. To make sure Git is properly installed and configured, the learners should go to the Options window in the RStudio application.

  • Mac OS X:
    • Go to RStudio -> Preferences… -> Git/SVN
    • Check and see whether there is a path to a file in the “Git executable” window. If not, the next challenge is figuring out where Git is located.
    • In the terminal enter which git and you will get a path to the git executable. In the “Git executable” window you may have difficulties finding the directory since OS X hides many of the operating system files. While the file selection window is open, pressing “Command-Shift-G” will pop up a text entry box where you will be able to type or paste in the full path to your git executable: e.g. /usr/bin/git or whatever else it might be.
  • Windows:
    • Go to Tools -> Global Options… -> Git/SVN
    • If you use the Software Carpentry Installer, then ‘git.exe’ should be installed at C:/Program Files/Git/bin/git.exe.
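
On either operating system, learners can also look up the Git path from the R console instead of the terminal. The following is a minimal sketch using base R's Sys.which(), which returns an empty string when the program is not found on the PATH; the variable name git_path is only illustrative.

```r
# Ask R where the git executable lives (searches the system PATH)
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("Git found at: ", git_path)
} else {
  message("Git was not found on the PATH; it may need installing or locating manually.")
}
```

If a path is reported, it can be pasted into the “Git executable” field described above.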

To prevent the learners from having to re-enter their password each time they push a commit to GitHub, the following command (which can be run from a bash prompt) caches their credentials for 10,000,000 seconds (roughly 115 days), so in practice they only have to enter their password once:

BASH

$ git config --global credential.helper 'cache --timeout=10000000'

RStudio Color Preview


RStudio has a feature to preview the color for certain named colors and hexadecimal colors. This may confuse or distract learners (and instructors) who are not expecting it.

Mainly, this is likely to come up during the episode on “Data Structures” with the following code block:

R

cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1))

This option can be turned off and on in the following menu setting: Tools -> Global Options -> Code -> Display -> Enable preview of named and hexadecimal colors (under “Syntax”)

Pulling in Data


The easiest way to get the data used in this lesson during a workshop is to have attendees download the raw data from gapminder-data and gapminder-data-wide.

Attendees can use the File - Save As dialog in their browser to save the file.
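
A common failure is saving the web page rather than the raw CSV. As a quick sanity check, something like the sketch below can be run after the download. The helper name looks_like_csv and the path data/gapminder_data.csv are assumptions for illustration; adjust them to wherever attendees saved the file.

```r
# Hypothetical helper: returns TRUE if the first line of a file looks like
# a comma-separated header rather than an HTML page.
looks_like_csv <- function(path) {
  first_line <- readLines(path, n = 1, warn = FALSE)
  is_html <- grepl("^\\s*<", first_line)
  length(first_line) == 1 && !is_html && grepl(",", first_line, fixed = TRUE)
}

# Assumed location of the downloaded file; change to match your setup.
csv_path <- "data/gapminder_data.csv"
if (file.exists(csv_path)) {
  stopifnot(looks_like_csv(csv_path))
  gapminder <- read.csv(csv_path)
  str(gapminder)   # show column names and types as a final check
}
```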

Overall


Make sure to emphasize good practices: put code in scripts, and make sure they’re version controlled. Encourage students to create script files for challenges.

If you’re working in a cloud environment, have learners upload the gapminder data after the second lesson.

Make sure to emphasize that matrices are vectors under the hood and data frames are lists under the hood: this will explain a lot of the esoteric behaviour encountered in basic operations.
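
A short demonstration at the console can make this concrete. The sketch below uses small made-up objects (m and df are illustrative names, not part of the lesson data) to show that vector operations work on a matrix and list operations work on a data frame.

```r
# A matrix is an atomic vector with a "dim" attribute
m <- matrix(1:6, nrow = 2)
attributes(m)      # only $dim is set
m[5]               # vector-style indexing: fifth element in column order
length(m)          # number of elements, not rows or columns

# A data frame is a list of equal-length columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)         # "list"
is.list(df)        # TRUE
length(df)         # number of columns
df[["y"]]          # list-style extraction returns the column itself
```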

Vector recycling and function stacks are probably best explained with diagrams on a whiteboard.
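
If a live example is wanted alongside the diagram, a minimal sketch of recycling might look like the following; the vectors are made up for illustration.

```r
x <- 1:6

# The shorter vector is repeated to match the longer one:
# x + c(10, 20) is evaluated as x + c(10, 20, 10, 20, 10, 20)
recycled <- x + c(10, 20)
recycled
# 11 22 13 24 15 26

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
x + c(10, 20, 30, 40)
```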

Be sure to actually go through examples of an R help page: help files can be intimidating at first, but knowing how to read them is tremendously useful.

Be sure to show the CRAN task views and look at one of the topics.

There’s a lot of content: move quickly through the earlier lessons. Their extensiveness is mostly for purposes of learning by osmosis, so that learners’ memories will be triggered later when they encounter a problem or some esoteric behaviour.

Key lessons to take time on:

  • Data subsetting - conceptually difficult for novices
  • Functions - learners especially struggle with this
  • Data structures - worth being thorough, but you can go through it quickly.

Don’t worry about being correct or knowing the material back-to-front. Use mistakes as teaching moments: the most vital skill you can impart is how to debug and recover from unexpected errors.

Introduction to R and RStudio


Instructor Note

When installing ggplot2, some users may need to set the dependencies argument because lazy loading can affect the install. This suggestion is not tied to any known bug discussion; it is based on instructor feedback and experience in resolving sporadic errors encountered while delivering this workshop:

R

install.packages("ggplot2", dependencies = TRUE)
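
To avoid reinstalling on machines where the package is already present, the call can be wrapped in a check. This is a minimal sketch; install_if_missing is a hypothetical helper name, not part of the lesson.

```r
# Hypothetical helper: install a package only when it is not already available
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed, nothing to do
  }
  install.packages(pkg, dependencies = TRUE)
  invisible(TRUE)              # an install was attempted
}

# Usage (commented out so it does not trigger a download when sourced):
# install_if_missing("ggplot2")
```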


Project Management With RStudio


Seeking Help


Data Structures


Exploring Data Frames


Navigating Files and Directories


Instructor Note

Introducing and navigating the filesystem in the shell (covered in the Navigating Files and Directories section) can be confusing. You may want to have both a terminal and a GUI file explorer open side by side so learners can see the content and file structure while they’re using the terminal to navigate the system.



Automated Version Control


Setting Up Git


Creating a Repository


Tracking Changes


Exploring History


Ignoring Things


Supplemental: Using Git from RStudio


Subsetting Data


Creating Publication-Quality Graphics with ggplot2


Writing Data


Remotes in GitHub


Collaborating


Conflicts


Data Frame Manipulation with dplyr


Data Frame Manipulation with tidyr


Producing Reports With Quarto


Basic Statistics: describing, modelling and reporting

  • Describing data
  • Inferential statistics
  • Regression Modelling


Instructor Note

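Emphasise that “parametric” does not mean “normal”: a parametric method assumes some family of distributions, which need not be the normal distribution.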



Instructor Note

Get them to plot the graphs. Explain that we are generating random data from different distributions and plotting them.
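
If a fallback is needed for learners who get stuck, a minimal base R sketch of the idea is shown below. The choice of distributions, the sample size, and the names n and samples are assumptions for illustration and may differ from what the episode uses.

```r
set.seed(42)   # make the random draws reproducible
n <- 1000

# Draw samples from three different distributions
samples <- list(
  normal      = rnorm(n, mean = 0, sd = 1),
  uniform     = runif(n, min = 0, max = 1),
  exponential = rexp(n, rate = 1)
)

# Plot the histograms side by side, then restore the plotting layout
old_par <- par(mfrow = c(1, 3))
for (name in names(samples)) {
  hist(samples[[name]], main = name, xlab = "value")
}
par(old_par)
```

The differing histogram shapes give a visual starting point for discussing which distributions are symmetric, bounded, or skewed.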



Linear regression and Broom


Assumption Diagnostics and Regression Trouble Shooting


Logistic Regression


SQL and R


Writing Good Software