R for AMR Epidemiology
Use RStudio to write and run R programs.
R has the usual arithmetic operators and mathematical
functions.
Use <- to assign values to variables.
Use ls() to list the variables in a program.
Use rm() to delete objects in a program.
Use install.packages() to install packages (libraries).
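Here is a minimal sketch of the basics above: assignment with <-, arithmetic, a built-in function, ls() and rm(). The variable names are illustrative only.

```r
# Assign values to variables with <- (names are illustrative)
dose_mg <- 500
n_doses <- 12
total_mg <- dose_mg * n_doses  # usual arithmetic operators
sqrt(16)                       # a built-in mathematical function; returns 4

ls()               # lists the objects defined in the current environment
rm(n_doses)        # deletes the object n_doses
exists("n_doses")  # FALSE once the object has been removed
```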
Use RStudio to create and manage projects with consistent
layout.
Treat raw data as read-only.
Treat generated output as disposable.
Separate function definition and application.
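Separating definition from application might look like the sketch below. prop_resistant() is a hypothetical helper, not part of any package, and the susceptibility calls are made up.

```r
# Definition: a hypothetical helper returning the proportion of "R" calls
prop_resistant <- function(results) {
  # results: character vector of susceptibility calls ("R", "I", "S")
  mean(results == "R")
}

# Application: call the function on data, separately from its definition
calls <- c("R", "S", "S", "R", "I")
prop_resistant(calls)  # 0.4
```

Keeping definitions in their own script (and sourcing it) lets the same function be applied to many datasets without copying code.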
Use help() to get online help in R.
Use read.csv() to read tabular data in R.
The basic data types in R are double, integer, complex, logical, and
character.
Data structures such as data frames or matrices are built on top of
lists and vectors, with some added attributes.
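The basic types can be checked with typeof(), and attributes() shows what a data frame or matrix adds on top of the underlying list or vector. The organisms and MIC values are illustrative.

```r
typeof(3.14)       # "double"
typeof(2L)         # "integer" (the L suffix makes an integer)
typeof(1 + 2i)     # "complex"
typeof(TRUE)       # "logical"
typeof("E. coli")  # "character"

# A data frame is a list of equal-length column vectors plus attributes
df <- data.frame(organism = c("E. coli", "K. pneumoniae"), mic = c(0.5, 8))
typeof(df)      # "list"
attributes(df)  # the added attributes: names, class, and row.names

# A matrix is a vector with a dim attribute
m <- matrix(1:6, nrow = 2)
attributes(m)   # $dim is c(2, 3)
```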
Use cbind() to add a new column to a data frame.
Use rbind() to add a new row to a data frame.
Remove rows from a data frame, for example with negative indexing: df[-2, ] drops the second row.
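A sketch of adding and removing columns and rows on a small, made-up isolate table. The resistant flag uses an arbitrary threshold of 4 chosen for illustration, not a clinical breakpoint.

```r
isolates <- data.frame(id = c("A1", "A2", "A3"), mic = c(0.25, 4, 16))

# cbind() adds a column (illustrative threshold, not a real breakpoint)
isolates <- cbind(isolates, resistant = isolates$mic >= 4)

# rbind() adds a row; the new row needs the same column names
new_row <- data.frame(id = "A4", mic = 1, resistant = FALSE)
isolates <- rbind(isolates, new_row)

# Negative indexing removes rows: drop the second row
isolates <- isolates[-2, ]
nrow(isolates)  # 3
```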
Use str(), summary(), nrow(), ncol(), dim(), colnames(), head(), and typeof() to understand the structure of a data frame.
Read in a CSV file using read.csv().
Understand that length() of a data frame returns its number of columns, because a data frame is a list of column vectors.
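The inspection functions can be tried on a small table. Here read.csv() reads from its text argument instead of a file so the example is self-contained; with real data you would pass a file path. The data are made up.

```r
amr <- read.csv(text = c(
  "organism,antibiotic,mic",
  "E. coli,ciprofloxacin,0.5",
  "E. coli,ampicillin,32",
  "K. pneumoniae,ciprofloxacin,2"
))

str(amr)          # column names, types, and first values
summary(amr)
nrow(amr)         # 3 rows
ncol(amr)         # 3 columns
dim(amr)          # 3 3
colnames(amr)
head(amr, 2)
typeof(amr$mic)   # "double"

# length() counts columns, since a data frame is a list of columns
length(amr) == ncol(amr)  # TRUE
```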
The file system is responsible for managing information on the
disk.
Information is stored in files, which are stored in directories
(folders).
Directories can also store other directories, which then form a
directory tree.
pwd prints the user’s current working directory.
ls [path] prints a listing of a specific file or directory; ls on its own lists the current working directory.
cd [path] changes the current working directory.
Most commands take options that begin with a single -.
Directory names in a path are separated with / on Unix, but \ on Windows.
/ on its own is the root directory of the whole file system.
An absolute path specifies a location from the root of the file
system.
A relative path specifies a location starting from the current
location.
. on its own means ‘the current directory’; .. means ‘the directory above the current one’.
Version control is like an unlimited ‘undo’.
Version control also allows many people to work in parallel.
Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
git init initializes a repository.
Git stores all of its repository data in the .git directory.
git status shows the status of a repository.
Files can be stored in a project’s working directory (which users
see), the staging area (where the next commit is being built up) and the
local repository (where commits are permanently recorded).
git add puts files in the staging area.
git commit saves the staged content as a new commit in the local repository.
Write a commit message that accurately describes your changes.
git diff displays differences between commits.
git restore recovers old versions of files.
The .gitignore file tells Git what files to ignore.
Using RStudio’s Git integration allows you to version control a
project over time.
Indexing in R starts at 1, not 0.
Access individual values by location using [].
Access slices of data using [low:high].
Access arbitrary sets of data using [c(...)].
Use logical operations and logical vectors to access subsets of
data.
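The indexing rules above, applied to a made-up vector of MIC values:

```r
mic <- c(0.25, 4, 16, 1, 8)

mic[1]          # first element (indexing starts at 1): 0.25
mic[2:4]        # slice [low:high]: 4 16 1
mic[c(1, 5)]    # arbitrary positions: 0.25 8
mic[mic >= 4]   # logical vector as index: 4 16 8
```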
Use ggplot2 to create plots.
Think about graphics in layers: aesthetics, geometry, statistics,
scale transformation, and grouping.
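A sketch of the layered approach with ggplot2, assuming the package is installed. The resistance percentages are invented for illustration.

```r
library(ggplot2)

amr <- data.frame(
  year = rep(2019:2022, times = 2),
  antibiotic = rep(c("ciprofloxacin", "ampicillin"), each = 4),
  pct_resistant = c(10, 12, 15, 14, 40, 42, 45, 47)  # made-up values
)

p <- ggplot(amr, aes(x = year, y = pct_resistant,
                     colour = antibiotic)) +        # aesthetics; colour also groups
  geom_line() +                                     # geometry
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, linetype = "dashed") +    # statistics: linear trend
  scale_y_continuous(limits = c(0, 100))            # scale: y-axis from 0 to 100
p
```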
Save plots from RStudio using the ‘Export’ button.
Use write.table() to save tabular data.
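A round trip with write.table(), writing to a temporary directory so nothing in the project's raw data is touched. The counts are hypothetical.

```r
summary_tbl <- data.frame(antibiotic = c("ciprofloxacin", "ampicillin"),
                          n_tested = c(120, 95),
                          n_resistant = c(18, 40))  # hypothetical counts

# Write comma-separated output without row names or quotes
out_file <- file.path(tempdir(), "resistance_summary.csv")
write.table(summary_tbl, file = out_file, sep = ",",
            quote = FALSE, row.names = FALSE)

# Read it back to check the round trip
back <- read.csv(out_file)
```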
A local Git repository can be connected to one or more remote
repositories.
Use the SSH protocol to connect to remote repositories.
git push copies changes from a local repository to a remote repository.
git pull copies changes from a remote repository to a local repository.
git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
Conflicts occur when two or more people change the same lines of the
same file.
The version control system does not allow people to overwrite each
other’s changes blindly, but highlights conflicts so that they can be
resolved.
Use the dplyr package to manipulate data frames.
Use select() to choose variables from a data frame.
Use filter() to choose data based on values.
Use group_by() and summarize() to work with subsets of data.
Use mutate() to create new variables.
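The dplyr verbs chained together, assuming dplyr is installed. The countries, antibiotic abbreviations, and results are placeholders.

```r
library(dplyr)

amr <- data.frame(
  country    = c("A", "A", "B", "B", "B"),
  antibiotic = c("cipro", "ampi", "cipro", "cipro", "ampi"),
  resistant  = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

res <- amr %>%
  filter(antibiotic == "cipro") %>%        # choose rows by value
  select(country, resistant) %>%           # choose variables
  group_by(country) %>%                    # work per country
  summarize(n = n(), n_res = sum(resistant)) %>%
  mutate(pct_res = 100 * n_res / n)        # create a new variable
res
```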
Use the tidyr package to change the layout of data frames.
Use pivot_longer() to go from a wide to a long layout.
Use pivot_wider() to go from a long to a wide layout.
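Reshaping between wide and long layouts with tidyr, assuming the package is installed. The values are hypothetical percentages.

```r
library(tidyr)

wide <- data.frame(
  country = c("A", "B"),
  y2021   = c(12, 30),
  y2022   = c(15, 28)
)  # hypothetical % resistant by year

# Wide to long: one row per country-year
long <- pivot_longer(wide, cols = c(y2021, y2022),
                     names_to = "year", values_to = "pct_resistant")

# Long back to wide: one column per year
wide_again <- pivot_wider(long, names_from = year,
                          values_from = pct_resistant)
```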
Mix reporting written in R Markdown with software written in R.
Specify chunk options to control formatting.
Use knitr to convert these documents into PDF and other formats.
Keep your project folder structured, organized and tidy.
Document what and why, not how.
Break programs into short single-purpose functions.
Write re-runnable tests.
Don’t repeat yourself.
Be consistent in naming, indentation, and other aspects of
style.
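A small sketch of a short single-purpose function with re-runnable tests written in base R's stopifnot(). pct() is a hypothetical helper; packages such as testthat offer a fuller testing framework.

```r
# Short, single-purpose function (hypothetical helper)
pct <- function(numerator, denominator) {
  stopifnot(all(denominator > 0))  # guard against division by zero
  100 * numerator / denominator
}

# Re-runnable tests: source this file at any time; stopifnot() errors on failure
stopifnot(
  pct(1, 4) == 25,
  identical(pct(c(1, 3), c(2, 4)), c(50, 75))
)
```

Reusing pct() wherever a percentage is needed, instead of retyping the formula, keeps the code consistent and avoids repetition.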