GOPINATHAN3419 (AM) or SWCPYTHON2025 (PM) to follow along and submit answers to the exercises.
shell-lesson-data.zip file and unzip/extract it, then save it to your Desktop.
python-novice-inflammation-data.zip file and (optionally) the code.zip file too.
Instructors: Dimitra Salmanidou, William Graham
Helpers: Stephen Thompson, Ankur Sinha
Instructors: Devaraj Gopinathan, Arindam Saha
Helpers: William Graham
This is Will typing
Some shell commands:
ls: list the contents of a directory
pwd: print working directory
cd: change directory
mkdir: make directory
nano: run the text editor Nano to create or edit a file
mv: move (or rename) a file
cp: copy a file
rm: remove/delete a file. BE CAREFUL! Deleting is permanent; there's no recycle bin! A good habit is to use rm -i, which asks you for confirmation before deleting each item.
rmdir: remove a directory (will refuse if it is not empty)
wc: word count; counts the lines, words and characters in a file
cat: concatenate/print file contents (what does tac do?)
sort: sort the lines of a text file
head: print lines from the start of a file
tail: print lines from the end of a file
history: print all the commands you have used previously (including wrong commands!)
The general pattern of a shell command is:
command flag/option optional-argument-for-flag another-flag/option optional-argument-for-flag2 ...
Example:
ls -F adirectory
Here, ls is the command, -F is the first flag, and adirectory is the argument to the ls command.
head -n 10 afile
Here, head is the command, -n is the option, 10 is the argument passed to the option -n, and finally afile is the argument passed to the head command.
Note: not all commands/flags take arguments.
The asterisk (*) will match any number of characters in a name. For example, if I have 3 files called alpha.txt, bravo.txt, and charlie.csv:
ls *.txt will match alpha.txt and bravo.txt
ls *.csv will match charlie.csv
ls *a*.* will match all 3 files!
ls *a.* will match alpha.txt (it's the only one in which a. appears).
The question mark (?) matches exactly one character in a name. If I have 3 files called data-01.csv, data-02.csv, data-10.csv…
ls data-0?.csv will match data-01.csv and data-02.csv
ls data-??.csv will match all 3 files (the ?? matches 01, 02 and 10!)
ls data-?.csv will match none of the files (since ? only matches 1 character, and all our files have 2 characters between the hyphen and the .).
The * and ? wildcards can be combined too. Let's say we have data-01.csv, data-20.csv, data-01.h5 instead…
ls data-??.* will match all 3 files.
ls data-*.h5 will match data-01.h5.
ls data-*.* will match everything again.
ls data-0?.* will match data-01.csv and data-01.h5.
Square brackets [...] match any one of the characters inside them: [ab] means "a or b". They can be combined with * and ? too: [ab]*, for example, means "either a or b, followed by anything".
The > character can be used to redirect the output of a command to somewhere else, normally a text file. For example,
ls -a .
normally lists all the files in your current directory, and displays them in the terminal.
ls -a . > list_of_files.txt
will not print this list to your terminal. Instead, it will create a file called list_of_files.txt (in your current working directory) that contains the result. You can view the contents of the file using
cat list_of_files.txt
Note that > will overwrite a file that already exists! You can use >> to instead append the output to the file.
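To experiment with the *, ? and [...] wildcards described earlier without creating any files, here is a small Python sketch (Python is the language used later in this session). It assumes Python's standard fnmatch module, which understands the same wildcards; glob.glob, used later on, applies these patterns to real files on disk. One difference from the shell: fnmatch lets * and ? match a leading dot in hidden file names. The helper name matching is hypothetical, made up for this example.

```python
from fnmatch import fnmatchcase


def matching(pattern, names):
    """Return the names that match a shell-style wildcard pattern (hypothetical helper)."""
    # fnmatchcase always compares upper/lower case exactly
    return [name for name in names if fnmatchcase(name, pattern)]


# The example file names used in the text
files = ["alpha.txt", "bravo.txt", "charlie.csv"]
print(matching("*.txt", files))   # ['alpha.txt', 'bravo.txt']
print(matching("*a*.*", files))   # all three files
print(matching("*a.*", files))    # ['alpha.txt']
print(matching("[ab]*", files))   # ['alpha.txt', 'bravo.txt']

data_files = ["data-01.csv", "data-02.csv", "data-10.csv"]
print(matching("data-0?.csv", data_files))  # ['data-01.csv', 'data-02.csv']
print(matching("data-??.csv", data_files))  # all three files
print(matching("data-?.csv", data_files))   # [] (no matches)

mixed_files = ["data-01.csv", "data-20.csv", "data-01.h5"]
print(matching("data-??.*", mixed_files))   # all three files
print(matching("data-0?.*", mixed_files))   # ['data-01.csv', 'data-01.h5']
```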
You can also "pipe" outputs to other commands, using the | character (next to "z" on a UK keyboard). This will take the output of the first command and pass it directly into the second command! For example, the following two approaches do the same thing (note, we ran these commands in the exercise-data/alkanes/ directory):
wc -l *.txt > word_counts.txt, followed by sort -n word_counts.txt
wc -l *.txt | sort -n
The second way, using a pipe, avoids creating a temporary file (word_counts.txt) that we don't actually need afterwards.
Everything we do in this session is done inside the shell-lesson-data folder. The first thing we'll do is create a new directory, research-paper, inside that folder, next to north-pacific-gyre and exercise-data (mkdir research-paper).
We can check the directory is empty with ls -aF research-paper
We will then cd research-paper to move into our new folder, and work on git in there.
Start by telling git who you are:
git config --global user.name "yourname"
If you have a GitHub account, you should use that for your name.
git config --global user.email "youremail"
You can choose which editor git will use with:
git config --global core.editor "nano -w"
And how git handles carriage returns and line feeds. For macOS and Linux:
git config --global core.autocrlf input
For Windows:
git config --global core.autocrlf true
And what your default branch is:
git config --global init.defaultBranch main
You should still be in your research-paper directory. You can check with pwd and ls if you wish. We can now start using git. Step 1: initialise your directory.
git init
Now, using ls -a, you should see a new item, .git. This .git is a hidden folder that git uses for version control of the directory contents. Don't remove it or edit its contents.
Now try:
git status
Let's create a file to add to our repository.
nano abstract.md
Enter some text of your choice, save the file and exit nano. Now try git status again. You should see that abstract.md is listed under "Untracked files". You can add it to git's version control with:
git add abstract.md
then
git commit
This should open a text editor (nano, if that's what you set earlier). In nano you can add a descriptive message, maybe "my first commit". Save and exit nano.
Re-run git status
Now let's edit abstract.md and use git to keep track of our changes.
Use nano to edit abstract.md. Then:
git status
If you like, you can see the changes you made with:
git diff
Then run git add abstract.md, check with git status, then run git commit, add a commit message, save and exit nano.
Now let's create a subfolder for our analysis.
mkdir analysis
Running git status should show "nothing to commit, working tree clean". This is because git doesn't track empty directories; it works on files. So create a file in analysis (maybe a Python script).
nano analysis/analysis.py
Enter some text of your choice, save and exit.
Re-run git status
Add some more files:
nano introduction.md
etc.
You can selectively add files for each commit. So let's just add the analysis files.
git add analysis/analysis.py
git commit
Enter some text like "added analysis script".
This enables us to have a meaningful commit history.
We can use git commit -m "a commit message" as a shortcut to commit with the provided commit message. git will not open nano (your editor) for you to write a commit message if you use -m.
We can then do git add introduction.md and git commit -m "added introduction".
git log shows the complete "commit history" of your project.
The commit information includes: the commit's unique identifier (hash), the author, the date, and the commit message.
git log --oneline shows a shorter, one-line summary of each commit.
To see the history of a particular file/path:
git log -- <path to file>
To see what has changed in a file since a particular commit, use:
git diff <hash> <file>
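In a diff:
the --- line shows which file lines were removed from
the +++ line shows which file lines were added to (usually the same file)
the @@ ... @@ line (for example @@ -3,4 +3,5 @@) shows where the changed section starts and how many lines it covers: -3,4 means 4 lines starting at line 3 of the old version, and +3,5 means 5 lines starting at line 3 of the new version
lines starting with + were added
lines starting with - were removed.
The HEAD is a pointer to where git thinks it currently is. This usually points to the latest commit.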
So, after making a change to some file, file.md, if we run this before running git add:
git diff HEAD file.md
we're asking git what has changed from HEAD to now.
Git works line-by-line.
So, even if you have added a few more words to the same line, git will say that a line was removed and a new one was added.
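Python's standard difflib module can produce the same unified-diff layout (---, +++, @@ and +/- lines) that git diff uses, which makes it a convenient way to see the line-by-line behaviour described above. This is a sketch with made-up file contents; git's own output also adds a couple of extra header lines (diff --git and index) that difflib does not.

```python
import difflib

# Two made-up versions of a file, as lists of lines
old_version = ["Title\n", "The quick brown fox.\n", "The end.\n"]
new_version = ["Title\n", "The quick brown fox jumps.\n", "The end.\n"]

# Adding one word to the middle line shows up as one removed (-) and one added (+) line
diff_lines = list(difflib.unified_diff(
    old_version, new_version, fromfile="a/file.md", tofile="b/file.md"))
print("".join(diff_lines))

# Expected output:
# --- a/file.md
# +++ b/file.md
# @@ -1,3 +1,3 @@
#  Title
# -The quick brown fox.
# +The quick brown fox jumps.
#  The end.
```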
Once a file is staged, git diff will not show those changes, because git diff compares your files with the staging area, and the staged changes are now part of the version that is ready to be committed.
To see the diff for a file that has been added with git add, we can use:
git diff --staged <file>
The HEAD pointer can be used to easily go back in time:
HEAD~1: the commit before HEAD
HEAD~20: the 20th commit before HEAD
So, this command will show what has changed since the commit before HEAD (i.e. the changes made in the latest commit, plus any changes not yet committed):
git diff HEAD~1
To restore files to the "present" (HEAD), discarding changes you haven't staged yet, we can use:
git restore <files>
To restore a file to a particular commit, we can specify the commit too:
git restore --source=<commit> -- <files>
Note that restore does not automatically run git add: git still considers the restored file a modification, and you will need to add it manually.
To remove a file from git (delete it and tell git to stop tracking it):
git rm <file>
(git rm --cached <file> stops tracking the file but keeps it on disk.)
To "undo" a commit, we can use:
git revert <hash of the commit to undo>
This creates a new commit that reverses the changes. It will open your editor for you to edit the commit message, which will include information about the commit being reverted.
Sometimes, we want to keep data files but not track them in Git. So, we can tell git to "ignore" them by adding them to a .gitignore file, usually placed in the top-level folder of the repository (the folder containing .git). Note that you must git add .gitignore and then git commit it too.
Each line in the .gitignore file is a pattern, which can name a single file (e.g. analysis/01-data.csv) or use wildcards to match several files (e.g. analysis/dataset_*.csv, or analysis/*.csv).
To ignore all csv files, including those in subdirectories, we can add **.csv to .gitignore (a plain *.csv works too, because a pattern without a slash matches at any depth).
A set of gitignore files for different projects can be found on GitHub here: https://github.com/github/gitignore
git config --global init.defaultBranch main did not seem to work for a few people.
git config --global core.autocrlf true may not be working: git gives a warning saying LF will be replaced by CRLF (this warning is expected with this setting on Windows and is generally harmless).
For Python today, we will be making use of Jupyter notebooks.
These let you write either code (Python code to be run) or markdown (text) "cells" so you can annotate your work as you go along.
For those who installed Anaconda as per the setup instructions, to load up Jupyter:
If you can't get Jupyter / Anaconda working…
.csv files for upload. (Hint: you can highlight multiple files to upload them at once.)
.csv files appear in the file explorer on the left of the page.
Launcher tab, click on Python (pyodide) to open a new notebook. You're all set!
Python "comments" start with a hash symbol (#): this makes Python ignore the rest of the line, and lets you write notes to yourself to remind you what your code is doing.
The equals (=) operator assigns a value to a variable.
weight_kg = 60 creates a variable called weight_kg, and stores the value 60 in it.
weight_kg = weight_kg + 30 will use the OLD value of weight_kg in the calculation weight_kg + 30, and THEN overwrite the value of weight_kg with the result!
print is a built-in Python function that displays the value currently stored inside a variable. print(weight_kg) displays the value of the variable weight_kg.
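A short sketch of the assignment behaviour described above, using the same weight_kg example (the printed values are shown in the comments):

```python
weight_kg = 60
print(weight_kg)  # 60

# The right-hand side is worked out first, using the OLD value (60),
# and only then is the result stored back into weight_kg
weight_kg = weight_kg + 30
print(weight_kg)  # 90
```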
type is another built-in Python function, which tells you what type of value is stored in a variable. It might be an int (whole number, INTeger), float (a decimal number, or FLOATing point number), or str (STRing of characters), or one of many other types!
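For example, type reports a different type for each kind of value. The variable names other than weight_kg are made up for this sketch; the comments show what a standard Python session prints:

```python
weight_kg = 60          # a whole number
average_score = 3.5     # a decimal number (hypothetical example value)
patient_name = "Alice"  # text (hypothetical example value)

print(type(weight_kg))      # <class 'int'>
print(type(average_score))  # <class 'float'>
print(type(patient_name))   # <class 'str'>
```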
This is a recap of the code that Devaraj has written so far to create the plots.
import numpy
import matplotlib.pyplot
# Remember, if your data files are in the same folder as your notebooks, you need to use
# fname="inflammation-01.csv" instead of fname="data/inflammation-01.csv"
data = numpy.loadtxt(fname="data/inflammation-01.csv", delimiter=",")
# To create a heatmap
matplotlib.pyplot.imshow(data)
# Then to actually display it to the screen
matplotlib.pyplot.show()
# To create a line plot, of the daily average inflammation
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
# Display our new figure
matplotlib.pyplot.show()
# We didn't actually need to create an intermediary variable (ave_inflammation)!
# So if we create a plot for the max, we can just pass in the values from the numpy.max calculation directly
max_plot = matplotlib.pyplot.plot(numpy.amax(data, axis=0))
# Display figure
matplotlib.pyplot.show()
# Similarly we can do this for a plot of the minimum
min_plot = matplotlib.pyplot.plot(numpy.amin(data, axis=0))
# Display figure
matplotlib.pyplot.show()
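To make axis=0 concrete: in the inflammation data each row is a patient and each column is a day, so numpy.mean(data, axis=0) gives one value per day (per column). The pure-Python sketch below uses a tiny made-up 3-by-4 dataset to compute the same per-column average, maximum and minimum that numpy.mean, numpy.amax and numpy.amin with axis=0 would return. It is only for illustration; numpy is much faster for real data.

```python
# A tiny, made-up dataset: 3 patients (rows) by 4 days (columns)
data = [
    [0, 1, 2, 3],
    [1, 1, 3, 5],
    [2, 4, 1, 4],
]

# zip(*data) regroups the rows into columns, i.e. one tuple per day
days = list(zip(*data))

daily_mean = [sum(day) / len(day) for day in days]  # like numpy.mean(data, axis=0)
daily_max = [max(day) for day in days]              # like numpy.amax(data, axis=0)
daily_min = [min(day) for day in days]              # like numpy.amin(data, axis=0)

print(daily_mean)  # [1.0, 2.0, 2.0, 4.0]
print(daily_max)   # [2, 4, 3, 5]
print(daily_min)   # [0, 1, 1, 3]
```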
This is the code that we used when we started grouping plots.
import numpy
import matplotlib.pyplot
# Remember, if your data files are in the same folder as your notebooks, you need to use
# fname="inflammation-01.csv" instead of fname="data/inflammation-01.csv"
data = numpy.loadtxt(fname="data/inflammation-01.csv", delimiter=",")
# Create a blank canvas, that we're storing in a variable called 'fig'
fig = matplotlib.pyplot.figure(figsize=(10., 3.))
# The figure is empty, so displaying it will show nothing
matplotlib.pyplot.show(fig)
# We need to add something to our figure!
# Let's start by actually adding some axes on which to plot.
axes1 = fig.add_subplot(1, 3, 1) # 1 row, 3 columns of subfigures. The last '1' means that axes1 will correspond to the first subfigure
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
# Notice that we are calling fig.add_subplot here: this means that we are using a function that belongs to the 'fig' variable (a "method"), which exists to add more pieces to our figure!
# Add some labels so our figure is readable!
# Again, we are using axes1.set_ylabel here because we are adding something to our axes.
axes1.set_ylabel("Average")
axes2.set_ylabel("Max")
axes3.set_ylabel("Min")
# We can see what we've got so far...
matplotlib.pyplot.show(fig)
# .. which is now 3 empty subplots with labels!
# So let's start plotting on our axes
axes1.plot(numpy.mean(data, axis=0))
axes2.plot(numpy.amax(data, axis=0))
axes3.plot(numpy.amin(data, axis=0))
matplotlib.pyplot.show(fig)
# Looks good, but our axis labels are overlapping with the other plots.
# We can fix this by forcing a tighter (stricter) layout
fig.tight_layout()
matplotlib.pyplot.show(fig)
# If you want to save your figure, we can do this too!
fig.savefig("inflammation.png")
# Other options we set were our axis limits
axes1.set_xlim(0, 40) # we know there are exactly 40 days, so we might as well compress our axes limits, for example!
We can define a list by using square brackets, and separating the items (elements) of the list with commas:
list_of_numbers = [1, 3, 5, 7]
list_of_names = ["Will", "Arindam", "Devaraj"]
list_with_a_mixture_of_things = [1, "seven", 3.141592]
empty_list = []
IMPORTANT: Do not call your list list! list is a built-in function that Python uses to create lists, so doing something like
list = [1, 3, 5, 7]
will mean that you can no longer use the list() function (you can still write lists with square brackets, but calls like list("abc") will fail). If you did accidentally do this, you can "restart" your notebook using the restart button in the toolbar (next to the "run cell" button), and then re-run the code cells (obviously avoiding this one!).
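A small sketch of what goes wrong, and of one way to undo it without restarting: del list removes your variable, so the name list refers to Python's built-in function again.

```python
list = [1, 3, 5, 7]  # DON'T do this: the name now refers to our list, not the built-in

try:
    list("abc")  # tries to call our list of numbers as if it were a function
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)  # 'list' object is not callable

del list  # delete our variable, so the built-in list() is visible again
print(list("abc"))  # ['a', 'b', 'c']
```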
Lists can be accessed in similar ways to the numpy arrays holding our data: we can access elements by their index in the list.
print(list_of_numbers[0]) # Remember, indexes start at 0!
print(list_of_numbers[-1]) # Count backwards from the end
print(list_of_numbers[0:3]) # Slicing also works on lists
# You can also get an "empty slice" if you start at an index that is later than your final index:
print(list_of_numbers[3:0]) # Will return an empty list
# We can also change the "step size" of our slices
long_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Start at index 1, go up to (but not including)
# index 8, and take every 2nd element!
print(long_list[1:8:2])
We can also add extra elements to lists using the append method.
print(list_of_numbers)
list_of_numbers.append(9)
print(list_of_numbers)
And we can check how many elements our list has with the len function:
print(len(list_of_numbers))
Lists are also mutable: we can change their values in place:
print(list_of_names)
list_of_names[0] = "Graham"
print(list_of_names)
Note how this is different to strings, which up until this point have behaved like lists of characters:
my_string = "William"
# Slicing / indexing works, just like lists
print(my_string[4])
# But we can't change the individual characters
my_string[4] = "g" # <- This produces an error!
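The error is a TypeError, because strings are immutable. A sketch of the usual workaround is to build a new string from slices of the old one (the name new_string is just for this example):

```python
my_string = "William"

try:
    my_string[4] = "g"  # strings cannot be changed in place
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)  # 'str' object does not support item assignment

# Build a NEW string instead: everything before index 4, then "g", then everything after index 4
new_string = my_string[:4] + "g" + my_string[5:]
print(new_string)  # Willgam
print(my_string)   # William (the original is unchanged)
```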
Lists can contain anything… even other lists!
dogs = ["Coyote", "Wolf", "Dingo"]
cats = ["Manx", "Lion", "Tiger", "Leopard"]
horses = ["Horse", "Zebra", "Donkey"]
list_of_animal_types = [dogs, cats, horses]
print(list_of_animal_types)
# The 0-indexed item in this list, is itself a list!
print(list_of_animal_types[0]) # dogs
# So we can ask for the 2-indexed item, inside the list at index 0
print(list_of_animal_types[0][2]) # dogs[2], which is "Dingo"
# Note that we can't do
print(list_of_animal_types[0, 2]) # Causes an error
# because a list isn't 2D. It's a list, and we have to "get" the sub-list before we can access anything else. There's no concept of rows and columns like numpy arrays.
In general though, if you are in a situation where you're using lists-of-lists, you can normally do something easier using either numpy arrays (which we saw earlier) or your own custom classes (which we won't cover today).
Also, notice that sub-list elements are not counted by len:
print(len(list_of_animal_types)) # 3, since there are 3 "sub-lists" inside this list. The individual elements of the sub-lists do not contribute to the count, because they are not "direct" elements of the main list.
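If you do want the total number of animals across all the sub-lists, one way (a sketch using the same lists) is to add up the length of each sub-list with a for-loop, which is introduced in the next section:

```python
dogs = ["Coyote", "Wolf", "Dingo"]
cats = ["Manx", "Lion", "Tiger", "Leopard"]
horses = ["Horse", "Zebra", "Donkey"]
list_of_animal_types = [dogs, cats, horses]

print(len(list_of_animal_types))  # 3 sub-lists

total_animals = 0
for animal_type in list_of_animal_types:
    # add the length of this sub-list to the running total
    total_animals = total_animals + len(animal_type)
print(total_animals)  # 10
```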
One reason we use lists is so that we can write loops. We will be looking at the for-loop, which is a way of letting us repeat instructions multiple times, once for each element in a list.
For example, if we want to print out every element in a list, with a comment, we could write out the following code:
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21]
print("Element 1 is", fibonacci[0])
print("Element 2 is", fibonacci[1])
print("Element 3 is", fibonacci[2])
print("Element 4 is", fibonacci[3])
but this is quite tedious. It would also break if you later changed fibonacci to have fewer than 4 elements! Instead, we can reliably use a for-loop to run this print statement for every element in the list:
for number in fibonacci:
    print(number)
Note: notice how the print(number) line is indented. This is important, since Python uses indentation to know when your loop instructions end!
for number in fibonacci:
    print("Start of loop instructions")
    print("Current number is", number)
    print("End of loop instructions, but still in the loop")
print("Now outside the loop - this text will only appear once")
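The loop above prints only the numbers. To reproduce the "Element 1 is ..." messages from the tedious version, one option (a sketch using Python's built-in enumerate, which was not covered in the session) is to let enumerate count the positions for us, starting from 1:

```python
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21]

messages = []
for position, number in enumerate(fibonacci, start=1):
    # position counts 1, 2, 3, ... while number is the current element
    message = "Element " + str(position) + " is " + str(number)
    messages.append(message)
    print(message)
```

Unlike the hand-written version, this works for a list of any length.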
Our plan is to use loops to run our analysis (or make plots for) each of our inflammation datasets.
import glob # We will use this to fetch our list of files
import numpy
import matplotlib.pyplot
# First, we need to search for the csv data files.
# This is what glob is for
# Remember, if your csv files are in the same folder as your notebook,
# you need "inflammation-*.csv" instead of "data/inflammation-*.csv".
csv_files = glob.glob("data/inflammation-*.csv")
# The csv files are not necessarily found in order, so we
# might need to sort them into alphabetical order
csv_files = sorted(csv_files)
# Now, we want to create the figure we had before in plotting, but
# we want to do this for EVERY data file!
# So first, we need to loop over our file names.
for filename in csv_files:
    # Print out a record so we can see what's happening
    print("Currently looking at:", filename)
    # Load the current datafile
    data = numpy.loadtxt(fname=filename, delimiter=",")
    # Prepare our blank canvas
    fig = matplotlib.pyplot.figure(figsize=(10., 3.))
    # Create our 3 subplots
    ax1 = fig.add_subplot(1, 3, 1)
    ax2 = fig.add_subplot(1, 3, 2)
    ax3 = fig.add_subplot(1, 3, 3)
    # Plot the average, max, and min of THIS dataset on the figure axes
    ax1.plot(numpy.mean(data, axis=0))
    ax2.plot(numpy.amax(data, axis=0))
    ax3.plot(numpy.amin(data, axis=0))
    # Add some nice axis labels
    ax1.set_ylabel("Average")
    ax2.set_ylabel("Max")
    ax3.set_ylabel("Min")
    # Add one super-title to each figure.
    # We use slicing to access just the "01", "02", "03" number
    # of each dataset (the slice stops just BEFORE index -4):
    # inflammation-XX.csv
    #              ^ ^
    #   index -6 --+ +-- index -4
    title = "Plot for dataset number " + filename[-6:-4]
    fig.suptitle(title)
    # Ensure the figure has a layout that doesn't overlap
    # different subplots
    fig.tight_layout()