Programming with Python
Basic data types in Python include integers, strings, and
floating-point numbers.
Use variable = value to assign a value to a variable in order to record it in memory.
Variables are created on demand whenever a value is assigned to
them.
Use print(something) to display the value of something.
Use # some kind of explanation to add comments to programs.
Built-in functions are always available to use.
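As a minimal sketch of the points above (the variable names and values are made up for illustration), assignment, the three basic types, comments, and the built-in print function can be combined like this:

```python
# Assigning a value creates the variable; no declaration is needed.
weight_kg = 60                      # an integer
label = 'weight in kilograms:'      # a string
weight_lb = 2.2 * weight_kg         # a floating-point number

# print is a built-in function, so it is available without an import.
print(label, weight_kg)
print('weight in pounds:', weight_lb)
```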
Import a library into a program using import libraryname.
Use the pandas library to work with tabular data in Python.
Use the read_csv function to load data into a dataframe variable.
Use index_col to specify that a column’s values should be used as row headings.
Use info to find out basic information about a dataframe.
Use slices and loc to extract entries from a dataframe.
The expression dataframe.shape gives the shape of the underlying array as a (rows, columns) tuple.
Use label_a:label_c to specify a slice that includes the rows or columns from label_a to, and including, label_c.
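The pandas points above can be sketched as follows. The CSV text, the column names, and the patient labels are hypothetical stand-ins for a real data file; io.StringIO is used only so the example does not depend on a file on disk.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a data file.
csv_text = """patient,day_0,day_1,day_2
p1,0,1,3
p2,0,2,1
p3,0,1,2
"""

# index_col makes the values of the 'patient' column the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col='patient')

data.info()          # prints column names, types, and non-null counts
print(data.shape)    # (number of rows, number of columns)

# Label-based slices with loc include both endpoints.
subset = data.loc['p1':'p2', 'day_0':'day_1']
print(subset)
```

Here the slice 'p1':'p2' keeps two rows and 'day_0':'day_1' keeps two columns, because the end label is included.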
Array indices start at 0, not 1.
Use low:high to specify a slice that includes the indices from low to high-1.
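Positional indexing and slicing follow the same rules for every built-in sequence, so a plain list (used here as a stand-in for an array, with made-up values) is enough to illustrate them:

```python
readings = [10, 20, 30, 40, 50]

first = readings[0]       # indices start at 0, so this is the first element
middle = readings[1:4]    # indices 1, 2, and 3; index 4 is excluded

print(first)
print(middle)
print(len(middle))        # a low:high slice holds high - low elements
```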
Use the pyplot module from the matplotlib library to create visualizations of data.
Dataframes have methods like min, max, and mean to compute statistics along either the rows or the columns.
Use the axis argument in statistics methods to calculate the values along the specified axis.
We can use add_subplot to create multiple plots in a single figure.
We can customise the labels, axis ranges, line styles, and more of our plots using matplotlib.
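A rough sketch of the statistics and plotting points above, assuming pandas and matplotlib are installed. The small dataframe is invented for illustration, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: rows are patients, columns are days.
data = pd.DataFrame(
    {'day_0': [0, 0, 0], 'day_1': [1, 2, 1], 'day_2': [3, 1, 2]},
    index=['p1', 'p2', 'p3'],
)

col_mean = data.mean(axis=0)   # one value per column (average over patients)
row_max = data.max(axis=1)     # one value per row (maximum over days)

# Two plots side by side in a single figure.
fig = plt.figure(figsize=(8, 3))
ax_mean = fig.add_subplot(1, 2, 1)
ax_max = fig.add_subplot(1, 2, 2)

ax_mean.plot(col_mean.values)
ax_mean.set_ylabel('average')

ax_max.plot(row_max.values, linestyle='--')   # customised line style
ax_max.set_ylabel('maximum')
ax_max.set_ylim(0, 4)                         # customised axis range

fig.tight_layout()
```

In an interactive session, plt.show() would display the figure; fig.savefig with a file name of your choice would write it to disk.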
[value1, value2, value3, ...] creates a list.
Lists can contain any Python object, including lists (i.e., list of
lists).
Lists are indexed and sliced with square brackets (e.g., list[0] and list[2:9]), in the same way as strings and arrays.
Lists are mutable (i.e., their values can be changed in place).
Strings are immutable (i.e., the characters in them cannot be
changed).
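The difference between mutable lists and immutable strings can be seen in a short sketch (the values are arbitrary examples):

```python
odds = [1, 3, 5, 7]
nested = [[1, 2], ['a', 'b']]   # a list of lists

first = odds[0]
middle = odds[1:3]

# Lists are mutable: an element can be replaced in place.
odds[0] = -1

# Strings are immutable: assigning to a character raises TypeError.
name = 'Darwin'
try:
    name[0] = 'd'
    changed = True
except TypeError as error:
    changed = False
    print('cannot change a string in place:', error)

print(odds, middle, nested[1][0])
```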
Use for variable in sequence to process the elements of a sequence one at a time.
The body of a for loop must be indented.
Use len(thing) to determine the length of something that contains other values.
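As a small illustration of these points, the loop below counts the characters of a string one at a time and compares the result with len (the word is an arbitrary example):

```python
word = 'oxygen'

count = 0
for char in word:
    # The indented lines form the loop body and run once per character.
    count = count + 1

print('counted by loop:', count)
print('reported by len:', len(word))
```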
Use glob.glob(pattern) to create a list of files whose names match a pattern.
Use * in a pattern to match zero or more characters, and ? to match any single character.
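To show the two wildcards without relying on existing files, this sketch creates a few empty files with hypothetical names in a temporary directory and then matches them:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file names, created empty just for the demonstration.
    for filename in ['data-01.csv', 'data-02.csv', 'data-10.csv', 'notes.txt']:
        open(os.path.join(folder, filename), 'w').close()

    # * matches zero or more characters.
    all_csv = sorted(glob.glob(os.path.join(folder, '*.csv')))
    # ? matches exactly one character.
    first_nine = sorted(glob.glob(os.path.join(folder, 'data-0?.csv')))

    all_csv_names = [os.path.basename(path) for path in all_csv]
    first_nine_names = [os.path.basename(path) for path in first_nine]

print(all_csv_names)
print(first_nine_names)
```

The results are sorted because glob.glob does not promise any particular order.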
Use if condition to start a conditional statement, elif condition to provide additional tests, and else to provide a default.
The bodies of the branches of conditional statements must be
indented.
Use == to test for equality.
X and Y is only true if both X and Y are true.
X or Y is true if either X or Y, or both, are true.
Zero, the empty string, and the empty list are considered false; all
other numbers, strings, and lists are considered true.
True and False represent truth values.
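The conditional and truth-value rules above can be checked with a short sketch (the numbers are arbitrary):

```python
labels = []
for num in [-3, 0, 5]:
    if num > 0:
        labels.append('positive')
    elif num == 0:              # == tests for equality
        labels.append('zero')
    else:                       # default when no test succeeded
        labels.append('negative')

both = (1 > 0) and (-1 > 0)     # true only if both sides are true
either = (1 > 0) or (-1 > 0)    # true if at least one side is true

# Zero, the empty string, and the empty list count as false.
falsy = [bool(0), bool(''), bool([])]
truthy = [bool(7), bool('a'), bool([0])]

print(labels, both, either, falsy, truthy)
```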
Define a function using def function_name(parameter).
The body of a function must be indented.
Call a function using function_name(value).
Variables defined within a function can only be seen and used within
the body of the function.
Variables created outside of any function are called global
variables.
Within a function, we can access global variables.
If we want to do the same calculation on all entries in our columns,
we can pass the dataframe columns as the inputs to a function.
Use help(thing) to view help for something.
Put docstrings in functions to provide help for that function.
Specify default values for parameters when defining a function using name=value in the parameter list.
Parameters can be passed by matching based on name, by position, or
by omitting them (in which case the default value is used).
Put code whose parameters change frequently in a function, then call
it with different parameter values to customize its behavior.
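A minimal sketch of a function with a docstring and a default parameter value follows. The function name fahr_to_celsius and its ndigits parameter are chosen here for illustration only.

```python
def fahr_to_celsius(temp_f, ndigits=2):
    """Convert a temperature from Fahrenheit to Celsius.

    ndigits is the number of decimal places kept in the result.
    """
    # celsius is a local variable: it exists only inside the function body.
    celsius = (temp_f - 32) * 5 / 9
    return round(celsius, ndigits)


boiling = fahr_to_celsius(212)              # by position, default ndigits
body = fahr_to_celsius(98.6, ndigits=1)     # second parameter passed by name

print(boiling, body)

# help(fahr_to_celsius) would display the docstring, which is also
# stored on the function itself.
print(fahr_to_celsius.__doc__)
```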
Tracebacks can look intimidating, but they give us a lot of useful
information about what went wrong in our program, including where the
error occurred and what type of error it was.
An error having to do with the ‘grammar’ or syntax of the program is called a SyntaxError. If the issue has to do with how the code is indented, then it will be called an IndentationError.
A NameError will occur when trying to use a variable that does not exist. Possible causes are that a variable definition is missing, a variable reference differs from its definition in spelling or capitalization, or the code contains a string that is missing quotes around it.
Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an IndexError.
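Trying to read a file that does not exist will give you a FileNotFoundError. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an IOError (in Python 3 this is reported as io.UnsupportedOperation, a subclass of OSError, for which IOError is an alias).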
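To see which error type each mistake produces, the sketch below triggers several of them on purpose and records the name of the exception. The helper error_name and the small trigger functions are hypothetical, written only for this demonstration.

```python
import os
import tempfile


def error_name(action):
    """Run action() and return the name of the error it raises, if any."""
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return 'no error'


def use_undefined_name():
    return undefined_count   # never assigned anywhere


def read_past_end():
    return [1, 2, 3][3]      # valid indices are 0, 1, and 2


def open_missing_file():
    with tempfile.TemporaryDirectory() as folder:
        open(os.path.join(folder, 'missing.txt'))   # file was never created


def compile_bad_syntax():
    compile('total = (1 + 2', '<example>', 'exec')   # unclosed parenthesis


def compile_bad_indent():
    compile('def f():\nreturn 1\n', '<example>', 'exec')   # body not indented


names = [
    error_name(use_undefined_name),
    error_name(read_past_end),
    error_name(open_missing_file),
    error_name(compile_bad_syntax),
    error_name(compile_bad_indent),
]
print(names)
```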
Program defensively, i.e., assume that errors are going to arise,
and write code to detect them when they do.
Put assertions in programs to check their state as they run, and to
help readers understand how those programs are supposed to work.
Use preconditions to check that the inputs to a function are safe to
use.
Use postconditions to check that the output from a function is safe
to use.
Write tests before writing code in order to help determine exactly
what that code is supposed to do.
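One possible shape for these ideas is sketched below. The function normalize is a made-up example; its assertions act as preconditions and a postcondition, and the check at the end plays the role of a test written before the function body.

```python
def normalize(values):
    """Rescale a list of numbers so that they lie between 0 and 1."""
    # Preconditions: the inputs must be safe to use.
    assert len(values) > 0, 'need at least one value'
    low, high = min(values), max(values)
    assert high > low, 'values must not all be equal'

    scaled = [(value - low) / (high - low) for value in values]

    # Postcondition: the output must span exactly 0 to 1.
    assert min(scaled) == 0.0 and max(scaled) == 1.0
    return scaled


# A test that states what the code is supposed to do.
assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]

# Bad input fails immediately with a clear message instead of silently.
try:
    normalize([])
    rejected = False
except AssertionError as error:
    rejected = True
    print('rejected:', error)
```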
Know what code is supposed to do before trying to debug
it.
Make it fail every time.
Make it fail fast.
Change one thing at a time, and for a reason.
Keep track of what you’ve done.
Be humble.
The sys library connects a Python program to the system it is running on.
The list sys.argv contains the command-line arguments that a program was run with.
Avoid silent failures.
The pseudo-file sys.stdin connects to a program’s standard input.
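A sketch of how sys.argv and sys.stdin might be used together: the hypothetical function count_lines reads the files named on the command line, or standard input when no files are given. It takes the argument list and the input stream as parameters so that it can be tried here with stand-in values.

```python
import io
import sys


def count_lines(argv, stdin):
    """Count lines in the files named in argv[1:], or in stdin if none."""
    filenames = argv[1:]          # argv[0] is the name of the script itself
    if not filenames:
        return sum(1 for _ in stdin)

    total = 0
    for filename in filenames:
        # A missing file raises FileNotFoundError rather than failing silently.
        with open(filename) as reader:
            total += sum(1 for _ in reader)
    return total


# In a real script this would be: print(count_lines(sys.argv, sys.stdin))
# Here a list and an in-memory stream stand in for the real ones.
demo = count_lines(['count.py'], io.StringIO('a\nb\nc\n'))
print(demo)
```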