Python Fundamentals


  • Basic data types in Python include integers, strings, and floating-point numbers.
  • Use variable = value to assign a value to a variable in order to record it in memory.
  • Variables are created on demand whenever a value is assigned to them.
  • Use print(something) to display the value of something.
  • Use # some kind of explanation to add comments to programs.
  • Built-in functions are always available to use.

Reading Tabular Data into DataFrames


  • Import a library into a program using import libraryname.
  • Use the pandas library to work with tabular data in Python.
  • Use the read_csv function to load data into a dataframe variable.
  • Use index_col to specify that a column’s values should be used as row headings.
  • Use info to find out basic information about a dataframe.
  • Use slices and loc to extract entries from a dataframe.
  • The expression dataframe.shape gives the shape of the underlying array.
  • Use label_a:label_c to specify a slice that includes the rows or columns from label_a to, and including, label_c.
  • Array indices start at 0, not 1.
  • Use low:high to specify a slice that includes the indices from low to high-1.
  • Use # some kind of explanation to add comments to programs.

Visualizing Tabular Data


  • Use the pyplot module from the matplotlib library to create visualizations of data.
  • Dataframes have methods like min, max, and mean to compute statistics along either the rows or the columns.
  • Use axis argument in statistic functions to calculate the values across the specified axis.
  • We can use add_subplot to create multiple plots in a single figure.
  • We can customise the labels, axis ranges, line styles, and more of our plots using matplotlib.

Storing Multiple Values in Lists


  • [value1, value2, value3, ...] creates a list.
  • Lists can contain any Python object, including lists (i.e., list of lists).
  • Lists are indexed and sliced with square brackets (e.g., list\[0\] and list\[2:9\]), in the same way as strings and arrays.
  • Lists are mutable (i.e., their values can be changed in place).
  • Strings are immutable (i.e., the characters in them cannot be changed).

Repeating Actions with Loops


  • Use for variable in sequence to process the elements of a sequence one at a time.
  • The body of a for loop must be indented.
  • Use len(thing) to determine the length of something that contains other values.

Analyzing Data from Multiple Files


  • Use glob.glob(pattern) to create a list of files whose names match a pattern.
  • Use * in a pattern to match zero or more characters, and ? to match any single character.

Making Choices


  • Use if condition to start a conditional statement, elif condition to provide additional tests, and else to provide a default.
  • The bodies of the branches of conditional statements must be indented.
  • Use == to test for equality.
  • X and Y is only true if both X and Y are true.
  • X or Y is true if either X or Y, or both, are true.
  • Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
  • True and False represent truth values.

Creating Functions


  • Define a function using def function_name(parameter).
  • The body of a function must be indented.
  • Call a function using function_name(value).
  • Variables defined within a function can only be seen and used within the body of the function.
  • Variables created outside of any function are called global variables.
  • Within a function, we can access global variables.
  • If we want to do the same calculation on all entries in our columns, we can pass the dataframe columns as the inputs to a function.
  • Use help(thing) to view help for something.
  • Put docstrings in functions to provide help for that function.
  • Specify default values for parameters when defining a function using name=value in the parameter list.
  • Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
  • Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.

Errors and Exceptions


  • Tracebacks can look intimidating, but they give us a lot of useful information about what went wrong in our program, including where the error occurred and what type of error it was.
  • An error having to do with the ‘grammar’ or syntax of the program is called a SyntaxError. If the issue has to do with how the code is indented, then it will be called an IndentationError.
  • A NameError will occur when trying to use a variable that does not exist. Possible causes are that a variable definition is missing, a variable reference differs from its definition in spelling or capitalization, or the code contains a string that is missing quotes around it.
  • Containers like lists and strings will generate errors if you try to access items in them that do not exist. This type of error is called an IndexError.
  • Trying to read a file that does not exist will give you an FileNotFoundError. Trying to read a file that is open for writing, or writing to a file that is open for reading, will give you an IOError.

Defensive Programming


  • Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
  • Put assertions in programs to check their state as they run, and to help readers understand how those programs are supposed to work.
  • Use preconditions to check that the inputs to a function are safe to use.
  • Use postconditions to check that the output from a function is safe to use.
  • Write tests before writing code in order to help determine exactly what that code is supposed to do.

Debugging


  • Know what code is supposed to do before trying to debug it.
  • Make it fail every time.
  • Make it fail fast.
  • Change one thing at a time, and for a reason.
  • Keep track of what you’ve done.
  • Be humble.

Command-Line Programs


  • The sys library connects a Python program to the system it is running on.
  • The list sys.argv contains the command-line arguments that a program was run with.
  • Avoid silent failures.
  • The pseudo-file sys.stdin connects to a program’s standard input.