
Research Software Engineering Summer School


Coding Conventions

Let's import a few variables from context.py that will be used in the following lesson.

In [1]:
from context import (
    sEntry,
    iOffset,
    entry,
    offset,
    anothervariable,
    variable,
    flag1,
    flag2,
    do_something,
)

One code, many layouts:

Consider the following fragment of Python:

In [2]:
import species
def AddToReaction(name, reaction):
    reaction.append(species.Species(name))

This could also have been written:

In [3]:
from species import Species

def add_to_reaction(a_name,
                    a_reaction):
    l_species = Species(a_name)
    a_reaction.append( l_species )

So many choices

  • Layout
  • Naming
  • Syntax choices

Layout

In [4]:
reaction = {
    "reactants": ["H", "H", "O"],
    "products": ["H2O"]
}
In [5]:
reaction2=(
{
  "reactants":
  [
    "H",
    "H",
    "O"
  ],
  "products":
  [
    "H2O"
  ]
}
)
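As a quick check that layout alone does not change the value, the sketch below rebuilds the two literals from the cells above, once compact and once expanded, and compares them. The name `layouts_agree` is introduced here only for illustration.

```python
# The same dictionary written with two different layouts.
reaction = {
    "reactants": ["H", "H", "O"],
    "products": ["H2O"]
}

reaction2=(
{
  "reactants":
  [
    "H",
    "H",
    "O"
  ],
  "products":
  [
    "H2O"
  ]
}
)

# Layout is invisible to the interpreter: both names hold equal dictionaries.
layouts_agree = (reaction == reaction2)
print(layouts_agree)  # True
```

The interpreter sees no difference, so the choice between the two is purely about what your team finds easier to read and maintain.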

Layout choices

  • Brace style
  • Line length
  • Indentation
  • Whitespace/Tabs

Inconsistency will produce a mess in your code! Some choices only make your code harder to read, whereas others can change how it behaves. For example, if you copy/paste code indented with tabs into a file that uses spaces, it may look fine on your screen but fail when you run it.
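The tabs-versus-spaces failure can be reproduced without touching any file. The following is a minimal sketch that compiles two source strings, one indented only with spaces and one where a tab has slipped in. The helper `try_compile` and the sample function `f` are hypothetical names used only for this illustration.

```python
# Two versions of the same function. In the first, both body lines are
# indented with four spaces; in the second, the last line is indented
# with a tab ("\t") instead, as can happen after a copy/paste.
spaces_only = "def f():\n    x = 1\n    return x\n"
mixed = "def f():\n    x = 1\n\treturn x\n"


def try_compile(source):
    """Return None if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as error:  # TabError is a subclass of this
        return type(error).__name__
    return None


print(try_compile(spaces_only))  # compiles, so this prints None
print(try_compile(mixed))        # reports an indentation-related error
```

On CPython 3 the second call is expected to report a `TabError`, even though both strings may look identical in an editor that renders a tab as four columns.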

Naming Conventions

The following example uses camel case: UpperCamelCase for the class name, lowerCamelCase for the method name, and underscore_separation for variable names.

In [6]:
class ClassName:
    def methodName(variable_name):
        instance_variable = variable_name

This other example uses underscore_separation for variable and function names, and CamelCase for class names. This convention is widely used in the Python community.

In [7]:
class ClassName:
    def method_name(a_variable):
        m_instance_variable = a_variable

Hungarian Notation

Prefix denotes type:

In [8]:
fNumber = float(sEntry) + iOffset

So in the example above we can tell from the names alone that we are creating a float from a string entry and an integer offset.

People may find this useful in languages like Python, where the type is intrinsic to the value a variable holds rather than declared in the code.

In [9]:
number = float(entry) + offset

Newlines

  • Newlines make code easier to read
  • Newlines make less code fit on a screen

Use newlines to describe your code's rhythm.
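As one possible illustration of that rhythm, the sketch below uses blank lines to separate the stages of a small function: validation, computation, and packaging the result. The function `summarise` and its return format are hypothetical, chosen only for this example.

```python
def summarise(values):
    """Return the mean and the spread (max - min) of a list of numbers."""
    # Stage 1: validate the input.
    if not values:
        raise ValueError("values must not be empty")

    # Stage 2: compute the statistics.
    mean = sum(values) / len(values)
    spread = max(values) - min(values)

    # Stage 3: package the result.
    return {"mean": mean, "spread": spread}


print(summarise([1.0, 2.0, 6.0]))  # {'mean': 3.0, 'spread': 5.0}
```

Each blank line marks a change of topic, much like a paragraph break in prose. Adding a blank line after every statement would remove that signal and push code off the screen.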

Syntax Choices

The following two snippets do the same thing, but the second is split into more steps, which makes it more readable.

In [10]:
anothervariable += 1
if ((variable == anothervariable) and flag1 or flag2): do_something()
In [11]:
anothervariable = anothervariable + 1
variable_equality = (variable == anothervariable)
if ((variable_equality and flag1) or flag2):
    do_something()

We create extra variables as intermediate steps. Don't worry about performance here: the cost of an extra assignment is negligible compared with the gain in readability.

What about operator precedence? Being explicit helps to remind yourself what you are doing.
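In Python, `and` binds more tightly than `or`, so `a and b or c` is read as `(a and b) or c`. The sketch below checks this over every combination of three booleans and lists the cases where the other possible grouping would give a different answer. The function names are hypothetical and exist only for this comparison.

```python
from itertools import product


def as_written(equal, flag1, flag2):
    # No parentheses: the reader has to remember the precedence rules.
    return equal and flag1 or flag2


def grouped_and_first(equal, flag1, flag2):
    # Explicit version of what Python actually does.
    return (equal and flag1) or flag2


def grouped_or_first(equal, flag1, flag2):
    # What a reader might wrongly assume was meant.
    return equal and (flag1 or flag2)


combos = list(product([False, True], repeat=3))

# The unparenthesised form always agrees with (equal and flag1) or flag2.
same_as_and_first = all(
    as_written(*c) == grouped_and_first(*c) for c in combos
)

# It disagrees with equal and (flag1 or flag2) for some inputs.
differs = [c for c in combos if as_written(*c) != grouped_or_first(*c)]

print(same_as_and_first)  # True
print(differs)            # [(False, False, True), (False, True, True)]
```

Two of the eight input combinations give a different result under the wrong reading, which is why writing the parentheses explicitly is worth the extra characters.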

Syntax choices

  • Explicit operator precedence
  • Compound expressions
  • Package import choices

Coding Conventions

You should try to have an agreed policy for your team for these matters.

If your language sponsor has a standard policy, use that. For example, Python has PEP8.

Lint

There are automated tools which enforce coding conventions and check for common mistakes.

These are called formatters and linters. Some widely used linters and formatters in the Python ecosystem are:

  • pycodestyle: check your code against PEP8
  • pylint: useful information about the quality of your code
  • black: code formatter written in Python
  • ruff: blazing fast code formatter and linter written in Rust with ideas borrowed from Pythonic linters and formatters

Most of these tools can be run directly on Python files or repositories from the command line. For instance:

In [12]:
%%bash --no-raise-error
pycodestyle species.py
species.py:2:6: E111 indentation is not a multiple of 4
species.py:2:6: E117 over-indented
In [13]:
%%bash --no-raise-error
pylint species.py
************* Module species
species.py:2:0: W0311: Bad indentation. Found 5 spaces, expected 4 (bad-indentation)
species.py:1:0: C0114: Missing module docstring (missing-module-docstring)
species.py:1:0: C0115: Missing class docstring (missing-class-docstring)
species.py:1:0: R0903: Too few public methods (0/2) (too-few-public-methods)

-----------------------------------
Your code has been rated at 0.00/10

In [14]:
%%bash --no-raise-error
ruff check species.py
All checks passed!

These linters can be configured to choose which points to flag and which to ignore.
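Besides project-wide configuration, some linters also read inline comments that silence a single rule on a single line. The sketch below shows the comment forms understood by ruff (`# noqa: <code>`) and pylint (`# pylint: disable=<name>`). The module contents, including the URL and the simplified `AddToReaction`, are hypothetical and exist only to give the comments something to attach to.

```python
# Hypothetical module used only to illustrate inline suppression comments.

# ruff reads "noqa" comments: this one silences only rule E501
# (line too long) on this single line.
DATA_URL = "https://example.org/a/deliberately/long/path/that/would/otherwise/exceed/the/configured/line/length"  # noqa: E501


# pylint reads "disable" comments: this one silences only the
# invalid-name check for this function definition.
def AddToReaction(name, reaction):  # pylint: disable=invalid-name
    """Append a species name to a reaction (simplified stand-in)."""
    reaction.append(name)


reaction = []
AddToReaction("H", reaction)
print(reaction)  # ['H']
```

Inline suppressions are best kept rare and specific. If the same rule is silenced in many places, changing the project-wide configuration is usually the clearer choice.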

Do not blindly believe all these automated tools! Style guides are guides, not rules.

It is a good idea to run a linter before every commit, or include it in your CI tests.

pre-commit allows developers to add tools like linters and formatters as git hooks, so that they run before every commit. The hooks can be installed locally using:

pip install pre-commit
pre-commit install  # provided a .pre-commit-config.yaml is present in your repository

This runs the checks every time a commit is created locally. The checks only run on the files modified by that commit.

Finally, there are tools like editorconfig that help share the conventions used within a project whose contributors use different IDEs and tools. There are also bots like pep8speaks and pre-commit.ci that comment on, or run checks against, contributors' pull requests, suggesting what to change to follow the project's conventions.