This lesson is in the early stages of development (Alpha version)

Environment Variables

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • How are variables set and accessed in the Unix shell?

  • How can I use variables to change how a program runs?

Objectives
  • Understand how variables are implemented in the shell

  • Read the value of an existing variable

  • Create new variables and change their values

  • Change the behaviour of a program using an environment variable

  • Explain how the shell uses the PATH variable to search for executables

Episode provenance

This episode has been remixed from the Shell Extras episode on Shell Variables and the HPC Shell episode on scripts.

The shell is just a program, and like other programs, it has variables. Those variables control its execution, so by changing their values you can change how the shell behaves (and with a little more effort how other programs behave).

Variables are a great way of saving information under a name you can access later. In programming languages like Python and R, variables can store pretty much anything you can think of. In the shell, they usually just store text. The best way to understand how they work is to see them in action.

Let’s start by running the command set and looking at some of the variables in a typical shell session:

$ set
COMPUTERNAME=TURING
HOME=/home/vlad
HOSTNAME=TURING
HOSTTYPE=i686
NUMBER_OF_PROCESSORS=4
PATH=/Users/vlad/bin:/usr/local/git/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
PWD=/home/vlad
UID=1000
USERNAME=vlad
...

As you can see, there are quite a few — in fact, four or five times more than what’s shown here. And yes, using set to show things might seem a little strange, even for Unix, but if you don’t give it any arguments, it might as well show you things you could set.

Every variable has a name. All shell variables’ values are strings, even those (like UID) that look like numbers. It’s up to programs to convert these strings to other types when necessary. For example, if a program wanted to find out how many processors the computer had, it would convert the value of the NUMBER_OF_PROCESSORS variable from a string to an integer.
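As a minimal sketch of that conversion, the shell itself can treat a stored string as an integer inside arithmetic expansion. Here NUMBER_OF_PROCESSORS is set by hand to the value from the sample listing (your shell may not define it at all), and the name half is made up for this illustration:

```shell
# Set by hand for illustration; the value is stored as the text "4"
NUMBER_OF_PROCESSORS="4"

# $(( ... )) interprets the stored text as an integer before dividing
half=$((NUMBER_OF_PROCESSORS / 2))

echo "Half of the processors: $half"
```

The result, 2, is stored back in half as text, like every other shell variable.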

Showing the Value of a Variable

Let’s show the value of the variable HOME:

$ echo HOME
HOME

That just prints “HOME”, which isn’t what we wanted (though it is what we actually asked for). Let’s try this instead:

$ echo $HOME
/home/vlad

The dollar sign tells the shell that we want the value of the variable rather than its name. This works just like wildcards: the shell does the replacement before running the program we’ve asked for. Thanks to this expansion, what we actually run is echo /home/vlad, which displays the right thing.

Creating and Changing Variables

Creating a variable is easy — we just assign a value to a name using “=” (we just have to remember that the syntax requires that there are no spaces around the =!):

$ SECRET_IDENTITY=Dracula
$ echo $SECRET_IDENTITY
Dracula

To change the value, just assign a new one:

$ SECRET_IDENTITY=Camilla
$ echo $SECRET_IDENTITY
Camilla
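Two details of assignment and expansion are easy to trip over, so here is a small sketch of both. The names FULL_TITLE and file_stem are hypothetical and used only for this illustration:

```shell
# A value containing spaces must be quoted, otherwise the shell would
# treat the second word as a command to run
FULL_TITLE="Count Dracula"
echo "$FULL_TITLE"

# Braces mark where the variable name ends. Without them, the shell
# would look up a variable called file_stem_final instead of file_stem.
file_stem=results
echo "${file_stem}_final.txt"
```

The brace form, ${NAME}, means the same as $NAME and is always safe to use; it is only required when the name is followed directly by characters that could be part of a variable name.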

Environment variables

When we ran the set command we saw there were a lot of variables whose names were in upper case. That’s because, by convention, variables that are also available to other programs are given upper-case names. Such variables are called environment variables: they are shell variables that are defined for the current shell and are inherited by any child shells or processes.

To create an environment variable you need to export a shell variable. For example, to make our SECRET_IDENTITY available to other programs that we call from our shell we can do:

$ SECRET_IDENTITY=Camilla
$ export SECRET_IDENTITY

You can also create and export the variable in a single step:

$ export SECRET_IDENTITY=Camilla
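One way to see the effect of export is to ask a child process what it can see. The sketch below assumes bash is installed and uses a hypothetical variable, DEMO_COLOUR; the child prints "not set" when the variable is missing from its environment:

```shell
unset DEMO_COLOUR                 # start from a clean state
DEMO_COLOUR=green                 # shell variable only, not yet exported

# The single quotes make the child shell, not ours, expand the variable
before=$(bash -c 'echo "${DEMO_COLOUR:-not set}"')

export DEMO_COLOUR                # now part of the environment
after=$(bash -c 'echo "${DEMO_COLOUR:-not set}"')

echo "before export: $before"
echo "after export: $after"
```

The first line reports "not set" and the second reports "green": the child only inherits the variable once it has been exported.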

Using environment variables to change program behaviour

Set a shell variable TIME_STYLE to have a value of iso and check this value using the echo command.

Now, run the command ls with the option -l (which gives a long format).

export the variable and rerun the ls -l command. Do you notice any difference?

Solution

The TIME_STYLE variable is not seen by ls until it is exported, at which point it is used by ls to decide what date format to use when presenting the timestamps of files.
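One possible sequence of commands for this exercise is sketched below. It assumes the GNU version of ls, which is what reads TIME_STYLE; other implementations of ls may ignore the variable, in which case both listings will look the same.

```shell
TIME_STYLE=iso          # shell variable only
echo "$TIME_STYLE"      # confirm the value

ls -l                   # ls cannot see TIME_STYLE yet: default date format

export TIME_STYLE       # make the variable part of the environment
ls -l                   # GNU ls now uses the iso date format
```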

You can see the complete set of environment variables in your current shell session with the command env (which returns a subset of what the command set gave us). The complete set of environment variables is called your runtime environment and can affect the behaviour of the programs you run.
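To check whether one particular variable is part of the environment, we can search the output of env for its name. This is a rough sketch using two hypothetical variables, LOCAL_NOTE and SHARED_NOTE:

```shell
unset LOCAL_NOTE SHARED_NOTE      # start from a clean state
LOCAL_NOTE=draft                  # shell variable only
export SHARED_NOTE=final          # environment variable

for name in LOCAL_NOTE SHARED_NOTE; do
    # env prints one NAME=value line per environment variable
    if env | grep -q "^${name}="; then
        echo "$name is in the environment"
    else
        echo "$name is a shell variable only"
    fi
done
```

Only SHARED_NOTE is reported as being in the environment, although set would list both.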

Job environment variables

When SGE runs a job, it sets a number of environment variables for the job. One of these will let us check what directory our job script was submitted from. The SGE_O_WORKDIR variable is set to the directory from which our job was submitted.

Using the SGE_O_WORKDIR variable, modify your job so that it prints out the location from which the job was submitted.

Solution

[yourUsername@login12 ~]$  nano example-job.sh
[yourUsername@login12 ~]$  cat example-job.sh
#!/bin/bash -l
#$ -l h_rt=00:00:30

echo -n "This script is running on "
hostname

echo "This job was launched in the following directory:"
echo "${SGE_O_WORKDIR}"
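SGE_O_WORKDIR only exists inside a job, so running the script directly on a login node would print an empty line. As an optional sketch, not part of the solution above, the lines below fall back to the current directory when the variable is missing. The name workdir is made up for this illustration:

```shell
# Use SGE_O_WORKDIR if the scheduler set it, otherwise the current directory
workdir="${SGE_O_WORKDIR:-$PWD}"

echo "This job was launched in the following directory:"
echo "$workdir"
```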

To remove a variable or environment variable you can use the unset command, for example:

$ unset SECRET_IDENTITY
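A short sketch of the effect, assuming bash is installed: after unset, the variable is gone both from the current shell and from the environment passed to child processes.

```shell
export SECRET_IDENTITY=Camilla    # create and export
unset SECRET_IDENTITY             # remove it again

# Both the current shell and a child process now see nothing
echo "Shell sees: '${SECRET_IDENTITY:-}'"
bash -c 'echo "Child sees: ${SECRET_IDENTITY:-nothing}"'
```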

The PATH Environment Variable

Some environment variables (like PATH) store lists of values. In this case, the convention is to use a colon ‘:’ as a separator. If a program wants the individual elements of such a list, it’s the program’s responsibility to split the variable’s string value into pieces.

Let’s have a closer look at that PATH variable. Its value defines the shell’s search path for executables, i.e., the list of directories that the shell looks in for runnable programs when you type in a program name without specifying what directory it is in.

For example, when we type a command like analyze, the shell needs to decide whether to run ./analyze or /bin/analyze. The rule it uses is simple: the shell checks each directory in the PATH variable in turn, looking for a program with the requested name in that directory. As soon as it finds a match, it stops searching and runs the program.

To show how this works, here are the components of PATH listed one per line:

/Users/vlad/bin
/usr/local/git/bin
/usr/bin
/bin
/usr/sbin
/sbin
/usr/local/bin
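A listing like this can be produced by replacing each colon with a newline. The sketch below uses the sample value from earlier in this episode, stored in a hypothetical variable sample_path, so that the result is the same on any machine; replace it with $PATH to inspect your own search path.

```shell
# Sample value copied from the set output shown earlier
sample_path=/Users/vlad/bin:/usr/local/git/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin

# tr swaps every ':' for a newline, giving one directory per line
listing=$(echo "$sample_path" | tr ':' '\n')
echo "$listing"
```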

On our computer, there are actually three programs called analyze in three different directories: /bin/analyze, /usr/local/bin/analyze, and /Users/vlad/analyze. Since the shell searches the directories in the order they’re listed in PATH, it finds /bin/analyze first and runs that. Notice that it will never find the program /Users/vlad/analyze unless we type in the full path to the program, since the directory /Users/vlad isn’t in PATH.
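The search order can be tried out safely with throwaway files. This sketch creates two tiny scripts, both called analyze, in two temporary directories (first and second are made-up names), then runs analyze with the directories in each order. The change to PATH is made inside $( ... ), which runs in a subshell, so our own PATH is left untouched.

```shell
demo_root=$(mktemp -d)                      # temporary working area
mkdir "$demo_root/first" "$demo_root/second"

# Create two different programs that share the name "analyze"
for dir in first second; do
    printf '#!/bin/sh\necho "analyze from %s"\n' "$dir" > "$demo_root/$dir/analyze"
    chmod +x "$demo_root/$dir/analyze"
done

# The directory listed earlier in PATH wins
first_wins=$(PATH="$demo_root/first:$demo_root/second:$PATH"; analyze)
second_wins=$(PATH="$demo_root/second:$demo_root/first:$PATH"; analyze)

echo "$first_wins"
echo "$second_wins"

rm -r "$demo_root"                          # clean up
```

The first run reports "analyze from first" and the second "analyze from second". To find out which program the shell would pick for a name without running it, use command -v followed by the name.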

This means that I can have executables in lots of different places as long as I remember that I need to update my PATH so that my shell can find them.
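A common way to do this is to add a directory to the existing value rather than replacing it. In this sketch, $HOME/my_tools is a hypothetical directory used only for illustration:

```shell
# Keep the current search path and add one more directory at the end
export PATH="$PATH:$HOME/my_tools"

echo "$PATH"
```

Adding the directory at the end means programs already on the search path keep priority; putting it at the front (PATH="$HOME/my_tools:$PATH") would make programs in my_tools win instead. Either way, the change only lasts for the current shell session.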

What if I want to run two different versions of the same program? Since they share the same name, if I add them both to my PATH the first one found will always win. In the next episode we’ll learn how to use helper tools to manage our runtime environment, which makes that possible without a lot of bookkeeping on what the value of PATH (and other important environment variables) is or should be.

Key Points

  • Shell variables are by default treated as strings

  • Variables are assigned using “=” and recalled using the variable’s name prefixed by “$”

  • Use “export” to make a variable available to other programs

  • The PATH variable defines the shell’s search path