3. Variables

Shell Variables Defined
Declaring Variables
Properties of Variables
Environment Variables
Special Parameters
Reading Data into Variables

3.1 Shell Variables Defined

Just like any other programming language, the shell supports variables. A variable is a logical name for a stored value. The value may be changed during program execution, hence the term variable. In the Bourne shell, a variable's name may be any string of alphanumeric characters and underscores (_), provided it does not begin with a digit. Whitespace is not allowed. Examples of legal variable names include:

COUNT
HOME_DIRECTORY
_file1

Illegal variable names include:

FILE*
The Answer
array(i)

Some variables are predefined by the shell. They are known as environment variables and special parameters. All other variables are user defined. It is the scripter's responsibility to declare such variables and assign them values.

3.2 Declaring Variables

A script assigns a value to a variable by declaring the variable's name immediately followed by an equal sign (=) and the value.

COUNT=1
HOME_DIRECTORY=/export/home/bugsy
_file1=table.dat

No white space is permitted between the variable, the equal sign, and the value. If spaces exist in a variable assignment, the shell tries to execute either the variable name or the value, as can be seen:

$ COUNT =1
COUNT: not found
$ COUNT= 1
1: execute permission denied
$ COUNT=1
$

To retrieve the value stored in a variable, the variable must be preceded by a dollar sign ($). Getting the value is called dereferencing.

$ COUNT=1
$ echo $COUNT
1

Just as the name of a variable cannot contain white space, neither may an unquoted value.

$ NAME=John Doe
Doe: not found

But with the help of the shell's quoting mechanisms, it is possible to create strings that do contain white space and then assign them to a variable's value.

$ NAME="John Doe"
$ echo $NAME
John Doe
$ NAME='Jane Doe'
$ echo $NAME
Jane Doe
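
Quotes matter when a variable is dereferenced as well as when it is assigned. The following is a minimal sketch, assuming a POSIX-compatible sh; the names SPACED, UNQUOTED, and QUOTED are invented for the illustration.

```shell
#!/bin/sh
# SPACED holds three spaces between the two words.
SPACED="John   Doe"

# Unquoted: the shell splits the value into words, and echo rejoins
# them with single spaces.
UNQUOTED=`echo $SPACED`

# Quoted: the value reaches echo as one argument, spaces intact.
QUOTED=`echo "$SPACED"`

echo "unquoted: [$UNQUOTED]"
echo "quoted:   [$QUOTED]"
```

The brackets in the output are there only to make the spacing visible.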

Moreover, the same quoting techniques can be used to place quotes in a variable's value.

$ ERROR_MSG="You can't do that!"
$ echo $ERROR_MSG
You can't do that!
$ A_QUOTE='"To be, or not to be"'
$ echo $A_QUOTE
"To be, or not to be"

Using back quotes, it is even possible to store the output of a command in a variable.

$ XTERMINALS=`ypcat hosts | grep ncd[0-9][0-9] | \
> cut -f2 | cut -f2 -d" " | uniq`
$ echo "The NCD xterminals are $XTERMINALS"
The NCD xterminals are ncd05
ncd02
ncd03
ncd01
ncd07
ncd04
ncd06
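
The ypcat pipeline above depends on a NIS domain, so here is a smaller sketch of the same back quote technique that should run on any system with a POSIX-compatible sh. The variable names UPPER, HOSTLIST, and ONE_LINE, and the sample host names, are assumptions made for the illustration.

```shell
#!/bin/sh
# Capture the output of a pipeline in a variable using back quotes.
# The shell removes the trailing newline from the captured output.
UPPER=`echo "hello world" | tr 'a-z' 'A-Z'`
echo "Captured: ${UPPER}"

# Captured output may span several lines.
HOSTLIST=`printf 'ncd01\nncd02\nncd03\n'`

# Dereferencing without quotes lets the shell split the value into
# words, so echo prints them on one line separated by single spaces.
ONE_LINE=`echo $HOSTLIST`
echo "${ONE_LINE}"
```

Dereferencing HOSTLIST inside double quotes instead would keep the newlines, which is why the xterminal names above print one per line.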

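Filename patterns can also be stored in variables. The shell does not expand the pattern when the assignment is made; it stores the pattern as written and expands it when the variable is dereferenced without quotes. The usual rules for metacharacters apply.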

$ XPROGS=/usr/bin/x*
$ echo $XPROGS
/usr/bin/xargs /usr/bin/xgettext /usr/bin/xstr
$ IPROGS=/usr/bin/i[0-9]*
$ echo $IPROGS
/usr/bin/i286 /usr/bin/i386 /usr/bin/i486 /usr/bin/i860 /usr/bin/i86pc
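
When the expansion takes place can be checked with a small experiment. The sketch below assumes a POSIX-compatible sh and a writable /tmp; the scratch directory and the names SCRATCH, PATTERN, STORED, and EXPANDED are invented for the illustration.

```shell
#!/bin/sh
# Work in a scratch directory so the example controls which files exist.
# $$ is the process id of this shell, used here to make the name unique.
SCRATCH=/tmp/glob-demo.$$
mkdir "${SCRATCH}" || exit 1
touch "${SCRATCH}/xargs" "${SCRATCH}/xstr"

PATTERN=${SCRATCH}/x*

# Quoted: the stored value is still the unexpanded pattern.
STORED=`echo "${PATTERN}"`
# Unquoted: the shell expands the pattern when the variable is used.
EXPANDED=`echo ${PATTERN}`

echo "stored:   ${STORED}"
echo "expanded: ${EXPANDED}"

rm -r "${SCRATCH}"
```

In shells that behave this way, files created after the assignment still appear in later unquoted dereferences, because the variable holds the pattern rather than a fixed list.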

Braces can be used during variable dereferencing. It is highly recommended that programmers consistently delimit variables with braces. The technique has a few advantages. First, it makes variables stand out within a script. Readers can easily see the variable being dereferenced within a block of code. Second, it has the advantage of permitting the generation of expanded strings from variable values. If a script declared a variable named FILE, it could then make a backup copy of the file named by the value stored in FILE:

$ FILE=oracle.log
$ cp ${FILE} ${FILE}.bak
$ ls
oracle.log oracle.log.bak

This technique would work without the braces, but notice the difference between using the braces and not using them.

$ cp $FILE $FILE.bak

In this case, is it obvious to the reader that the second dereference is the value of FILE with .bak appended to it, or might the reader think that the variable being dereferenced is FILE.bak? By using braces to delimit the variable, the problem is solved. Certainly it is clear that the variable is FILE not FILE.bak. Moreover, there are times when appending text to the variable name may produce unexpected results.

$ cp $FILE $FILE2
cp: Insufficient arguments (1)
Usage: cp [-i] [-p] f1 f2
 cp [-i] [-p] f1 ... fn d1
 cp [-i] [-p] [-r] d1 d2

Here, the user attempts to create a copy of oracle.log into oracle.log2, but since digits can be part of a variable name, the shell tries to dereference the undefined variable FILE2. Third, braces allow parameter substitution.
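
The FILE2 pitfall can be reproduced without running cp at all. This is a minimal sketch, assuming FILE2 is not already set in the environment; WITHOUT and WITH are names invented for the illustration.

```shell
#!/bin/sh
FILE=oracle.log

# Without braces the shell looks up a variable named FILE2, which is
# assumed to be unset here, so the result is an empty string.
WITHOUT="$FILE2"

# With braces the name ends at the closing brace and 2 is appended.
WITH="${FILE}2"

echo "without braces: [${WITHOUT}]"
echo "with braces:    [${WITH}]"
```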

Parameter substitution is a method of providing a default value for a variable in the event that it is currently null. The construct uses a combination of braces delimiting a variable and its default. The variable and default value are separated by a keyword. The keyword serves as a condition for when to assign the value to the variable. The list of keywords is shown in the following table.

*Table 3.2-1. Summary of Parameter Substitution*
Construct	Meaning
`$parameter` `${parameter}`	Substitute the value of `parameter`.
`${parameter:-value}`	Substitute the value of `parameter` if it is not null; otherwise, use `value`.
`${parameter:=value}`	Substitute the value of `parameter` if it is not null; otherwise, use `value` and also assign `value` to `parameter`.
`${parameter:?value}`	Substitute the value of `parameter` if it is not null; otherwise, write `value` to standard error and exit. If `value` is omitted, then write "`parameter: parameter null or not set`" instead.
`${parameter:+value}`	Substitute `value` if `parameter` is not null; otherwise, substitute nothing.

Supposing a script tries to dereference a null variable, a good programmer can avoid catastrophic errors by using parameter substitution:

$ echo $NULL
 
$ echo "Is it null? : ${NULL} :" 
Is it null? : :
$ echo "Is it null? : ${NULL:-Nope} :"
Is it null? : Nope :
$ echo "Actually it still is null : ${NULL:?} :"
NULL: parameter null or not set
$ echo "We'll take care of that : ${NULL:=Nope} :"
We'll take care of that : Nope :
$ echo "Is it null? : ${NULL} :"
Is it null? : Nope :
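
Inside a script, the same constructs are commonly used to supply defaults for settings the user may or may not have provided. The sketch below assumes a POSIX-compatible sh; LOGDIR, VERBOSE, SHOWN, and FLAG are hypothetical names chosen for the illustration.

```shell
#!/bin/sh
# Start from a known state.
unset LOGDIR VERBOSE

# :- substitutes a default but leaves LOGDIR itself untouched.
SHOWN=${LOGDIR:-/tmp}
echo "using ${SHOWN}; LOGDIR is still [${LOGDIR}]"

# := substitutes the default and also assigns it. The leading colon is
# the null command; it simply gives the expansion somewhere to happen.
: ${LOGDIR:=/var/tmp}
echo "LOGDIR is now [${LOGDIR}]"

# :+ yields its value only when the variable is not null.
FLAG=${VERBOSE:+-v}
echo "flag while VERBOSE is null: [${FLAG}]"
VERBOSE=yes
FLAG=${VERBOSE:+-v}
echo "flag while VERBOSE is set:  [${FLAG}]"
```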

3.3 Properties of Variables

First and foremost, the shell has no concept of data types. Some programming languages such as C++ or Pascal are strongly typed. In these languages, variables are grouped into type classes such as integer, character, and real. Typed variables expect a specific format for their value, and generally, they only operate within their class. For example, a real valued variable does not add well with a character. For the Bourne shell, there is no such restriction because every variable is simply a string of characters. This is why the results of commands and filename expansions are as legal as typing in values by hand. For the shell, there is no concept of integers, reals, or characters. The shell does not understand that COUNT=1 means that COUNT has an integral value. It only knows that the variable holds a single character string with the ASCII value for 1.

Variables are read-write by default. Their values may be dereferenced and reset at will. If the need arises, a variable can be declared to be read-only by using the readonly command. The command takes a previously declared variable and applies write protection to it so that the value may not be modified.

$ NOWRITE="Try to change me"
$ readonly NOWRITE
$ echo ${NOWRITE}
Try to change me
$ NOWRITE="Ugh, you changed me"
NOWRITE: is read only

It is important to remember that once a variable is declared read-only, from that point on, it is immutable. Therefore, it is equally important to set the variable's value before applying write protection.
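
In a script, as opposed to an interactive session, some shells abort when an assignment to a read-only variable is attempted. The sketch below, which assumes a POSIX-compatible sh, makes the attempt inside a subshell so the outcome can be observed safely. MAX_RETRIES and RESULT are names invented for the illustration.

```shell
#!/bin/sh
# Set the value first, then protect it.
MAX_RETRIES=3
readonly MAX_RETRIES

# Attempt the forbidden assignment inside a subshell, ( ... ), so that
# a shell which aborts on this error does not take the script with it.
# The shell's complaint on standard error is discarded.
if ( MAX_RETRIES=10 ) 2>/dev/null
then
    RESULT=changed
else
    RESULT=refused
fi

echo "assignment was ${RESULT}; MAX_RETRIES is still ${MAX_RETRIES}"
```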

The scope of variables within a script is important to consider as well. Variables defined by a shell script are not necessarily visible to subshells the program spawns. Given a variable x defined in a shell session and given a script local executed during the same session, the value of x is undefined for script local:

$ x=World
$ echo ${x}
World
$ cat local
#!/bin/sh
echo "Hello ${x}"
$ local
Hello

On the other hand, it is possible to instruct a shell session to publish its variables to all of its subshells. Called variable exporting, the technique uses the export command. Export takes a list of variables as its arguments. Each variable listed gains global scope for the current session.

$ cat local2
#!/bin/sh
echo "${y} ${z}"
$ y=Hello
$ z=World
$ local2
 
$ export y z
$ local2
Hello World

In the example above, variables y and z are invisible to local2 as shown by the first invocation, but after applying export to both variables, local2 gains access to them as if it had declared them itself. Now even though a shell can publish its variables to subshells, the converse does not hold. Subshells cannot promote local variables to global scope.

$ cat local3
#!/bin/sh
a=foo
export a
$ b=bar
$ echo "${a} ${b}"
 bar
$ local3
$ echo "${a} ${b}"
 bar

Although local3 attempts to export a to its parent shell, the active session, a does not gain visibility. Only the subshells of local3 would be able to use a. Furthermore, subshells retain a copy of exported variables. They may reassign the value of a global variable, but the reassignment is not permanent. Its effects last only within the subshell. The global variable regains its original value as soon as control is returned to the parent shell.

$ cat local4
#!/bin/sh
echo "var1 = ${var1}"
var1=100
echo "var1 = ${var1}"
$ var1=50
$ export var1
$ local4
var1 = 50
var1 = 100
$ echo ${var1}
50

The listing of local4 shows that it prints var1's value and then assigns a value of 100 to it. Var1 is first set to 50 and then exported by the active shell. Local4 is then executed. The output shows that the variable is correctly exported and has its value changed to 100. At this point, the script terminates, and control returns to the interactive session. A quick echo of var1 shows that the global instance of the variable has not changed.
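
The export rules can also be checked in a single file by letting sh -c play the part of the separate script. This is a sketch under that assumption; local_only and CHILD_SAW are names invented for the illustration.

```shell
#!/bin/sh
var1=50
local_only=hidden
export var1

# The single quotes stop this shell from expanding the variables, so
# the child shell started by sh -c does the dereferencing itself.
# The child also reassigns var1, which affects only its own copy.
CHILD_SAW=`sh -c 'echo "${var1}:${local_only}"; var1=100'`

echo "child saw [${CHILD_SAW}]"
echo "parent still has var1=${var1}"
```

The child prints the exported value followed by an empty field for the variable that was never exported, and the parent's var1 is unchanged afterwards.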

3.4 Environment Variables

The Bourne shell predefines a set of global variables known as environment variables. They define default values for the current account's home directory, the strings to use as the primary and secondary prompts, a list of directories to search through when attempting to find an executable, the name of the account, and the current shell, to name a few. The values for these variables are generally set at login when the shell reads the .profile script. This script resides within the account's home directory. Environment variables are by no means set in stone. They may be changed from session to session, but generally, they remain constant and keep a user's account configured for the system. A short listing of some common environment variables is shown in Table 3.4-1.

*Table 3.4-1. Common Environment Variables*
Variable	Meaning
`HOME`	The absolute path to the user's home directory.
`IFS`	The internal field separator characters. Usually contains space, tab, and newline.
`PATH`	A colon separated list of directories to search through when trying to execute a command.
`PS1` `PS2`	The strings to use as the primary, `PS1` (`$`), and secondary prompts, `PS2` (`>`).
`PWD`	The current working directory.
`SHELL`	The absolute path to the executable that is being run as the current shell session.
`USER`	The name of the current account.

Changing an environment variable is as easy as assigning it a new value. As an example, the current directory could be appended to PATH's value. Doing so would allow a user to execute programs in the current directory without specifying a fully qualified path to the executable.

$ echo ${PATH}
/bin:/usr/bin:/usr/ucb/bin
$ PATH=${PATH}:${PWD}
$ echo ${PATH}
/bin:/usr/bin:/usr/ucb/bin:/home/rsayle
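
Repeating an assignment like the one above, for instance each time a startup script is read, appends the same directory again and again. One possible guard is sketched below. The function add_to_path and the directory /opt/tools/bin are hypothetical, and the case statement it relies on is assumed to behave as in any POSIX-compatible sh.

```shell
#!/bin/sh
# add_to_path DIR: append DIR to PATH unless it is already listed.
add_to_path() {
    # Wrapping both sides in colons lets one pattern match the
    # directory at the start, middle, or end of the list.
    case ":${PATH}:" in
        *":${1}:"*) ;;                   # already present, do nothing
        *) PATH="${PATH}:${1}" ;;        # append once
    esac
}

NEW_DIR=/opt/tools/bin

add_to_path "${NEW_DIR}"
ONCE=${PATH}

# A second call finds the directory and leaves PATH unchanged.
add_to_path "${NEW_DIR}"
TWICE=${PATH}

echo "${PATH}"
```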

3.5 Special Parameters

In addition to predefining environment variables, the shell accepts a set of built-in variables known as special parameters. There are two types. The first type provides a method for accessing a set of arguments. When a command is issued at the prompt or a function is called, it often takes a set of parameters that define what it operates on or how to perform its operations. The script or function retrieves this argument list by using the positional parameters.

$ cat show-pos-params
#!/bin/sh
echo ${1} ${2} ${3} ${4} ${5} ${6} ${7} ${8} ${9}
$ show-pos-params testing 1 2 3 ...
testing 1 2 3 ...
$ show-pos-params One, two, buckle your shoe, three four shut the door
One, two, buckle your shoe, three four shut the

There are nine positional parameters available, and each is accessible by dereferencing the argument number. The script show-pos-params listed above demonstrates how to use the positional parameters. The example also shows that only nine arguments can be handled by a script, or a function, at any time. In the second invocation of show-pos-params, the tenth argument, door, does not get printed. It has not vanished and is still available to the script; it is just not readily available via the positional parameters. This could be remedied by shifting the positional parameters so that door moved from the tenth to the ninth position. Shifting is discussed in Section 6, Looping.
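
As a brief preview of shifting, the sketch below recovers the tenth argument from the example above. It uses set -- to stand in for command line arguments, which assumes a POSIX-compatible sh; NINTH is a name invented for the illustration.

```shell
#!/bin/sh
# set -- replaces the positional parameters, standing in here for
# arguments that would normally come from the command line.
set -- One, two, buckle your shoe, three four shut the door

echo "argument count: $#"
echo "ninth before shift: ${9}"

# shift discards ${1} and moves every remaining argument down one place.
shift
NINTH=${9}
echo "ninth after shift:  ${NINTH}"
echo "argument count: $#"
```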

An astute reader might now ask about ${0}. Zero is a special positional parameter. It always holds the name of the executing program. Usually, it also includes a path to the program.

$ cat show-pos-param-0
#!/bin/sh
echo "This script's name is ${0}"
$ show-pos-param-0
This script's name is ./show-pos-param-0

The second set of special parameters looks a little strange. They seem more like punctuation than variables because each special parameter is a symbol. They are, however, extremely useful for manipulating arguments, processes, and even the currently executing shell.

*Table 3.5-1. The Special Shell Parameters*
Parameter	Usage
`$#`	The number of arguments passed to the shell or the number of parameters set by executing the `set` command.
`$*`	Returns the values of all the positional parameters as a single value.
`$@`	Same as `$*` except when double quoted it has the effect of double quoting each parameter.
`$$`	Returns the process id of the current shell session or executing script.
`$!`	Returns the process id of the last program sent to the background.
`$?`	Returns the exit status of the last command not executed in the background.
`$-`	Lists the option flags currently in effect, as given at invocation or by the `set` command.

A quick example shows how these parameters behave. The succeeding chapters will demonstrate their use extensively.

$ cat show-special-params
#!/bin/sh
echo "There are $# arguments: $*"
echo Or if we needed them quoted, they would be: "$@"
echo "If I wanted to backup this file, I might use ${0}.$$"
echo "The last echo had exit status = $?"
$ show-special-params with some arguments
There are 3 arguments: with some arguments
Or if we needed them quoted, they would be: with some arguments
If I wanted to backup this file, I might use ./show-special-params.2163
The last echo had exit status = 0

Notice that in the second echo command, with some arguments was not printed as "with" "some" "arguments". The shell removes the quotes before echo receives its arguments; however, with some arguments is actually passed to echo as three separately quoted values. This has the advantage of protecting the original parameter values. Without the double quotes, an argument that contains white space would be split into several values, and with "$*" all of the arguments would be merged into one value.
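
Because echo hides the difference, counting the arguments makes it visible. The sketch below assumes a POSIX-compatible sh; count_args is a helper invented for the illustration, as are the variable names that hold its results.

```shell
#!/bin/sh
# count_args reports how many arguments it received.
count_args() {
    echo $#
}

# Two arguments, the first of which contains a space.
set -- "with some" arguments

QUOTED_AT=`count_args "$@"`     # each argument kept separate and intact
QUOTED_STAR=`count_args "$*"`   # all arguments joined into one value
UNQUOTED=`count_args $*`        # every word becomes its own argument

echo "\"\$@\" passes ${QUOTED_AT} arguments"
echo "\"\$*\" passes ${QUOTED_STAR} argument"
echo "\$* passes ${UNQUOTED} arguments"
```

Only the double quoted "$@" form hands on the same two arguments the script was given.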

3.6 Reading Data into Variables

Data can be stored into user defined variables by assignment, as has already been shown, interactively, or from a file. To have a user provide a value or to get the value from a file, the read command must be used. Read takes a list of variables as its arguments. When being used interactively, it suspends program execution and awaits user input. After the user enters some text, read scans the line of input and sequentially assigns the words found to each of the variables in the list.

$ cat readit
#!/bin/sh
TIC_TAC="tic tac"
echo "${TIC_TAC} ?"
read ANSWER
echo "${TIC_TAC} ${ANSWER}"
$ readit
tic tac ?
toe
tic tac toe

Above, readit pauses after printing tic tac ? On the next line, the user enters toe. Read receives the value from standard input and stores it in the new variable ANSWER which becomes immediately usable by the program. For read, words are delimited according to the characters defined by the internal field separator variable, IFS. It is usually set to be any white space. The example shows an interactive application; read may also be used to extract values from a file. To do this, however, requires advanced knowledge of how to combine looping constructs with I/O. This is explained later in Section 8.1, More I/O.
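
The effect of IFS on read can be seen by changing it for a single command. The sketch below is an illustration only: it assumes a POSIX-compatible sh, feeds read from a here-document rather than the keyboard, and uses a sample record and variable names (ACCOUNT, UID_FIELD, HOME_FIELD) invented for the purpose.

```shell
#!/bin/sh
# Setting IFS on the same line as read changes the separator for that
# one command only. The text between <<EOF and EOF is supplied to read
# as its standard input.
IFS=: read ACCOUNT UID_FIELD HOME_FIELD <<EOF
rsayle:1001:/home/rsayle
EOF

echo "account = ${ACCOUNT}"
echo "uid     = ${UID_FIELD}"
echo "home    = ${HOME_FIELD}"
```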

Continuing with read's behavior when used interactively, if there are not enough words listed in the line of input, the remaining variables are simply left null.

$ cat not-enough-args
#!/bin/sh
read a b c d e
echo "e = ${e}"
$ not-enough-args
There aren't enough variables
e =

On the other hand, the last variable gets all the extra words if the list of variables is shorter than the available words.

$ cat more-than-enough-args
#!/bin/sh
read a b c
echo "c = ${c}"
$ more-than-enough-args
There are more than enough variables
c = more than enough variables

The point here is that programmers should be aware that unexpected results can occur quite easily when using the read command. It is therefore important to provide rudimentary checks on a variable after reading its value. Section 5, Branching presents the tools to perform such checks.
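
As a preview of such a check, the sketch below reads one expected word plus a catch-all variable and then classifies the input. It assumes a POSIX-compatible sh; get_answer, REST, and STATUS are names invented for the illustration, and a here-document stands in for the keyboard so the example runs unattended.

```shell
#!/bin/sh
# get_answer reads one line into ANSWER and REST, then rejects empty
# input and surplus words.
get_answer() {
    read ANSWER REST
    if [ -z "${ANSWER}" ]
    then
        STATUS="no input"
    elif [ -n "${REST}" ]
    then
        STATUS="too many words"
    else
        STATUS="ok"
    fi
}

get_answer <<EOF
toe
EOF
echo "${STATUS}: ${ANSWER}"
```

Listing one more variable than the script needs gives the extra words somewhere to land, so they can be detected rather than silently folded into the last real variable.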