9. Debugging Hints

Declaring the Shell

A very common mistake among novice scripters is forgetting to declare the shell. The first line of a shell script should always be the shell declaration, #!/bin/sh. Without it, the script inherits the shell of the parent program. In other words, when a script does not declare the shell type, it runs under whatever shell is currently in use. This can yield undesirable results if, for example, an operator uses csh for command line processing and then runs a Bourne shell script that does not declare its shell. Neglecting the declaration is rarely destructive. In most cases, the script simply crashes with a lot of unexpected errors. But it happens often enough to UNIX users that it is worthy of mention and belongs on the list of debugging hints.
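
A quick way to catch this mistake is to look at a script's first line before running it. The following is only a sketch; the function name has_shell_declaration and the temporary file names are invented for the example and are not part of any standard tool.

```shell
#!/bin/sh
# Sketch: report whether a script file begins with a shell declaration.
# has_shell_declaration is a hypothetical helper written for this example.

has_shell_declaration () {
  # Look only at the first line of the file named by the first argument.
  first_line=`sed -n 1p "${1}"`
  case "${first_line}" in
    '#!'*) return 0 ;;   # starts with #!, so an interpreter is declared
    *)     return 1 ;;   # anything else means the declaration is missing
  esac
}

# Two throwaway scripts: one declares its shell, the other does not.
GOOD=${TMPDIR:-/tmp}/declared.$$
BAD=${TMPDIR:-/tmp}/undeclared.$$
printf '%s\n' '#!/bin/sh' 'echo hello' >"${GOOD}"
printf '%s\n' 'echo hello' >"${BAD}"

if has_shell_declaration "${GOOD}"; then
  GOOD_RESULT=declared
else
  GOOD_RESULT=missing
fi

if has_shell_declaration "${BAD}"; then
  BAD_RESULT=declared
else
  BAD_RESULT=missing
fi

echo "first script:  ${GOOD_RESULT}"
echo "second script: ${BAD_RESULT}"
rm -f "${GOOD}" "${BAD}"
```

The check is deliberately loose. It only confirms that some interpreter is named, not that the named interpreter is the one the script was written for.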

Tracing Program Execution

Traditional programming languages that require compiling and linking of programs have sophisticated tools for debugging. These tools require a program to be compiled with certain flags that allow the program's symbols and instructions to be loaded into a run-time debugger. Within the debugger, a programmer can do all sorts of nifty things such as step through a program's execution, set stop points, dump the function stack, and even examine and sometimes change the state of variables. Of course, all of these actions allow the programmer to understand, correct, and fine tune the program's execution. Unfortunately, the shell does not have a readily available debugger, but it does have a tool that is close enough.

By providing options to the set command, a programmer can trace the steps through a script's execution. All of the options to set were introduced in Section 8.6, Setting Shell Flags. The most useful option is -x. This option causes the shell to print each command as it is evaluated. This is especially useful to a scripter because the results of actions such as variable substitution and file name expansion are shown. Tracing with -x can be enhanced further by combining it with -v. The -v option instructs the shell to print each command as it is read, before it is evaluated. The combination allows a scripter to see each command followed by the results of the shell evaluating it. In practice, however, this results in extremely verbose output that may be difficult to follow. Readers are encouraged to try both methods and decide which works for the job at hand. The rest of this chapter will only consider the use of -x.

Now to turn the tracing option on, a script can be executed at the command line by preceding it with sh -x, but a better method is to include the set -x command in the script. Placing this command on a line by itself instructs the shell to start tracing on the commands that follow. To disable it, the scripter simply comments out the line. It is a good habit to include this line within scripts so that debugging is readily available.
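
Tracing does not have to cover a whole script. As a minimal sketch, the fragment below turns the flag on with set -x just before a suspect region and turns it off again with set +x, the counterpart that clears the flag. The variable names are made up for the illustration.

```shell
#!/bin/sh
# Sketch: trace only the suspect region of a script.
NAME=bob

set -x                       # tracing starts with the next command
GREETING="hello ${NAME}"     # traced: the shell shows the substituted value
set +x                       # tracing stops (this command itself is traced)

echo "${GREETING}"           # not traced
```

The trace is written to standard error, so it does not mix with output that the script sends down a pipe or into a file.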

As an example, the mkraddb script is reintroduced. Mkraddb takes a plain text file as its argument and generates a RADIUS database from its contents:

#!/bin/sh
#
# mkraddb: generate the RADIUS configuration of a set of users
#
#

# Uncomment the following line for debugging
set -x

# Global Vars
PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} users_file"
DATE=`date +%m%d%y`
RADDBUSERS="raddb.users.${DATE}"

######################################################################
# buildRaddbUsers
######################################################################

buildRaddbUsers () {

cat >>${RADDBUSERS} <<EOF
${name}-out Password = "ascend",
User-Service = Dialout-Framed-User,
User-Name = ${name},
Framed-Protocol = MPP,
Framed-Address = ${ipaddr},
Framed-Netmask = 255.255.255.255,
Ascend-Metric = 2,
Framed-Routing = None,
Ascend-Route-IP = Route-IP-Yes,
Ascend-Idle-Limit = 30,
Ascend-Send-Auth = Send-Auth-CHAP,
Ascend-Send-Passwd = "password",
Ascend-Receive-Secret = "password"

EOF

}

######################################################################
# Main
######################################################################

if [ $# -ne 1 ]; then
  echo ${USAGE}
  exit 1
fi

# Create the configs
while read name ipaddr
do
  if [ "${name}" = "#" -o "${name}" = "" ]; then
    # continue if the line is a comment or is blank
    continue
  fi

  # Create the RADIUS users information
  buildRaddbUsers

done <${1}

# Clean up
exit 0

The eighth line is particularly important because it demonstrates the use of set -x. The script lists this command prior to any others. By doing so, it enables tracing on the rest of the script. An astute reader might decide that rather than making it the first command, it could be enabled by making it an option to the program. For example, entering mkraddb -d at the command line might cause the script to dynamically execute tracing. This in fact can be done, but it should be noted that it would also disable the debugging of option handling itself.
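
One way such an option might look is sketched here. The -d flag and the DEBUG and USERS_FILE variables are hypothetical; mkraddb as listed has no options. The set -- line only simulates a command line so that the fragment can be run on its own.

```shell
#!/bin/sh
# Sketch of a hypothetical -d option that turns tracing on at run time.

# Simulate "mkraddb -d users" so the sketch can be run by itself;
# a real script would use the arguments it was given and omit this line.
set -- -d users

DEBUG=0
if [ "${1}" = "-d" ]; then
  DEBUG=1
  shift                      # discard -d so ${1} is the users file again
fi

# Everything above this point ran untraced; tracing starts here.
if [ ${DEBUG} -eq 1 ]; then
  set -x
fi

USERS_FILE=${1}
echo "reading from ${USERS_FILE}"
```

Note that the option test itself never appears in the trace, which is exactly the drawback described above.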

For the example, a programmer runs mkraddb with tracing enabled. The script is passed the file users whose contents are first displayed for comparing with the trace output. The trace then follows. Line numbers are included with the trace for reference during the explanation of the results; line numbers are not part of the output when a script is run with set -x. The directory listing is also shown at the end simply to demonstrate successful creation of the RADIUS profiles.

$ ls
mkraddb users
$ cat users
bob 192.168.0.1
joe 172.16.20.2
mary 10.0.25.3
$ mkraddb users
1 ++ basename ./mkraddb
2 + PROG=mkraddb
3 + USAGE=mkraddb: usage: mkraddb users_file
4 ++ date +%m%d%y
5 + DATE=090598
6 + RADDBUSERS=raddb.users.090598
7 + [ 1 -ne 1 ]
8 + read name ipaddr
9 + [ bob = # -o bob = ]
10 + buildRaddbUsers
11 + cat
12 + read name ipaddr
13 + [ joe = # -o joe = ]
14 + buildRaddbUsers
15 + cat
16 + read name ipaddr
17 + [ mary = # -o mary = ]
18 + buildRaddbUsers
19 + cat
20 + read name ipaddr
21 + exit 0
$ ls
mkraddb raddb.users.090598 users

The trace starts right where the program sets its global variables. Lines one through six show the process. Referring back to the script's code above, the first variable is PROG, which stores the script's name. To set PROG's value, the basename command is run against the zeroth positional parameter. The trace's first line shows this action. One interesting item to note is the expansion of ${0} into ./mkraddb. The second line displays PROG being set to the results of the first action. Then line three shows USAGE being set. USAGE gets a string that explains how to properly execute the program. The string's value uses PROG to help create the message, as can be seen by comparing the code with line three's output. Lines four through six demonstrate much the same. Four and five show that DATE gets assigned the result of executing the date command, and six then uses DATE's value to generate the name of the script's resultant file stored in RADDBUSERS.

Line seven displays the trace of the next code block. After setting the global variables, the script checks that it was executed properly. This is done with the if block that immediately follows the comment marking where the main program execution begins. The if block tests the command line arguments to be certain that exactly one argument was passed to the script. Line seven shows the evaluation of the if's test, which checks whether $# is not equal to one. When that test is true, the script prints the USAGE error message and terminates so that the user can correct the problem. In this instance, $# evaluates to one, making the test false. Since it is false, the user ran the program correctly, and the script proceeds past the if block into its real work.

The rest of the example shows what a loop looks like when being traced. It can effectively be seen in blocks of four lines each. Lines eight through eleven are the first execution of the script's while loop. The loop keys off the line by line processing of the file passed as the script's first argument. As long as there is another line to read from the file, the loop stuffs the line's contents into the intermediate variables name and ipaddr. This is exactly what line eight shows. The read command reads from users, finds bob and bob's IP address, stuffs these two values into name and ipaddr, and then returns true to while. Since while receives a true value for its test, the loop proceeds.

The ninth line displays the if block contained within the while loop. This block checks to see if a comment or blank line was read from the file. It does so by comparing the value of name against the strings # and "", the null string. As can be seen on line nine, bob is definitely not the same as either of these test values, and so the program moves forward. Its next step is to call the function buildRaddbUsers as is done on line 10.

Line eleven is the complete trace of the function's entire execution. All the trace shows is that the script calls the cat command. This is a far cry from what the function is really doing. Looking at the function's code shows that it uses cat in a complex redirection scheme to create the user's profile. To be more specific, it appends the lines between the end of file markers, EOF, into the RADDBUSERS file. During this process, it substitutes the values of the intermediate variables name and ipaddr in their appropriate places within the user's RADIUS profile. But all of this action is hidden within the trace as simply an execution of cat.
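
When the hidden substitutions matter, one possible workaround is to hand the values to the null command (covered later in this chapter), because the shell traces its expanded arguments even though the command does nothing. The sketch below uses a shortened profile and a temporary file name chosen for the example; neither is part of the original mkraddb.

```shell
#!/bin/sh
# Sketch: expose the values that a here document hides from the trace.
# The temporary file name and the shortened profile are assumptions
# made for this example.

RADDBUSERS=${TMPDIR:-/tmp}/raddb.users.$$

buildRaddbUsers () {
  # The null command does nothing, but with set -x in effect the shell
  # prints its expanded arguments, revealing name and ipaddr.
  : "name=${name}" "ipaddr=${ipaddr}"

  cat >>"${RADDBUSERS}" <<EOF
${name}-out Password = "ascend",
User-Name = ${name},
Framed-Address = ${ipaddr}

EOF
}

set -x
name=bob
ipaddr=192.168.0.1
buildRaddbUsers
set +x

# Show what was written, then remove the temporary file.
PROFILE=`cat "${RADDBUSERS}"`
echo "${PROFILE}"
rm -f "${RADDBUSERS}"
```

With this change the trace for each user gains one extra line ahead of the cat, showing the name and address that are about to be written.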

The trace then shows the next iterations of the loop. To summarize, the loop performs the same process on the next two lines in users. After adding the two other users listed in the file, the loop executes one last read at line 20 of the trace. The final read returns a false value to the loop since there are no more lines to read from users. At this point, the loop terminates and the script picks up at its exit point, shown by line 21.

As a final note, scripters are reminded to comment out the set -x statement again after debugging. It is very easy to do all sorts of work to get a program to run properly and then to forget to turn off the trace. Needless to say, it could be quite confusing to users who then execute the script and get all sorts of output that is really nothing more than gibberish to the untrained eye.

Command Line Debugging

One of the primary advantages of shell scripting over traditional programming languages is the ability to run the same commands given in a script at the command line. Any command in a shell script can be run at the UNIX command prompt and vice versa. This feature makes the shell good for rapid prototyping because a programmer can try constructs in a live environment before committing them to a script.

By the same token, this makes the command line a useful debugging tool. If a scripter suspects that a portion of a script is causing an error, the scripter may run that command block at the UNIX prompt. This can be done in a controlled manner. A scripter can preset variables or functions and then issue commands one at a time. After each step, the scripter has the opportunity to check the state of variables and examine the results of the previously issued command.

There are many examples given in this book that use the command line to demonstrate programming techniques. Rather than show yet another, this subsection finishes by listing some good cases of when this technique is useful. Programmers can use command line debugging:

To check a variable's value when it is being set by file name expansion or command substitution.
To build filters step-by-step.
For building or testing if statements, case blocks, and loops.
To determine all the necessary options to a command before entering it in a script.
For testing a command block to identify where an error is occurring.
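
As a small illustration of building a filter step by step, the commands below could be typed one at a time at the prompt, with the output inspected after each. The sample data and the variable names are invented for the example.

```shell
#!/bin/sh
# Sketch: grow a filter one stage at a time, checking each result.

# Preset a variable with sample input, as one might at the prompt.
USERS='bob 192.168.0.1
# a comment line
joe 172.16.20.2'

# Step 1: look at the raw input.
echo "${USERS}"

# Step 2: add a stage that drops comment lines, and look again.
echo "${USERS}" | grep -v '^#'

# Step 3: add a stage that keeps only the first field, then capture
# the result the way the finished script would.
NAMES=`echo "${USERS}" | grep -v '^#' | awk '{ print $1 }'`
echo "${NAMES}"
```

Only after the last stage prints what is expected does the complete pipeline get copied into the script.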

Pausing Program Execution

As stated earlier in this chapter, one of the features of a debugger is the ability to set stop points in a program. When a user loads the program into the debugger and runs it, the debugger pauses the program at each stop point. With the program stopped, the user gets the opportunity to check the program's state and understand how well it is functioning.

The shell does not have a debugger to do the same thing, but it can be emulated by combining set -x with the strategic placement of read statements in the script. Set -x causes the shell to print each command as it executes it, and a read statement forces the program to stop and await user input. Since the program stops at the read command, the programmer gets a chance to review the script's execution up to that point. Hence, a read acts like a stop point. When the programmer is ready to continue to the next stop point, a bogus value can be entered at the prompt. If no argument follows the read command, the shell simply discards the bogus entry. Actually, no value needs to be provided. Hitting enter suffices. If the programmer solves the problem being investigated, the program's execution can instead be terminated by entering a control character.

To demonstrate, here is a script that counts to five:

$ cat count
#!/bin/sh
set -x
COUNT=0
while [ ${COUNT} -le 5 ]
do
  echo "COUNT=${COUNT}"
  read
  COUNT=`expr ${COUNT} + 1`
done

Needless to say, this is hardly a complex script, but it shows the technique well enough. The first thing to notice is that the script enables tracing immediately with set -x. The second is the placement of read for pausing the script. The programmer chooses to embed it within the loop. In fact it halts the program just before it changes COUNT's value. Presumably, the programmer wants to watch COUNT by checking its value before attempting to increment it. Executing the script shows the results.

$ count
+ COUNT=0
+ [ 0 -le 5 ]
+ echo COUNT=0
COUNT=0
+ read

++ expr 0 + 1
+ COUNT=1
+ [ 1 -le 5 ]
+ echo COUNT=1
COUNT=1
+ read

++ expr 1 + 1
+ COUNT=2
+ [ 2 -le 5 ]
+ echo COUNT=2
COUNT=2
+ read

++ expr 2 + 1
+ COUNT=3
+ [ 3 -le 5 ]
+ echo COUNT=3
COUNT=3
+ read

++ expr 3 + 1
+ COUNT=4
+ [ 4 -le 5 ]
+ echo COUNT=4
COUNT=4
+ read

++ expr 4 + 1
+ COUNT=5
+ [ 5 -le 5 ]
+ echo COUNT=5
COUNT=5
+ read

++ expr 5 + 1
+ COUNT=6
+ [ 6 -le 5 ]

In the first block above, the trace shows COUNT being set to its initial value of zero. Following that is the first test of the variable in the while loop. The test passes, so the script prints the variable's value, shown by the echo and the resulting output. The block ends with a read. At that point, the shell pauses execution pending input from an operator. Now these actions happen much faster than it takes to read this paragraph and compare it against the trace above, but by using the read to stop the script, the operator has a chance to review it. After doing so, the operator hits the enter key, shown by the blank line. The program reads the return character entered and resumes its execution.

The next trace block is much the same as the first except that it shows the incrementing of COUNT. First, the expr command adds one to the current value of COUNT, and then the shell assigns the new value to the variable. The loop tests the new value, still finds that it is within range, prints it, pauses at the read once more, and awaits user input. The process repeats until the variable's value passes five and then the script completes normally.
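
The stop points can also be made switchable so they need not be removed after debugging. The following is a sketch under assumptions: the pause function and the DEBUG variable are hypothetical names, and the read is given a throwaway variable called junk because some shells refuse a read with no variable name.

```shell
#!/bin/sh
# Sketch: stop points that are active only when DEBUG is set.
# pause and DEBUG are hypothetical names chosen for this example.

DEBUG=${DEBUG:-}             # empty unless set in the environment

pause () {
  if [ -n "${DEBUG}" ]; then
    echo "paused: $*; press enter to continue" >&2
    read junk                # the entered value is never used
  fi
}

COUNT=0
while [ ${COUNT} -le 5 ]
do
  echo "COUNT=${COUNT}"
  pause "before incrementing COUNT"
  COUNT=`expr ${COUNT} + 1`
done
```

Running the script as DEBUG=1 count would stop at each pass through the loop, while a plain count runs straight through. Inside a loop whose input is redirected from a file, such as the one in mkraddb, a pause of this kind would probably need to read from /dev/tty instead, since an ordinary read would consume the next line of the file.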

Null Commands

Sometimes, a good way to debug a script is to do nothing at all. The shell provides a null command in the form of a colon. This command can be useful for checking the various tests in an if block, the switches of a case, or a loop's exit condition. The idea is that a programmer can write the framing of an if, case, or loop but provide the null command for the internal processing. Then the script can be run to test the control flow without sweating the details of its real function.

To demonstrate, below is the beginning of a script to check the disk usage for a set of user accounts.

$ cat ducheck
#!/bin/sh

PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} name1 ... nameN"

if [ $# -gt 0 ]; then
  :
else
  echo ${USAGE}
  exit 1
fi

The script's author has so far written only the code that verifies that the correct number of arguments is passed to the program. If it is, the script executes the null command, falls out of the if block, and terminates with no output. Eventually, when the programmer determines the command to gather the disk usage, the colon can be replaced with the appropriate functions. A test run of ducheck shows that the script behaves accordingly when it is passed arguments.

$ ducheck root securid rsayle
$

But for the time being the scripter wishes only to test the proper execution of the if statement when the arguments to the script are incorrect. In particular, the author wants to check that the script prints an error message when no arguments have been passed.

$ ducheck
ducheck: usage: ducheck name1 ... nameN
$

As can be seen, the script handles an empty argument list just as the programmer intended.
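
The same idea applies to the switches of a case. In the sketch below, handle_arg and its -h option are hypothetical, invented to show the framing; only the branch that rejects unknown options has a real body so far.

```shell
#!/bin/sh
# Sketch: a case block framed with null commands.
# handle_arg and its options are hypothetical, invented for this example.

handle_arg () {
  case "${1}" in
    -h) : ;;                                      # help text: to be written
    -*) echo "unknown option: ${1}"; return 1 ;;
    *)  : ;;                                      # disk usage check: to be written
  esac
}

handle_arg -h     && echo "-h matched the help switch"
handle_arg rsayle && echo "rsayle matched the default switch"
handle_arg -x     || echo "-x was rejected"
```

Each switch can be exercised this way before any of the real work has been written.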

Interpreting Errors

To conclude this book, the final section shows examples of syntax errors and how the shell reports them. The reader should note that due to variations in UNIX, the specific error messages may not be exactly the same. Still, the examples do provide some insight into common mistakes, and hopefully, the wary scripter can learn them and identify them during debugging.

The examples focus on a script called tgzit. The script's intended function is to take a list of files and directories as arguments and then to store them into a compressed tape archive. First, the correct script is shown for comparison against the errant versions.

#!/bin/sh

# global variables
PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN"

# check the command line arguments
if [ $# -lt 2 ]; then
  echo ${USAGE}
  exit 1
fi

# get the archive name from the command line
ARCHIVE=${1}
shift

# build the archive
tar cf ${ARCHIVE}.tar "$@"
gzip ${ARCHIVE}.tar
mv ${ARCHIVE}.tar.gz ${ARCHIVE}.tgz

By this point, readers should be well versed in shell scripting and should be able to decipher the actions above. For the first example, our scripter tries running a slightly different version of tgzit and gets the following error message.

$ tgzit program-archive ./bin ./include ./source
./tgzit: syntax error near unexpected token `fi'
./tgzit: ./tgzit: line 11: `fi'

The shell is specific in the problem here. It states that it encountered the termination of an if block somewhere near line 11 within the script. This at least points the user at a particular code block. Checking the contents of the script shows the error.

$ cat -n tgzit
1     #!/bin/sh
2
3     # global variables
4     PROG=`basename ${0}`
5     USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN"
6
7     # check the command line arguments
8     if [ $# -lt 2 ] then
9       echo ${USAGE}
10      exit 1
11    fi
12
13    # get the archive name from the command line
14    ARCHIVE=${1}
15    shift
16
17    # build the archive
18    tar cf ${ARCHIVE}.tar "$@"
19    gzip ${ARCHIVE}.tar
20    mv ${ARCHIVE}.tar.gz ${ARCHIVE}.tgz

At the eleventh line is the fi statement. Now this is not the error, and it requires a review of the entire if block to determine the problem. In this case, the error is fairly subtle. Looking back at the test condition on line eight, a careful examination reveals that the scripter failed to terminate the test statement with a semi-colon; hence, the complaint of the unexpected token. The fi was unexpected because the test was not punctuated correctly.

This example not only demonstrates a common typo; it also emphasizes that although the shell can detect and report syntax errors, it does not necessarily report the error's exact location. Programmers must remember to consider code blocks rather than single lines in order to find problems.
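
Errors of this kind can be found without running the script's commands at all, because the shell's -n flag makes it read a script without executing it. The sketch below wraps that in a function; check_syntax and the two temporary files are made up for the illustration.

```shell
#!/bin/sh
# Sketch: use sh -n to check a script's syntax without executing it.
# check_syntax and the temporary file names are hypothetical.

check_syntax () {
  # sh -n reads the commands in the file but does not run them.
  if sh -n "${1}"; then
    echo "ok"
  else
    echo "syntax error"
  fi
}

BAD=${TMPDIR:-/tmp}/bad.$$
GOOD=${TMPDIR:-/tmp}/good.$$

# The same if block twice: first without the semi-colon, then with it.
cat >"${BAD}" <<'EOF'
if [ $# -lt 2 ] then
  echo usage
  exit 1
fi
EOF

cat >"${GOOD}" <<'EOF'
if [ $# -lt 2 ]; then
  echo usage
  exit 1
fi
EOF

BAD_RESULT=`check_syntax "${BAD}"`
GOOD_RESULT=`check_syntax "${GOOD}"`
echo "without semi-colon: ${BAD_RESULT}"
echo "with semi-colon:    ${GOOD_RESULT}"
rm -f "${BAD}" "${GOOD}"
```

The shell's own message still points at the fi rather than at the missing semi-colon, so the advice about reviewing the whole block applies here too.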

Another common error involves misquoting. Here, a scripter executes tgzit and receives an error indicating this condition.

$ tgzit program-archive ./bin ./include ./source
./tgzit: unexpected EOF while looking for `"'
./tgzit: ./tgzit: line 20: syntax error

The interesting point to note is that the shell reports the correct problem, but it gives a false indication of where to find the error. Reviewing the source for this buggy version of tgzit, line 20 happens to be the final line in the script and, in fact, has no quotes whatsoever.

$ cat -n tgzit
1    #!/bin/sh
2
3    # global variables
4    PROG=`basename ${0}`
5    USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN
6
7    # check the command line arguments
8    if [ $# -lt 2 ]; then
9      echo ${USAGE}
10     exit 1
11   fi
12
13   # get the archive name from the command line
14   ARCHIVE=${1}
15   shift
16
17   # build the archive
18   tar cf ${ARCHIVE}.tar "$@"
19   gzip ${ARCHIVE}.tar
20   mv ${ARCHIVE}.tar.gz ${ARCHIVE}.tgz

Quoting errors can be extremely difficult to find in especially large scripts because the shell does not pin-point the error's location. Fortunately, for this script, it is easy to see that at the end of the fifth line, the scripter forgot to close the double-quotes that set the USAGE variable. Still the shell does not report the error until after reading the entire script because it is perfectly legal for quoted strings to extend over multiple lines. Unfortunately for scripters this fact makes it hard to find such errors. They must either inspect a script by eye or hopefully can employ a command in their text editor that shows matching opening and closing punctuation.
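
When no such editor command is at hand, a rough heuristic is to list the lines that contain an odd number of double-quote characters. The sketch below does this with awk. The function odd_quote_lines is a hypothetical helper, and it reports false alarms for strings that legitimately span several lines and for escaped or single-quoted quote characters, so its output is a list of suspects rather than a diagnosis.

```shell
#!/bin/sh
# Sketch: list lines holding an odd number of double-quote characters.
# odd_quote_lines is a hypothetical helper; treat its output as hints.

odd_quote_lines () {
  # gsub returns how many double quotes it found on the line; replacing
  # each with & (the matched text) leaves the line itself unchanged.
  awk '{ n = gsub(/"/, "&"); if (n % 2 == 1) print NR ": " $0 }' "${1}"
}

SCRIPT=${TMPDIR:-/tmp}/quotes.$$

# Three lines modeled on the buggy tgzit; the second lacks its closing quote.
cat >"${SCRIPT}" <<'EOF'
PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN
tar cf ${ARCHIVE}.tar "$@"
EOF

SUSPECTS=`odd_quote_lines "${SCRIPT}"`
echo "${SUSPECTS}"
rm -f "${SCRIPT}"
```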

The next example is not so much a syntax error as a logic error, but it occurs frequently enough that it warrants attention. In this case, the programmer forgets that the proper usage of the script calls for at least two arguments: the first being the archive in which to store files and the rest being the files to archive.

$ cat tgzit
#!/bin/sh

# global variables
PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN"

# check the command line arguments
if [ $# -lt 1 ]; then
  echo ${USAGE}
  exit 1
fi

# get the archive name from the command line
ARCHIVE=${1}
shift

# build the archive
tar cf ${ARCHIVE}.tar "$@"
gzip ${ARCHIVE}.tar
mv ${ARCHIVE}.tar.gz ${ARCHIVE}.tgz

The problem is in the test for the correct number of arguments. The scripter incorrectly checks for at least one argument. The program assumes that the first argument is the archive to be built. If a user executes tgzit with just one argument, tar has nothing to put in the archive.

$ tgzit prog-archive
tar: Cowardly refusing to create an empty archive
Try `tar --help' for more information.
prog-archive.tar: No such file or directory
mv: prog-archive.tar.gz: No such file or directory

The script itself does not yield the error. Instead, the tar command notes that it has nothing to do. The script continues past the tar statement, but the remaining commands also have nothing to do since the archive does not exist. They complain accordingly. Scripters must be careful to check and test their scripts' arguments.
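
A defensive version might both check the argument count and stop at the first command that fails, so that one failure does not cascade into several confusing messages. This is a sketch only; check_args and build_archive are hypothetical names, and build_archive is defined but not called so that the fragment creates no files.

```shell
#!/bin/sh
# Sketch: validate arguments, then stop at the first failing command.
# check_args and build_archive are hypothetical names for this example.

check_args () {
  # tgzit needs an archive name plus at least one file to store.
  if [ $# -lt 2 ]; then
    echo "usage: tgzit archive file1 ... fileN" >&2
    return 1
  fi
}

build_archive () {
  ARCHIVE=${1}
  shift
  # && runs each command only if the one before it succeeded.
  tar cf "${ARCHIVE}.tar" "$@" &&
    gzip "${ARCHIVE}.tar" &&
    mv "${ARCHIVE}.tar.gz" "${ARCHIVE}.tgz"
}

# Exercise the check with one argument and then with three.
if check_args program-archive 2>/dev/null; then
  ONE_ARG=accepted
else
  ONE_ARG=rejected
fi

if check_args program-archive ./bin ./include; then
  THREE_ARGS=accepted
else
  THREE_ARGS=rejected
fi

echo "one argument:    ${ONE_ARG}"
echo "three arguments: ${THREE_ARGS}"
```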

Another common error scripters make is simply a typo. It is very easy to forget to close punctuation properly or to create malformed statements altogether.

$ cat tgzit
#!/bin/sh

# global variables
PROG=`basename ${0}`
USAGE="${PROG}: usage: ${PROG} archive file1 ... fileN"

# check the command line arguments
if [ $# -lt 2]; then
  echo ${USAGE}
  exit 1
fi

# get the archive name from the command line
ARCHIVE=${1}
shift

# build the archive
tar cf ${ARCHIVE}.tar "$@"
gzip ${ARCHIVE}.tar
mv ${ARCHIVE}.tar.gz ${ARCHIVE}.tgz

The syntax error here is quite subtle. The shell, however, still catches it.

$ tgzit program-archive ./bin ./include ./source
./tgzit: [: missing `]'

The shell reports that there is a problem with a test. Luckily, for this script, there is only one test to examine; namely, the if block that checks the program's arguments. The scripter forgot to place a space between the 2 and the closing square bracket. Had the script contained multiple tests, the error might be quite elusive. The only way to truly find the problem is to search through each test.
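
Part of what makes this error elusive is that it is not a syntax error in the shell's eyes: the square bracket is an ordinary command, and the complaint comes from that command when it runs. The sketch below illustrates the difference using the shell's -n flag, which reads a script without executing it. The temporary file name is made up, and the exact wording of the run-time message varies from shell to shell.

```shell
#!/bin/sh
# Sketch: a malformed test is a run-time error, not a syntax error.
# The temporary file name is an assumption made for this example.

SCRIPT=${TMPDIR:-/tmp}/badtest.$$

cat >"${SCRIPT}" <<'EOF'
if [ $# -lt 2]; then
  echo "too few arguments"
fi
EOF

# sh -n parses the script without running it; the parse succeeds because
# [ is an ordinary command and "2]" is just one of its arguments.
sh -n "${SCRIPT}"
SYNTAX_STATUS=$?

# Running the script is what makes [ complain about its arguments.
RUN_MESSAGE=`sh "${SCRIPT}" 2>&1`

echo "sh -n exit status: ${SYNTAX_STATUS}"
echo "run-time message:  ${RUN_MESSAGE}"
rm -f "${SCRIPT}"
```

Because the failing test simply counts as false, the script carries on past the if block, which is why tgzit above printed the complaint and kept going.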

As a final note about debugging, the theme of this chapter is that although the Bourne shell detects and reports syntax errors, it is not very helpful in indicating exactly where they occur. Scripters are hereby warned. But with the hints given above and with plenty of practice, a good shell scripter can catch them rather quickly.