setEnvironment () {
ROOT=${PWD}
LIB=${ROOT}/lib
BIN=${ROOT}/bin
}
There exist a few rules that must be followed in order to properly declare a function:
$ FUNC="This is a variable" $ FUNC () { > echo "This is a function" > } $ echo ${FUNC} bad substitution $ FUNC This is a functionThe functional declaration of FUNC appears to be the active one, but is this always the case? If a function and a variable have the same name, will the function always be the active handle?
$ FUNC () {
> echo "This is a function"
> }
$ FUNC="This is a variable"
$ echo ${FUNC}
This is a variable
$ FUNC
FUNC: not found
Apparently, the answer is no. Closer inspection of the examples shows that in both cases the shell uses the last declaration. Hence, a corollary to rule one states that when a variable and a function have the same name, the most recent declaration becomes the active one. For the sake of completeness, the same can be said of declaring two variables with the same name or two functions with the same label. The bottom line remains: a scripter should choose unique function and variable labels.
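The corollary also covers the case of two function declarations with the same name, which the examples above do not show. As a minimal sketch (the greet function is purely illustrative), the second definition silently replaces the first:
$ greet () {
> echo "first definition"
> }
$ greet () {
> echo "second definition"
> }
$ greet
second definition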
$ cat example-func
#!/bin/sh
setEnvironment () {
ROOT=${PWD}
LIB=${ROOT}/lib
BIN=${ROOT}/bin
}
echo "Trying to print environment..."
echo ${ROOT} ${LIB} ${BIN}
setEnvironment
echo "Trying to print environment again..."
echo ${ROOT} ${LIB} ${BIN}
$ example-func
Trying to print environment...

Trying to print environment again...
/home/rsayle /home/rsayle/lib /home/rsayle/bin
The fact that the parentheses following the function label are empty is a bit misleading. Normally, in languages such as C and C++, parentheses delimit an argument list, and an empty set of parentheses indicates that there are no arguments. For the shell, however, the parentheses are simply part of the declaration syntax. In fact, all Bourne shell functions accept arguments, and those arguments are accessible using positional parameter syntax.
$ cat func-args
#!/bin/sh
setEnvironment () {
ROOT=${1}
LIB=${ROOT}/lib
BIN=${ROOT}/bin
}
setEnvironment /tmp
echo "Trying to print environment..."
echo ${ROOT} ${LIB} ${BIN}
$ func-args
Trying to print environment...
/tmp /tmp/lib /tmp/bin
Comparing this to the previous example, setEnvironment no longer hard codes ROOT's value; it uses the first argument passed to the function. The invocation shows the results: with /tmp acting as the parameter, setEnvironment assigns the values as expected.
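Additional arguments behave the same way: while a function runs, it sees its own ${1}, ${2}, and so on, along with $# for the argument count. The following sketch is illustrative only; the makePath function and the multi-args script are not part of the examples above.
$ cat multi-args
#!/bin/sh
# makePath joins its first two arguments and reports how many
# arguments it received.
makePath () {
echo "${1}/${2}"
echo "argument count: $#"
}
makePath /usr local
$ multi-args
/usr/local
argument count: 2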
On the surface, shell functions appear to be more like procedures or routines than true functions; after all, a function generally returns a value. Actually, every shell function does return a value, although it may not be readily apparent what that value is. A function returns the exit status of the last command it executed, much as a script that exits without an exit command uses the status of the last command issued. The similarity goes further. A script controls its exit status by issuing exit with a non-negative value. Functions, on the other hand, do not use exit because exit is designed to terminate the shell. Instead, functions use return.
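As a minimal sketch of that default behavior (the last-status script and the lastStatus function are illustrative, not part of the discussion above), the value a function hands back is simply the exit status of the final command it runs:
$ cat last-status
#!/bin/sh
# lastStatus contains no return statement; its value is the exit
# status of the last command it executes.
lastStatus () {
false    # exit status 1, but not the last command
true     # exit status 0; this becomes the function's value
}
lastStatus
echo "lastStatus returned $?"
$ last-status
lastStatus returned 0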
The return command stops execution of a function and returns program control to the point in the script where the function was called; script execution then continues from there. The format of return is return n, where n is any non-negative integer. Providing a return value is optional, just as providing a status code to exit is optional. If no code is given, return defaults to the value returned by the last command executed.
$ isADir () {
> if [ ! -d ${1} ]; then
> return 1
> fi
> }
$ isADir ${HOME}
$ echo $?
0
$ isADir notADir
$ echo $?
1
The exercise above declares the function isADir. The function checks the argument passed to it and determines whether its value represents a directory, as shown by the function's first statement. If it is not a directory, the function returns an error value of one; otherwise, it returns zero as given by the execution of the if statement. The function is run twice: first with the home directory as its argument and then with a fictitious directory name. The special parameter $? is printed in order to show the return value of each trial.
The short function just examined is a good one for demonstrating how to combine functions with test conditions. Many programmers familiar with traditional languages understand that a function call can be embedded within a test in order to provide branching. The technique has the advantage of packing multiple commands into one line for the sake of compactness. In some cases, it can also improve program efficiency by using the value immediately instead of creating a persistent variable to hold the result. The same technique can be attempted in the shell, but it is important to realize that quotes must be used to make the syntax legal.
$ if [ isADir ${HOME} ]; then
> echo yep
> fi
test: argument expected
$ if [ "isADir ${HOME}" ]; then
> echo yep
> fi
yep
So quoting makes the statement syntactically legal, but the result is misleading: the test sees only a non-empty string, not the function's return value, so the branch is taken regardless of what isADir would have reported. Of course, the function's result can always be captured in a variable and the variable checked, but there is a better alternative. The special parameter $? provides access to a function's return value, so it is best to simply execute the function as a stand-alone action and then immediately check the result using $?.
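A minimal sketch of that recommended pattern, reusing isADir from above and assuming, as in the earlier run, that ${HOME} names a directory:
$ isADir ${HOME}
$ if [ $? -eq 0 ]; then
> echo yep
> fi
yep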
$ cat parent
#!/bin/sh
parentFunc () {
echo "This is the parent's function"
}
parentFunc
echo "Calling child..."
child
echo "Attempting to call childFunc from parent..."
childFunc
$ cat child
#!/bin/sh
childFunc () {
echo "This is the child's function"
}
childFunc
echo "Attempting to call parentFunc from child..."
parentFunc
$ parent
This is the parent's function
Calling child...
This is the child's function
Attempting to call parentFunc from child...
./child: parentFunc: not found
Attempting to call childFunc from parent...
./parent: childFunc: not found
The two scripts above demonstrate that functions are local to the current shell context. Following the program's flow, the main shell, parent, declares a local function and calls it. The function displays a short message indicating that parentFunc executed. The program continues by calling the child subscript. Child begins execution similarly: it also declares a local function, childFunc, which prints a message showing that the subshell's function ran. The subshell continues by trying to access parentFunc. Child prints the error message parentFunc: not found, proving that a subshell cannot use functions declared by its parent. The subshell ends, and execution continues within parent at the echo. At this point, the supershell attempts to call childFunc. Again, the function is undefined for the current shell, so it is also clear that supershells cannot use functions defined by their subshells.
One might expect similar behavior for changes to variables within functions. After all, in other programming languages, variables declared in functions are local to those functions, and, generally speaking, values passed to functions are copied into temporary storage for local manipulation. But this is not the case in the shell: changes made to variables persist beyond execution of the function. Moreover, variables declared in functions gain immediate global scope.
$ cat changes
#!/bin/sh
changeit () {
AVAR="Changed Value"
NEWVAR="New Value"
}
AVAR="Original Value"
echo "AVAR = ${AVAR}"
echo "NEWVAR = ${NEWVAR}"
changeit
echo "AVAR = ${AVAR}"
echo "NEWVAR = ${NEWVAR}"
$ changes
AVAR = Original Value
NEWVAR =
AVAR = Changed Value
NEWVAR = New Value
When this script first prints AVAR and NEWVAR, the variables have their initial values: AVAR equals Original Value and NEWVAR is null. The program then runs the changeit function, which resets the variables to different values. The function ends and returns control to the main program. The same print commands are reissued, and a quick inspection of the output shows that AVAR's value is now Changed Value and NEWVAR is suddenly defined.
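If such side effects are unwanted, one possible workaround, shown here only as a sketch and not used elsewhere in this discussion, is to give the function a subshell body by wrapping it in parentheses instead of braces; the assignments then occur in a child shell and never reach the caller. This form is accepted by POSIX-conformant shells, though some very old Bourne shells may insist on a brace body.
$ cat no-changes
#!/bin/sh
# Because changeit's body is ( ... ) rather than { ... }, it runs in a
# subshell, so its assignment does not affect the calling shell.
changeit () (
AVAR="Changed Value"
)
AVAR="Original Value"
changeit
echo "AVAR = ${AVAR}"
$ no-changes
AVAR = Original Value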
To carry the examination of function behavior even further, a third example considers nested functions. It is quite legal to declare functions within functions. Once again, a typical programmer might guess that a nested function is visible only within the scope of its enclosing function. This guess is wrong. In the following example, the script nesting first declares afunc. Within afunc, the program defines the function nested. The script then calls nested, afunc, and nested yet again. A classical language would hide nested within afunc, so it would be expected that the output from nested would never be seen. But the results show that as soon as afunc has run, nested becomes visible to the script.
$ cat nesting
#!/bin/sh
afunc () {
nested () {
echo "This is the nested function"
}
echo "This is afunc"
}
nested
afunc
nested
$ nesting
./nesting: nested: not found
This is afunc
This is the nested function
As the previous three examples demonstrate, shell function behavior is atypical. Yet, as has been shown, it is not unpredictable. An easy guideline to remember is that the usual scoping rules do not necessarily apply to shell functions. Aside from this simple rule, it takes practice and careful analysis of scripts to prevent errors caused by functions.
With all this function bashing, one might wonder why functions should be used at all. The intention is not to discourage the use of functions; it is just a statement of the facts. On the contrary, functions are a very useful tool. The primary reason for using them is quite classical: just as in every other programming language, functions allow scripters to organize actions into logical blocks. It is easier to think of a program as a series of steps to perform and then to expand those steps into functions that perform the necessary actions than it is to list every single command in order. The same commands can instead be grouped into functions, and then the program calls the functions. It is infinitely easier to program this way and to follow the script's flow.
In addition, functions can improve a script's performance. Rather than employ functions, a novice might consider grouping logical blocks into subscripts which the main script uses. This technique works just fine, but the program's execution time takes a hit. When a script calls a subscript, the shell must find the subscript on disk, open it, read it into memory, and then execute it. This process happens every time a subscript is called, even though the subscript may have been used previously. Functions, by contrast, are read into memory once, as soon as they are declared. They have the advantage of a one-time read for multiple executions.
$ cat calls-subscript
#!/bin/sh
doit once
doit twice
doit thrice
$ cat doit
#!/bin/sh
echo "${1}"
$ cat calls-function
#!/bin/sh
doitFunc () {
echo "${1}"
}
doitFunc once
doitFunc twice
doitFunc thrice
The example lists two scripts, calls-subscript and calls-function, that effectively do the same thing. Each prints the words once, twice, thrice, but the method used to do the printing is different: calls-subscript uses the script doit to print, whereas calls-function uses the doitFunc function. The UNIX time program can be applied to the scripts in order to see how fast each performs.
$ time calls-subscript
once
twice
thrice
real 0.2
user 0.0
sys 0.1
$ time calls-function
once
twice
thrice
real 0.0
user 0.0
sys 0.0
The script that uses a function is faster; its run time is not even measurable by the system. Calls-subscript, on the other hand, executes on the order of a couple of tenths of a second.
$ cat checkDir
checkDirFunc () {
if [ ! -d ${1} ]; then
echo "${1} is not a directory"; exit 1
fi
}
$ cat checkHome
#!/bin/sh
. checkDir
checkDirFunc ${1}
echo "Of course it's a directory!"
$ checkHome ${HOME}
Of course it's a directory!
$ checkHome notAdir
notAdir is not a directory
The example above uses two scripts: checkDir and checkHome. Notice that checkDir does not contain the #!/bin/sh directive; it is not meant to be run as a stand-alone program. (This does not prevent the file from being executed if called by a running shell; the script would simply be run within that shell. The only way to truly prevent it from being executed is to set the file permissions such that it is not executable.) On the other hand, checkHome is written to execute within its own shell. Moreover, checkHome sources checkDir, as can be seen from the line . checkDir. By sourcing the checkDir file, checkHome gains the checkDirFunc function; checkHome hands its first positional parameter to the function, which tests whether the argument is a directory. If it is not a directory, the if statement prints a message stating the fact and exits the program. If it is a directory, program execution continues past the function call to the final echo statement, letting the user know that the argument is a directory. The operation of the entire script is shown by the two runs above: first against the user's home directory and then against the bogus string notAdir.
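For instance, assuming checkDir sits in the current directory and, as in the runs above, the current directory is on the search path, clearing its execute bits keeps the file sourceable while preventing it from being run directly:
$ chmod 644 checkDir     # readable, so it can still be sourced, but no longer executable
$ . checkDir             # sourcing still works; invoking ./checkDir now fails with a permission error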
There is one final comparison that can be made here. The previous subsection discusses the advantage of using functions versus calling subscripts. Naturally, one might ask how sourcing functions compares to these. Consider three methods of counting to 1000. The first method declares a function called printCount:
$ cat calls-function
#!/bin/sh
COUNT=1
printCount () {
echo ${COUNT}
}
until [ ${COUNT} -gt 999 ];
do
printCount
COUNT=`expr ${COUNT} + 1`
done
echo "`basename ${0}`: Done counting to 1000"
In the script calls-function, the printCount function simply displays the current value of COUNT. An until loop increments and monitors the value, terminating once COUNT is greater than 999. At that point, the program finishes with a short message that it is done counting. Timing the script's execution yields the following results. (Note that the actual output has been trimmed to the results of time in the interest of saving space.)
$ time calls-function
calls-function: Done counting to 1000
4.86user 6.57system 0:11.68elapsed
Now consider the second method of counting to 1000. Here, the script sources-function imports the same function, printCount, from the file function; it sources the code used to print the value of COUNT.
$ cat sources-function
#!/bin/sh
COUNT=1
. function
until [ ${COUNT} -gt 999 ];
do
printCount
COUNT=`expr ${COUNT} + 1`
done
echo "`basename ${0}`: Done counting to 1000"
$ cat function
printCount () {
echo ${COUNT}
}
Once again, an until loop watches the count and terminates once COUNT exceeds 999. Comparing the execution time to calls-function shows a slight performance penalty for sourcing. This can be explained by the fact that the file function, when it is sourced, has to be opened and its contents committed to memory before printCount can be used. In calls-function, on the other hand, the shell defines printCount as it reads the script itself; it does not incur the extra file read that sources-function does.
$ time sources-function
sources-function: Done counting to 1000
4.96user 6.61system 0:11.80elapsed
Finally, compare the first two methods to calls-subscript. This last script exports COUNT and then uses the program subscript to display COUNT's value:
$ cat calls-subscript
#!/bin/sh
COUNT=1
export COUNT
until [ ${COUNT} -gt 999 ];
do
subscript
COUNT=`expr ${COUNT} + 1`
done
echo "`basename ${0}`: Done counting to 1000"
$ cat subscript
#!/bin/sh
echo ${COUNT}
Running calls-subscript through the time program shows:
$ time calls-subscript
calls-subscript: Done counting to 1000
10.95user 13.37system 0:24.83elapsed
In this last example, there is a significant increase in the program's execution time: it takes roughly twice as long to complete as the first two methods. This can be attributed to the fact that each time calls-subscript uses subscript, the shell must open the file, spawn a subshell, execute the echo statement, and then return control to the main program.
From this exercise it is plain to see that including function code directly in a script is the fastest approach. It is definitely a bad idea to divide scripts into many subscripts because of the performance penalty; try to use a function whenever it seems like a subscript would fit. Still, there is something to be said for sourcing commands from another file. Sourcing does not hinder program execution much, and it allows reusable code blocks to be organized into script libraries. The same code blocks could, of course, simply be copied directly into a script's body in the name of speed, but that is a matter of choice; after all, the slight gain in speed could come at the cost of readability.