roguelazer's website: *nix tip of the day: waiting in scripts

Scripting is what makes Unix-like operating systems great. Every *nix, be it Linux, BSD, OS X, AIX, Solaris, or whatever other random distribution you can come up with, comes with a capable shell (or three) and a good set of basic utilities. Where a Windows administrator has to either fall back on the horror that is Batch files, write code in a big, heavy programming language, or submit to the terrible dominance of “management utilities”, a Unix system administrator has tons of tools at his disposal to fix and automate things. I could talk about scripting forever (it is a substantial portion of my job), but today I'm just going to talk about one small facet: waiting for things to happen.

Waiting for Completion

In the course of writing a script, it's not unusual to want to do things asynchronously. If you have to download three different files, then you can probably download them at the same time without saturating your Internet connection. In the course of using the shell, one of the first things you learn about is simple job control with the ampersand. By putting an ampersand at the end of your command line, you make the command run in the background, leaving you free to get other work done. For example:

roguelazer@sietchtabr % wget http://path/to/big/file.tar.bz2 &
[1] 9500
roguelazer@sietchtabr % echo "I'm doing additional work"
I'm doing additional work
roguelazer@sietchtabr %
[1] + done wget

But, of course, you can't just fire off jobs and forget about them when you're writing a script. You generally want to know when they've finished so you can, you know, do something with their output. In C, after forking, you use the wait(2) and waitpid(2) system calls to do this; in a Bourne-compatible shell, you use the wait(1) builtin. When called with no arguments, it waits for all backgrounded children; when called with an argument list of PIDs, it waits for the named PIDs. So you might write the following script:

#!/bin/bash

wget http://path/to/file1.tar.gz &
wget http://path/to/file2.tar.gz &
wait
if [ "$(md5sum file1.tar.gz | awk '{print $1}')" != "$(md5sum file2.tar.gz | awk '{print $1}')" ] ; then
    echo "ERROR: file1 differs from file2" >&2
fi

And, of course, this works with subshells as well as external programs:

#!/bin/bash

(
   thing1
   thing2
) &
(
   thing3
   thing4
) &
wait
echo "Things 1-4 all finished"
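
One thing the scripts above don't do is check whether the background jobs actually succeeded. A bare wait returns 0 in bash no matter how the children exited, but wait with a PID hands back that child's exit status. Here's a minimal sketch of that pattern. Note that fetch is a hypothetical stand-in for a real download (it just sleeps, and fails when asked for "bad"), so treat the names as assumptions rather than anything from a real tool:

```shell
#!/bin/bash

# Hypothetical stand-in for a real download: sleeps briefly,
# then fails if its argument is "bad".
fetch() {
    sleep 1
    [ "$1" != "bad" ]
}

# $! holds the PID of the most recently backgrounded job.
fetch good &
pid1=$!
fetch bad &
pid2=$!

failed=0
for pid in "$pid1" "$pid2" ; do
    # "wait PID" returns the exit status of that particular job.
    if ! wait "$pid" ; then
        echo "job $pid failed" >&2
        failed=$(( failed + 1 ))
    fi
done
echo "$failed job(s) failed"
```

Swap the fetch calls for your real wget lines and you can bail out before checksumming files that never finished downloading.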

Waiting for Signals

Signals are one of the fundamental interprocess communication methods in Unixen. Signals let one process send a simple notification to another. The system uses them to tell you everything from when your child processes quit to when you exceed ulimits to when timers go off. In C, you would use the signal(2) system call (or its more robust sibling, sigaction(2)) to set up signal handlers and the kill(2) system call to send signals. In the shell world, we use trap(1) to listen for them and kill(1) to send them. For example, if we were to name the following file test.sh:

#!/bin/bash

moo() {
    echo "MOO"
    exit 1
}

trap moo USR1

while true ; do
    sleep 1
done

We could then run it in one window while calling pkill -USR1 -f test.sh in another, and we would see "MOO" printed just before the script exits.
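
A detail worth knowing about test.sh: bash only runs a trap handler once the current foreground command returns, so the MOO can lag by up to a second while sleep 1 finishes. If you'd rather block until the signal shows up, one option is to park on the wait builtin, which returns as soon as a trapped signal arrives. A rough sketch follows. The subshell that sends USR1 is a made-up stand-in for whatever process would normally signal you, and the 30-second sleep is an arbitrary upper bound I picked for illustration:

```shell
#!/bin/bash

got_signal=0
on_usr1() {
    got_signal=1
}
trap on_usr1 USR1

# PID of this shell (the same as $$ in a plain script).
self=$BASHPID

# Hypothetical worker: does its thing, then signals us.
( sleep 1 ; kill -USR1 "$self" ) &

# Park on a background sleep. When a trapped signal arrives, wait
# returns right away with a status above 128 and the handler runs.
sleep 30 &
sleeper=$!
wait "$sleeper" || true

# The placeholder sleep is no longer needed.
kill "$sleeper" 2>/dev/null

echo "got_signal=$got_signal"
```

This needs bash 4 or later for BASHPID; in a standalone script you can use $$ instead.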

Waiting for the File System

(This tip is Linux-only; sorry BSD/OS X users)

It happens now and again that you want to wait for a file system event. This came up for me just the other day; I was writing a wrapper script for apache2ctl which would perform an action after Apache had completely started up. However, apache2ctl returns immediately. I thought for a moment, and then remembered that Apache writes out a PID file once it has started up. So I just needed to wait for that, and I'd be golden.

In the old days, we would have to use a busy loop for this. Something like the following:

#!/bin/bash
apache2ctl -f ~/.httpd/conf/httpd.conf -k start
slept=0
while [ ! -e ~/.httpd/run/httpd.pid ] ; do
    sleep 1
    slept=$(( $slept + 1 ))
    if [ $slept -ge 5 ] ; then
       exit 1
    fi
done
# do thing with apache here

This is ugly! Thankfully, we have a better way now. Since 2005, Linux has included a facility called inotify, which lets you ask the kernel to tell you when things happen in the file system. Perfect! But using the C API for it is rather a pain in the butt. Oh, if only we could have a handy shell utility to use. Oh wait, we can. The package is called inotify-tools, and the most important tool is inotifywait(1). Let's go back to our earlier example with httpd.pid. We can now use the following:

#!/bin/bash
apache2ctl -f ~/.httpd/conf/httpd.conf -k start
inotifywait -qq -e create -e modify ~/.httpd/run/ -t 5
if [ $? -eq 2 ] ; then
    exit 1
fi
# do thing with apache here

This version is much better than the busy-wait version. inotifywait has a bunch of options, and I encourage you to check it out if you're doing any kind of filesystem monitoring.
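
One caveat with the inotifywait version: if the PID file shows up before inotifywait has set up its watch, the event is missed and the call sits there until the timeout. A minimal sketch of one way around that is to check for the file first and use inotifywait only as the "wake me up early" mechanism. The helper name wait_for_file is my own invention (it is not part of inotify-tools), and the one-second inner timeout is an arbitrary choice:

```shell
#!/bin/bash

# wait_for_file PATH TIMEOUT_SECONDS
# Hypothetical helper: returns 0 once PATH exists, 1 if it has not
# appeared after roughly TIMEOUT_SECONDS.
wait_for_file() {
    local path=$1 timeout=$2
    local dir rc
    dir=$(dirname "$path")
    local deadline=$(( SECONDS + timeout ))

    # Re-check the file on every pass, so an event that fired before
    # the watch was set up costs at most one short inner timeout.
    while [ ! -e "$path" ] ; do
        if [ "$SECONDS" -ge "$deadline" ] ; then
            return 1
        fi
        rc=0
        # Exit status 0 means an event was seen, 2 means the -t
        # timeout expired.
        inotifywait -qq -t 1 -e create -e moved_to "$dir" || rc=$?
        # Anything else is an error (or inotifywait is missing), so
        # fall back to a plain sleep instead of spinning.
        if [ "$rc" -ne 0 ] && [ "$rc" -ne 2 ] ; then
            sleep 1
        fi
    done
    return 0
}
```

In the wrapper script above, that would turn the inotifywait line and its exit-status check into something like wait_for_file ~/.httpd/run/httpd.pid 5 || exit 1.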

Conclusion

So, this is a little introduction to some neat ways to add barrier points to the asynchronous bits of your shell scripts. There's a lot more (waiting for network things with netcat, using a system event system like upstart, or even a message bus like D-Bus), but this should get you started. I hope you've found this useful, and that you'll join me in my next installment of “*nix Tip of the Day”. Ciao!
