Linux Tip of the Day: SystemTap

Posted Wed 16 February 2011 09:08 under category tips

The other day, one of my co-workers, Evan, presented an interesting problem to me. Every day, at some point, a file named ] gets created in his home directory. He assumes that it is being created by a script with a typo in it... somewhere. But how to find out? It's a hard thing to grep for.

My initial solution was to use inotify (which you might remember from a previous post) combined with libnotify to alert him if it happened while he was at his computer. It looked like the following:

$ inotifywait -t 0 --exclude='.*[^]]$' $HOME && notify-send "something just created ]"

This is an okay solution (especially in that it took almost no time to write), but it doesn't actually give any useful information unless you're sitting at the computer and can do some manual debugging (to try and find what processes are in cron for that time, maybe do an lsof). I could add those things to the script, but there's still enough of a race condition inherent in shell programming that it's unlikely they'd be successful. How to solve this problem?
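
For the curious, bolting that extra context onto the inotify approach might look roughly like the sketch below. It is only an illustration: WATCH_DIR, LOG_FILE, log_event, and watch_bracket are names I made up for the example, and a plain ps listing stands in for the cron and lsof digging.

```shell
#!/bin/sh
# Sketch: record some context the moment a file named "]" shows up.
# WATCH_DIR and LOG_FILE are hypothetical names chosen for this example.
WATCH_DIR="${WATCH_DIR:-$HOME}"
LOG_FILE="${LOG_FILE:-$HOME/bracket-events.log}"

# Append a timestamp, the event text, and a process snapshot to the log.
log_event() {
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
        # By the time this runs the creator may already have exited.
        ps -eo pid,ppid,user,args 2>/dev/null || echo "(ps unavailable)"
    } >> "$LOG_FILE"
}

# Watch WATCH_DIR in monitor mode and log every creation of "]".
watch_bracket() {
    inotifywait -m -q -e create --format '%f' "$WATCH_DIR" |
    while IFS= read -r name; do
        if [ "$name" = "]" ]; then
            log_event "created $WATCH_DIR/$name"
        fi
    done
}
```

Calling watch_bracket leaves it running until interrupted. The snapshot is taken only after the event has been delivered, so a short-lived script may already be gone from the listing, which is exactly the race described above.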

Enter SystemTap. SystemTap is like Solaris's DTrace: a tool for monitoring and acting on events at the kernel level, without the added bulk of a debugger. Except SystemTap is cooler. It uses the really neat kprobes functionality to tap into the kernel (I encourage any of you familiar with systems-level programming to read up on kprobes, because they're a lovely hack), it has a clean and typesafe compiled language, and it has a decent standard library.

Installation of SystemTap varies based on distribution, but it's described in pretty good detail on the SystemTap wiki. Generally, you install debugging symbols for your kernel and the userspace systemtap compiler and runtime. Once you have it installed, you can probably add yourself to a group in order to be able to run stap scripts, or just do it as root. Either way.

So, what did my solution look like in SystemTap? Behold:

probe syscall.open {
    # Report any open() of the stray "]" file, by relative or absolute path.
    if (filename == "]" || filename == "/home/evan/]")
        printf("%s by %s (pid %d), parent %s (ppid %d)\n",
               filename, execname(), pid(), pexecname(), ppid())
}

Run that under stap and you get a nice summary of what is creating the file. Straightforward, and without crippling overhead (just two strcmps per open call). Cool beans!
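
In case the "run that under stap" step is unclear, here is a minimal sketch of one way to do it. The file name bracket.stp and the run_probe helper are hypothetical, and it assumes stap is on your PATH and that you are root or in whatever group your distribution's SystemTap setup requires.

```shell
#!/bin/sh
# Sketch: save the probe to a file and run it with stap.
# STP_FILE and run_probe are hypothetical names chosen for this example.
STP_FILE="${STP_FILE:-bracket.stp}"

# Write the probe out; the quoted 'EOF' keeps the \n and quotes intact.
cat > "$STP_FILE" <<'EOF'
probe syscall.open {
    if ((filename == "]") || (filename == "/home/evan/]"))
        printf("%s by %s (pid %d), parent %s (ppid %d)\n",
               filename, execname(), pid(), pexecname(), ppid())
}
EOF

# Compile and run the probe; -v prints progress for each compilation pass.
# The script keeps running until you interrupt it with Ctrl-C.
run_probe() {
    stap -v "$STP_FILE"
}
```

Call run_probe and leave it running until the file appears. Since the culprit only fires once a day, you will probably want it in a terminal you can leave open.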

Of course, this is just the tip of what you can do with SystemTap. For example, Debian Developer/Mozilla Contributor Mike Hommey wrote up a good summary on tracking disk I/O with SystemTap. And SystemTap provides a good base of example scripts for your tracing pleasure.

Feel free to let me know if y'all find any other cool ways to use this technology.

