Friday, March 7, 2008

Ruby exceptions vs. logging

A while ago, I ran into a little problem with some software. We had some Ruby scripts running from cron. Going by the logs, they would start up and just go to the bit bucket never to be heard from again. After checking the code to try to figure out what was going wrong, I decided to just run them by hand to see what would happen.

BLAM!!! A nice big unhandled exception, which of course in Ruby makes whatever script you are running exploded in a rather show-stopping manner. Why wasn't this logged?

It turns out that this is all in how an exception actually works. If you don't handle an exception, the program keeps going up the call stack (function A calls function B, etc) until it finds something that can handle the exception. If nothing handles the exception, the program just stops right then & there and your exception becomes a fatal error.

This problem is possible in multiple scripts, and we didn't have time to rewrite them all the catch all possible exceptions, which sounds too time consuming anyway.

Here's the solution I ended up with.

It turns out in Ruby when a program exits normally or because of an unhandled exception, it sends the EXIT signal. So all you have to do is catch the EXIT signal and log any exceptions that come across.

This is one time that Ruby shows its Perl influence. The $! variable is whatever exception is present. So, trap the signal and check $! for the exception (aka the error).

There is one final twist on this, and it turns out a normally running Ruby program relies on exception handling, even if you don't know you're using exceptions.

If you do a call to the exit() function in your code, that actually throws a SystemExit exception, which causes Ruby to immediately stop your program.

So this is the code I ended up putting in our logger object, which is just an extension of the standard Ruby logger:

#Catch & log unhandled exceptions
Signal.trap("EXIT") {
unless $! == "SystemExit" || $!.nil?
self.fatal("Unhandled exception: #{$!}: #{$!.to_s}")

The log entries would end up looking something like:

Unhandled exception: FooException: Something went terribly wrong in the get_important_data method and the program cannot continue.

In retrospect, there are probably two better ways this could be done.

1. Make each program have an "execute" method of some sort that kicks off all the logic for the rest of the program. You could then wrap exception handling code around the call to "execute" and this would grab all your exceptions that aren't handled elsewhere in your code.

2. Assuming these are all scripts of some sort, likely to be run from cron, you could have a script that runs scripts. It could log anything that goes to standard output as "info" messages, standard error as "warning" messages, and any script that exits with a non-zero status (meaning an error) could be logged as a fatal error.