I have been pondering exceptions quite a lot lately. Especially how to break the bad news to the user. From a user’s point of view there are three categories of errors:
1. User or domain errors
These errors occur when the user enters invalid data, or tries to perform an action that the domain model does not allow. Typical examples are trying to enter a duplicate key or trying to delete an item that is still being referenced. Note that these kinds of errors are usually not even worth logging. The user can usually recover from them himself, either by choosing an alternative action, or by manipulating the model in such a way that the action becomes valid. While the exception usually describes what went wrong, the error message displayed to the user should also explain what can be done to correct the situation so that the desired action can succeed. Ideally the error dialog should also allow the user to directly apply those fixes in order to achieve his original goal. A prime example of such an implementation is Eclipse quick-fixes: while telling you what you did wrong, they also offer to repair your mistakes. In fact they are so convenient and helpful that I have come to rely on them when writing code: write pseudo-code, then let quick-fix sort out the details.
2. Infrastructure errors
These kinds of errors are usually intermittent failures of one of the subsystems the application is running on. Usually these are network outages, a full disk or a server on fire. Here a detailed error message is helpful to enable the user to track down and rectify the problem. Depending on the type of application and business environment, it might also be a good idea to notify the responsible systems administrator that a problem has developed, so he can get a little head start before the hordes of users demand to know why they can’t access their TPS reports. In rare cases there might be backups or fail-overs in place which can be used in case of such infrastructure problems.
3. Programmer errors
While we all strive to avoid them, there are going to be programmer errors (internal errors, aka bugs). These are basically all errors not in one of the above categories. The most common manifestation of these types of errors is the dreaded NullPointerException. But we’ve all seen the code comments declaring: “This should never happen.” Sometimes the assertions and assumptions we make when writing software turn out to be wrong. The best we can do in these situations is to fix the problems as soon as possible. Ideally the application in question gathers as much information surrounding the error as possible, then allows the user to report this information, together with his notes about what he was doing at the time of the failure. The message to the user should be along the lines of, “We are sorry for causing you trouble. Let’s make sure this particular problem does not show up in the future.” The idea is to show users that their feedback is not a pointless exercise, but an effective way to make their daily lives a little better by improving their tools.
The problem with many exception hierarchies in libraries and applications alike is that they distinguish exception classes at component, interface or service boundaries. If the StockTickerService throws a CannotReadStockSymbolDataException, the caller often has no way to distinguish between a user error (stock symbol does not exist), an infrastructure error (server unreachable) or an internal error (could not handle values above 255). This makes handling these exceptions at the user interface level quite challenging. Sure, you can take the easy way out, display the message to the user, maybe log it and be done with it, but the application itself has no way to distinguish between the different levels of severity. A typo in the input field is nothing to write home about, while not being able to handle values above 255 is a systematic failure that might render the system unusable.
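One way to keep that distinction across a service boundary is to let the exception carry its category explicitly. The following is only a minimal sketch under my own assumptions: the ErrorCategory enum, the constructor shape and the ErrorDialogs helper are hypothetical names made up for illustration, not part of any real StockTickerService.

```java
// Hypothetical sketch: ErrorCategory, the constructor shape and ErrorDialogs
// are illustrative assumptions, not an existing API.
enum ErrorCategory { USER, INFRASTRUCTURE, INTERNAL }

class CannotReadStockSymbolDataException extends Exception {
    private final ErrorCategory category;

    CannotReadStockSymbolDataException(ErrorCategory category, String message, Throwable cause) {
        super(message, cause);
        this.category = category;
    }

    ErrorCategory getCategory() {
        return category;
    }
}

final class ErrorDialogs {
    private ErrorDialogs() {}

    // Picks a reaction based on the category, not on the component that failed.
    static String reactionFor(CannotReadStockSymbolDataException e) {
        switch (e.getCategory()) {
            case USER:
                return "Show the message and let the user correct the input.";
            case INFRASTRUCTURE:
                return "Show details and notify the administrator.";
            default:
                return "Apologize and offer to send an error report.";
        }
    }
}
```

The exception class still belongs to the service, but the user interface can now decide how seriously to take it without parsing the message text.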
My point here is this: Being able to distinguish the type of exception is often helpful when trying to recover from an error. But in my experience errors are seldom recoverable. And in these cases it is more important to know who to inform about the failure. This makes the triage of errors much simpler: administrators and developers get notified about problems in their areas of responsibility, but are not flooded with reports about “trivial” user typos that drown out the real issues.
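To make the "who to inform" idea a bit more concrete, here is a rough sketch of how such a triage could look. All names (Audience, ReportableException, ErrorTriage) are hypothetical and only serve the illustration. It assumes that anything not explicitly tagged is a programmer error, in line with the third category above.

```java
// Hypothetical sketch: Audience, ReportableException and ErrorTriage are
// made-up names for illustration only.
enum Audience { USER, ADMINISTRATOR, DEVELOPER }

class ReportableException extends RuntimeException {
    private final Audience audience;

    ReportableException(Audience audience, String message) {
        super(message);
        this.audience = audience;
    }

    Audience getAudience() {
        return audience;
    }
}

final class ErrorTriage {
    private ErrorTriage() {}

    // Untagged errors (e.g. a NullPointerException) are assumed to be bugs
    // and therefore go to the developers.
    static Audience audienceFor(Throwable error) {
        if (error instanceof ReportableException) {
            return ((ReportableException) error).getAudience();
        }
        return Audience.DEVELOPER;
    }

    // User errors are shown to the user only; they are not worth a notification.
    static boolean shouldNotifySomeoneElse(Throwable error) {
        return audienceFor(error) != Audience.USER;
    }
}
```

With something like this in place, a typo in an input field never reaches an inbox, while an unreachable server or an unexpected NullPointerException ends up with the person who can actually do something about it.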