14 October 2009

A Naming Service for Overlay Networks

I earned a Master of Computer Science degree at the University of Virginia a few years back. My master's project was to write a secure, peer-to-peer name resolution protocol for the HyperCast overlay network system.

There was some flux in life around the time that I finished my degree. Our daughter was born and my academic adviser left the country within a week or two after I presented my work. All of that was followed by a job search, and, well, the report got lost in the shuffle.

So, some four years later, I'm now posting the project material to the web. The HyperCast code, including my project (just search for Greg in the code) is available on SourceForge. Enjoy.

13 October 2009

Changing "Foundational" Code with "Soft" Fixes

Software systems often have parts that are more "foundational" than others. Lots of other parts of the system depend on these foundational portions. A base class that has many descendant types is a good example of foundational code.

Now, in a properly architected, designed, and factored system, such foundational portions should not be a problem. Best practices will be observed; anti-patterns will be avoided. If you work in such systems: congratulations! Write books and blogs and go forth. Stop reading now. Have a nice day.

For the rest of us: eventually, some change will be suggested for a foundational portion of the system. How can this be done? The mere suggestion will cause other developers to say that too many things are touched by the foundational portions, best not to change things.

Others will say that the fundamental part should be improved because the portion in question is so foundational. Best to build a house on a solid foundation. Surely the political debates will rage in the wake of the suggested change.

How can the debate be steered towards best-practices and higher-quality software? Is there some middle ground between not touching the foundational code and making a radical change that could cause the already shaky system to fail in unanticipated ways?

What if the code is improved, but not the "whole way?" Let's consider an example (I'm thinking about Java as an implementation language here, but that's just a detail): suppose that you have some base class B. B was written years ago by a programmer armed with a humanities degree and a "Learn Java in X days" book where X is far less time than you spent working on your CS degree (and even you're still not totally sure what the volatile keyword does).

Class B was prolific and has on the order of 50 descendant classes. Changing the poorly written B could be disastrous.

Suppose an issue arises with a particular field F of B. Field F is not initialized at construction time and so begins life as null. Now, B has an accessor and a mutator for F, so other parts of the code can change F. To make things interesting, our code poet skipped day three of his book in which he was instructed to validate his data. This means that the mutator has no guard clauses to ensure that changes to F occur with valid values.

Then one fine day, suppose that the dreaded NullPointerException rears its ugly head because some calculation assumed that F is non-null, but alas, it is null. What happens next?

The race is on to "fix the bug," and eventually some change is made in some descendant class for the particular case in which F is null - some believe that all is well again. Others know that little has been gained because F might be null in other cases.

Then comes the suggestion, "why not simply initialize field F at B construction-time and then ensure the validity of field F by proper checks when it is changed?" Crazy Talk! Type B is too foundational! Too many things depend on it!! Who knows what will happen?!! Changing it could cause the lights to go out!!!

How do we firm up the foundation while allaying fears? Suppose the "change" is "soft." Suppose that a parameter is added to all B constructors that accepts a value for F so that F can be properly initialized. Now recompile and look for errors. All of the B subtypes should be broken: fix them by passing null, or a valid value if you can easily find one at the construction point. Also, augment the B constructor and the F mutator with guard clauses that check for null. But here's the soft part: the guard clause don't do anything like throw an IllegalArgumentException, instead they generate a log message, or raise some system operator alarm, or send an email to some portion of the development team.

The soft change shouldn't change the current behavior of the system. Those who were afraid to make the change win - because nothing really changed (ok, the time taken to write to the log, or send the email could cause some crazy timing bug. Technically, there was a change, but only a tiny one that is not likely to result in catastrophe).

These checks allow you to study the system as it runs and determine where null values are generated. Analyze them one at a time in the child classes and refactor so that null is no longer passed. Eventually, you should work through them, and then, when the team can see that nulls are not being passed anymore, you can confidently change the guard clauses in B to be "hard" and throw an exception like IllegalArgumentException.