Greg's Blog

14 October 2009

A Naming Service for Overlay Networks

I earned a Master of Computer Science degree at the University of Virginia a few years back. My master's project was to write a secure, peer-to-peer name resolution protocol for the HyperCast overlay network system.

There was some flux in life around the time that I finished my degree. Our daughter was born and my academic adviser left the country within a week or two after I presented my work. All of that was followed by a job search, and, well, the report got lost in the shuffle.

So, some four years later, I'm now posting the project material to the web. The HyperCast code, including my project (just search for Greg in the code), is available on SourceForge. Enjoy.

13 October 2009

Changing "Foundational" Code with "Soft" Fixes

Software systems often have parts that are more "foundational" than others. Lots of other parts of the system depend on these foundational portions. A base class that has many descendant types is a good example of foundational code.

Now, in a properly architected, designed, and factored system, such foundational portions should not be a problem. Best practices will be observed; anti-patterns will be avoided. If you work in such systems: congratulations! Write books and blogs and go forth. Stop reading now. Have a nice day.

For the rest of us: eventually, some change will be suggested for a foundational portion of the system. How can this be done? The mere suggestion will cause some developers to say that too many things depend on the foundational portion, so it's best not to change anything.

Others will say that the foundational part should be improved precisely because it is so foundational. Best to build a house on a solid foundation. Surely the political debates will rage in the wake of the suggested change.

How can the debate be steered towards best-practices and higher-quality software? Is there some middle ground between not touching the foundational code and making a radical change that could cause the already shaky system to fail in unanticipated ways?

What if the code is improved, but not the "whole way"? Let's consider an example (I'm thinking about Java as an implementation language here, but that's just a detail): suppose that you have some base class B. B was written years ago by a programmer armed with a humanities degree and a "Learn Java in X days" book, where X is far less time than you spent working on your CS degree (and even you're still not totally sure what the volatile keyword does).

Class B was prolific and has on the order of 50 descendant classes. Changing the poorly written B could be disastrous.

Suppose an issue arises with a particular field F of B. Field F is not initialized at construction time and so begins life as null. Now, B has an accessor and a mutator for F, so other parts of the code can change F. To make things interesting, our code poet skipped day three of his book in which he was instructed to validate his data. This means that the mutator has no guard clauses to ensure that changes to F occur with valid values.

Then one fine day, suppose that the dreaded NullPointerException rears its ugly head because some calculation assumed that F is non-null, but alas, it is null. What happens next?

The race is on to "fix the bug," and eventually some change is made in some descendant class to handle the particular case in which F is null. Some believe that all is well again; others know that little has been gained, because F might still be null in other cases.

Then comes the suggestion, "why not simply initialize field F at B construction-time and then ensure the validity of field F by proper checks when it is changed?" Crazy Talk! Type B is too foundational! Too many things depend on it!! Who knows what will happen?!! Changing it could cause the lights to go out!!!

How do we firm up the foundation while allaying fears? Suppose the "change" is "soft." Suppose that a parameter is added to all B constructors that accepts a value for F so that F can be properly initialized. Now recompile and look for errors. All of the B subtypes should be broken: fix them by passing null, or a valid value if you can easily find one at the construction point. Also, augment the B constructor and the F mutator with guard clauses that check for null. But here's the soft part: the guard clauses don't do anything drastic like throw an IllegalArgumentException; instead, they generate a log message, raise a system operator alarm, or send an email to some portion of the development team.
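
To make the idea concrete, here is a minimal sketch of what a "soft" B might look like. The field type Foo, the accessor names, and the use of java.util.logging are stand-ins for whatever the real system uses:

import java.util.logging.Logger;

class Foo { } // stand-in for the real type of field F

public class B {

    private static final Logger LOG = Logger.getLogger(B.class.getName());

    private Foo f; // the troublesome field F

    public B(Foo f) { // new constructor parameter so that F is initialized up front
        softCheck(f, "constructor");
        this.f = f;
    }

    public Foo getF() {
        return f;
    }

    public void setF(Foo f) {
        softCheck(f, "setF"); // the existing mutator gains the same check
        this.f = f;
    }

    // "Soft" guard clause: report the problem, but leave behavior unchanged.
    private static void softCheck(Foo f, String where) {
        if (f == null) {
            LOG.warning("null F passed to B." + where);
            // alternatively: raise an operator alarm or notify the team
        }
    }
}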

The soft change shouldn't change the current behavior of the system. Those who were afraid to make the change win, because nothing really changed. (OK, technically there was a change: the time taken to write to the log or send the email could cause some crazy timing bug, but it's a tiny change that is not likely to result in catastrophe.)

These checks allow you to study the system as it runs and determine where null values are generated. Analyze them one at a time in the child classes and refactor so that null is no longer passed. Eventually, you should work through them, and then, when the team can see that nulls are not being passed anymore, you can confidently change the guard clauses in B to be "hard" and throw an exception like IllegalArgumentException.
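
When the logs show that nulls have stopped arriving, the soft check in the sketch above can be hardened; something like:

    // "Hard" guard clause: same check, but now it refuses the bad value.
    private static void checkF(Foo f, String where) {
        if (f == null) {
            throw new IllegalArgumentException("null F passed to B." + where);
        }
    }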

07 October 2009

A Minimal Java Project Reading List

This is my reading list for "enterprise" projects that use Java as the primary programming language. The list is short, but it has served me well.

This "bare-bones" reading list is designed primarily for programmers who (like me) work on large-scale, "legacy" Java code-bases. The list could serve as the foundation for a more expanded list with "more advanced topics."

  • "Effective Java," 2nd edition, Bloch - this is the one book to have if you have only one book.
  • "Java Concurrency in Practice," Goetz, et. al. this is a must have for concurrent enterprise apps.
  • "The Java Programming Language," 4th ed. Refer to it often.
  • (freely available) "The Java Language Spec," http://java.sun.com/docs/books/jls/ Refer to it just as often.

That's it for the core, "must-have" Java stuff, but you'll need to understand patterns as well:

  • "Design Patterns," Gamma, et. al.
  • "Patterns of Enterprise Application Architecture," Fowler

And let's not forget testing:

  • "xUnit Test Patterns," Meszaros

Then there's the "recommended" section for Java, a little dated, but still relevant:

  • "Hardcore Java," Simmons
  • "Better, Faster, Lighter Java," Tate & Gehtland

That's it. Happy reading!

16 October 2008

Static Analysis Resources

I compiled these links while putting together a talk for the Capital District Java Developers Network (CDJDN) near Albany, NY. My presentation slides are posted on the group's website.

01 August 2008

Compiling package-info.java over and over again...

Java packages may have a corresponding package-info.java source file. This file provides a central location for package-wide information. A Java compiler will compile the package-info.java source file just like any other Java source file... sort of.

The thing is, the package-info.java source file is a special sort of source file:

  • Its contents typically consist of Javadoc documentation comments that describe the package. In fact, the package-info.java feature replaces the legacy package.html file that served the same purpose.
  • Sometimes a package is annotated in its package-info.java file.
  • Usually, package-info.java is devoid of source code proper.

See The Java Language Specification, 3rd ed. sec 7.4.1.1 "Package Annotations" for more information on the intention of the package-info.java feature.
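
For reference, a Javadoc-only package-info.java (the case discussed next, which yields no class file) looks something like this; the package name is hypothetical:

/**
 * This package handles all of the foo.
 */
package com.example.foo;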

No corresponding package-info.class file is produced by the compiler when a package-info.java file contains only Javadoc comments. Why is this an issue? It's an issue because most build tools interpret the presence of an up-to-date .class file as a reason not to compile a corresponding .java file. Without a package-info.class, a build system might compile package-info.java every time the build is invoked regardless of whether package-info.java has changed. Such unnecessary compiler invocations have the effect of increasing build execution time. This is not such a big deal for small projects, but the impact on build time can become annoying as the size of a codebase grows.

In my environment, a corresponding package-info.class file is created only if at least one package annotation exists. Intuitively, it would seem that such annotations would need a retention policy "stronger" than java.lang.annotation.RetentionPolicy.SOURCE. However, I observe that a package-info.class file is generated even when no annotation with a stronger retention policy (RetentionPolicy.CLASS, the default, or RetentionPolicy.RUNTIME) is present.

The presence of source code proper does not cause a package-info.class file to be produced. Rather, class files with names appropriate to the types defined in the source code are generated. For example, if I write a package-private Foo class in package-info.java, then Foo.class is generated.

So then, how can a build be protected from unnecessarily invoking the compiler due to the lack of a package-info.class file? One solution to this issue is to annotate each package with an annotation that serves no purpose other than to force the generation of a corresponding package-info.class file.

The annotation I wrote for this solution is called ForcePackageByteCode. I gave it a retention policy of RetentionPolicy.CLASS, which states that the annotation must be recorded in the class file. Just annotate the package statement in a package-info.java file with ForcePackageByteCode:

/**
 * This package handles all of the foo.
 */
@ForcePackageByteCode
package com.example.foo;
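
The annotation itself is tiny. A sketch along the following lines captures the important part (the retention policy); the package name here is hypothetical, and the real source is in the Virtual Team Tools project:

package com.example.build;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Marker annotation whose only purpose is to force the compiler to emit a
 * package-info.class file for the annotated package.
 */
@Retention(RetentionPolicy.CLASS) // must be recorded in the .class file
@Target(ElementType.PACKAGE)      // applies to package declarations only
public @interface ForcePackageByteCode {
}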

Admittedly, this solution is not ideal because it creates extraneous output for the sole purpose of optimizing future build invocations. On the plus side, the size of the extraneous files is relatively small. This minor cost in size can be well worth the build time savings over the course of a project.

It would be nice if release candidate builds could "turn off" the extraneous file generation by simply changing the ForcePackageByteCode retention policy to RetentionPolicy.SOURCE. Instead, a global search-and-replace operation that changes all occurrences of the annotation to a comment in every package-info.java file will do the trick:

@ForcePackageByteCode  -->  //@ForcePackageByteCode

There are other solutions to this problem. The Apache Ant build tool is addressing this issue with built-in tool support. The Ant team has devised some heuristics for deciding when to ignore package-info.java files during compilation. Such work is helpful, but relying on support from a particular build tool has the drawback of being a partial solution if source is built in several build environments.

Alternatively, the need for a solution like the ForcePackageByteCode annotation is reduced if a project has another reason to annotate its packages. For example, the static byte code analysis tool FindBugs supports the edu.umd.cs.findbugs.annotations.DefaultAnnotation annotation, which declares that the specified annotations should be applied to all classes, fields, and methods of a package. Annotating a package with @DefaultAnnotation(NonNull.class), for instance, tells FindBugs that the NonNull annotation applies to all classes, fields, and methods of that package. Applying DefaultAnnotation in a package-info.java file causes a package-info.class to be produced, obviating the need for ForcePackageByteCode.
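
A package-info.java that uses the FindBugs default annotation might look like this (hypothetical package name; the FindBugs annotations jar must be on the classpath):

@DefaultAnnotation(NonNull.class)
package com.example.foo;

import edu.umd.cs.findbugs.annotations.DefaultAnnotation;
import edu.umd.cs.findbugs.annotations.NonNull;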

ForcePackageByteCode source code is available as part of the Virtual Team Tools project.

31 July 2008

Java ArrayWrapperList

I have some Java networking code that uses the Java NIO (java.nio) package and the Java Concurrency utilities (java.util.concurrent) package. My code is structured to enqueue pending output on a write buffer queue. The write buffer queue is a sequence of ByteBuffers implemented as a java.util.concurrent.LinkedBlockingDeque<ByteBuffer>.

Now, I want to perform a gathering write operation of pending ByteBuffers in the write buffer queue. The BlockingDeque has a drainTo(Collection<? super E>) method that removes elements from the Deque and places them in a specified Collection of the appropriate type. Here's the snag: the java.nio.channels.SocketChannel write() method requires a ByteBuffer[], not a Collection<ByteBuffer>.

Converting from a Collection<ByteBuffer> to a ByteBuffer[] seems as simple as calling the java.util.Arrays.asList(T...) method. Just wrap the target array as a List, drain into it, then call write() with the filled array:

ByteBuffer[] writeBufferArray = new ByteBuffer[BUFFER_COUNT]; // write() needs this
List<ByteBuffer> writeBufferList = Arrays.asList(writeBufferArray); // drainTo() needs this
int writeBufferCount = writeBufferDeque.drainTo(writeBufferList); // fills writeBufferArray?
socketChannel.write(writeBufferArray, 0, writeBufferCount);

Unfortunately, this does not work because an UnsupportedOperationException is raised on the call to drainTo(). This happens because the List produced by Arrays.asList(T...) does not permit append operations. It enforces this restriction by not overriding the java.util.AbstractList.add(int, E) method, whose default implementation throws UnsupportedOperationException.

It would be easy enough to copy the ByteBuffers to a "temporary" Collection<ByteBuffer>, then call the toArray() method, but that results in a "double copy." Each ByteBuffer reference would be copied from the BlockingDeque to the temporary Collection<ByteBuffer>, then copied from the temporary Collection<ByteBuffer> to the array produced by toArray().
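
For contrast, the double-copy version would look something like this, reusing the writeBufferDeque and socketChannel from the snippet above:

List<ByteBuffer> temp = new ArrayList<ByteBuffer>();
int writeBufferCount = writeBufferDeque.drainTo(temp);           // copy #1: deque to temporary list
ByteBuffer[] writeBufferArray = temp.toArray(new ByteBuffer[0]); // copy #2: temporary list to array
socketChannel.write(writeBufferArray, 0, writeBufferCount);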

Avoiding the double copy means that some way must be found to write directly to a ByteBuffer[] from drainTo(). Since drainTo() demands a Collection, some alternative array wrapper must be used.

One potential candidate is the venerable java.util.ArrayList. An ArrayList properly encapsulates its backing array: it copies in any elements given to it and produces a defensive copy when toArray() is invoked. ArrayList does its job well, but it's not suited to this task because it copies to and from its own internal array, not a client-provided array.

My solution for a List that writes-through to an externally accessible array is called ArrayWrapperList. It works like this:

ByteBuffer[] writeBufferArray = new ByteBuffer[BUFFER_COUNT];
List<ByteBuffer> writeBufferList = new ArrayWrapperList<ByteBuffer>(writeBufferArray); // wraps the client array
int writeBufferCount = writeBufferDeque.drainTo(writeBufferList); // writes-through directly to writeBufferArray
socketChannel.write(writeBufferArray, 0, writeBufferCount);

Conceptually, the code behaves as though writeBufferList.toArray(writeBufferArray) is invoked immediately after the call to drainTo(). It should be noted that in my production version I invoke drainTo(Collection<? super E>, int) not drainTo(Collection<? super E>). The upper bound on the drained elements ensures that the size of writeBufferArray is not exceeded.
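
That bounded call looks like this, reusing the variables from the snippet above:

int writeBufferCount = writeBufferDeque.drainTo(writeBufferList, writeBufferArray.length);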

ArrayWrapperList does not encapsulate the array it manages the way java.util.ArrayList does. Rather, ArrayWrapperList trades off encapsulation for the ability to write directly to a client-accessible array.
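
For illustration, here is a minimal sketch of how such a write-through wrapper could be built; the real ArrayWrapperList in the Virtual Team Tools project may differ in its details:

import java.util.AbstractList;

/**
 * A List view over a caller-supplied array. Elements added to the list are
 * written directly into the wrapped array, so the caller sees them without a
 * second copy. This sketch supports only the operations that drainTo() needs
 * (add and size), plus get and set.
 */
public class ArrayWrapperList<E> extends AbstractList<E> {

    private final E[] array; // the client-accessible array being wrapped
    private int size;        // number of elements written so far

    public ArrayWrapperList(E[] array) {
        this.array = array;
    }

    @Override
    public boolean add(E element) {
        array[size++] = element; // write-through to the wrapped array
        return true;
    }

    @Override
    public E get(int index) {
        return array[index];
    }

    @Override
    public E set(int index, E element) {
        E old = array[index];
        array[index] = element;
        return old;
    }

    @Override
    public int size() {
        return size;
    }
}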

ArrayWrapperList source code is available as part of the Virtual Team Tools project.