Let me state two facts:
- My computer has 2 cores (there will be more in the future).
- My IDE feels more sluggish with every version.
Applications feel fast if the time between user action and application reaction is short. Ben Galbraith suggests that this threshold is around 200 ms for web applications; for desktop applications the boundary is lower. A rule of thumb places it around 100 ms, so there must be a yellow range in between.

My IDE crosses this threshold more often than it should, so it’s sluggish. What’s the reason and what can be done about it?
It seems like a fundamental law in software engineering that every new version of any program executes more code than the previous version.
Java 5 introduced the java.util.concurrent API, which made coarse-grained parallelism easier. The combination of thread pools, Futures and Barriers covered a wide range of tasks, from CPU-bound to I/O-bound.
To me as a UI programmer this had little consequence. For a while I could still state that Java+SWT is fast, consuming my part of that lunch that is not free any more but was cheap enough anyway in the doses I required.
This is changing. A few years ago the startup time of various operating systems hit a pain barrier, and that problem was solved. Today my pain barrier is reached in many of the functions that are executed in reaction to user clicks and key presses in my IDE. Undoubtedly it is only a matter of time until this applies to smaller desktop applications, too. They must react, and using multiple cores is the solution at hand.
Other languages and frameworks already have good support for multicore programming, and Java 7 will include the ForkJoin framework. In the wild, it is a library known as jsr166y that can also be used with Java 6.
In the following I’ll discuss a little example that shows how to use the ForkJoin framework.
A simple, easily parallelizable example is assigning random values to a large array. It’s also an example that cannot be solved with raw processor power alone, because it depends on memory access.
First, care must be taken where the random numbers come from. Even a sequential solution using ForkJoin’s ThreadLocalRandom was about twice as fast as one using Math.random(). It’s also more comfortable to use.
int[] result = new int[ARR_SIZE];
for( int i = 0; i < result.length; i++ ) {
    result[i] = ThreadLocalRandom.current().nextInt( result.length );
}
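For comparison, here is a minimal sketch of both sequential variants side by side. It is not the code I measured with: the class name, the method names and the array size are assumptions made for illustration, and the timing in main is far too crude to count as a benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;

class RandomFillComparison {
    // Hypothetical size for illustration; the ARR_SIZE used in the post is not stated.
    static final int ARR_SIZE = 1_000_000;

    // Variant based on Math.random(), which draws from one shared generator.
    static int[] fillWithMathRandom(int size) {
        int[] result = new int[size];
        for (int i = 0; i < result.length; i++) {
            result[i] = (int) (Math.random() * result.length);
        }
        return result;
    }

    // Variant based on ThreadLocalRandom, which keeps one generator per thread.
    static int[] fillWithThreadLocalRandom(int size) {
        int[] result = new int[size];
        for (int i = 0; i < result.length; i++) {
            result[i] = ThreadLocalRandom.current().nextInt(result.length);
        }
        return result;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        fillWithMathRandom(ARR_SIZE);
        long t1 = System.nanoTime();
        fillWithThreadLocalRandom(ARR_SIZE);
        long t2 = System.nanoTime();
        // Single-shot timings without warm-up; treat them as a rough hint only.
        System.out.println("Math.random():     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("ThreadLocalRandom: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods produce values in the range from 0 (inclusive) to the array length (exclusive), so they are interchangeable for this example.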
ForkJoin provides a class named ForkJoinPool as executor service for tasks. The constructor can be configured with the parallelism (number of threads) to be used. The default constructor takes the number of available processors as returned by Runtime.getRuntime().availableProcessors(). The pool is then given tasks for execution.
ForkJoinPool pool = new ForkJoinPool();
pool.invoke( new ArrayGenerator( result, 0, result.length ) );
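The two ways of constructing the pool can be sketched as follows. The class and method names are made up for this illustration; only ForkJoinPool, its two constructors, getParallelism() and shutdown() come from the framework.

```java
import java.util.concurrent.ForkJoinPool;

class PoolParallelismDemo {
    // Pool created with the default constructor; its parallelism follows the processor count.
    static int defaultParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // Pool created with an explicit parallelism level (number of worker threads).
    static int explicitParallelism(int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("default parallelism:  " + defaultParallelism());
        System.out.println("explicit parallelism: " + explicitParallelism(2));
    }
}
```

Setting the parallelism explicitly is mostly useful for experiments, for example to compare a run on one worker thread with a run on all cores.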
The ArrayGenerator is a RecursiveAction. The RecursiveAction differs from the RecursiveTask mainly in the return value of the compute() method: a RecursiveAction returns nothing, while a RecursiveTask returns a result. Its implementation is a straightforward recursive algorithm. First, there is an exit condition that solves the remaining problem sequentially. If that is not hit yet, the remaining problem is divided between two other RecursiveActions. Those are given to invokeAll, which blocks execution in this task until all subtasks are executed. In my experiments, the definition of the exit condition didn’t matter as long as all cores were busy at least once.
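import java.util.concurrent.RecursiveAction; // package jsr166y when using the library with Java 6

public class ArrayGenerator extends RecursiveAction {
    private final int[] nums;
    private final int start, end;

    public ArrayGenerator( int[] nums, int start, int end ) {
        this.nums = nums;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if( arrayIsSmallEnough() ) {
            // Exit condition: fill the remaining range sequentially.
            assignSequential();
        } else {
            // Split the range [start, end) in half and hand each half to a subtask.
            int middle = start + (end - start)/2;
            ArrayGenerator left = new ArrayGenerator(nums, start, middle);
            ArrayGenerator right = new ArrayGenerator(nums, middle, end);
            // Blocks this task until both subtasks have completed.
            invokeAll( left, right );
        }
    }
    ...
}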
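To make the difference to RecursiveTask concrete, here is a minimal sketch of a task that returns a value: it sums an int array with the same divide-and-conquer structure as the ArrayGenerator. The class ArraySum, the THRESHOLD constant and the sum() helper are assumptions for this illustration and not part of my experiment.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ArraySum extends RecursiveTask<Long> {
    // Hypothetical cut-off below which a range is summed sequentially.
    private static final int THRESHOLD = 10_000;

    private final int[] nums;
    private final int start, end;

    ArraySum(int[] nums, int start, int end) {
        this.nums = nums;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            // Exit condition: sum the remaining range sequentially.
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += nums[i];
            }
            return sum;
        }
        // Split the range in half, run both halves, then combine their results.
        int middle = start + (end - start) / 2;
        ArraySum left = new ArraySum(nums, start, middle);
        ArraySum right = new ArraySum(nums, middle, end);
        invokeAll(left, right);
        return left.join() + right.join();
    }

    // Convenience helper that runs the task on a fresh pool.
    static long sum(int[] nums) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ArraySum(nums, 0, nums.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the RecursiveAction, compute() here returns a Long, and pool.invoke() hands the combined result back to the caller.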
On my machine with 2 cores I achieved a speedup of 1.24 over the sequential solution. While that’s not as good as I hoped, it is still a significant improvement that has the chance to move my clicks back down into the green area. I suspect that more cores could achieve a better speedup, so there is free lunch again once I design my program for concurrent execution.
Programming with this framework seems easy enough to me to see it adopted widely. However, we must rethink our problems and search for chances to parallelize.
I implemented the same thing with ThreadPoolExecutors and Futures. While the execution time did not differ there, it took me 3 times as long to implement it.
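For readers who want to compare the two styles, this is roughly what such a solution can look like. It is a sketch and not my original implementation: the class name, the fill() method and the chunking scheme (one contiguous chunk per thread) are assumptions. Executors.newFixedThreadPool creates a ThreadPoolExecutor behind the scenes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

class ExecutorArrayFill {
    // Fills an array of the given size with random values, using one chunk per thread.
    static int[] fill(final int size, int threads) {
        final int[] result = new int[size];
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            // Round up so that the chunks cover the whole array.
            int chunk = (size + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final int start = Math.min(size, t * chunk);
                final int end = Math.min(size, start + chunk);
                futures.add(executor.submit(() -> {
                    for (int i = start; i < end; i++) {
                        result[i] = ThreadLocalRandom.current().nextInt(size);
                    }
                }));
            }
            // Wait for every chunk; get() also makes the workers' writes visible here.
            for (Future<?> future : futures) {
                future.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("array fill did not complete", e);
        } finally {
            executor.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] filled = fill(1_000_000, Runtime.getRuntime().availableProcessors());
        System.out.println("filled " + filled.length + " elements");
    }
}
```

The manual bookkeeping of chunk boundaries and Futures is what the ForkJoin version avoids: there, the recursion does the splitting and invokeAll does the waiting.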