Ralf is a software engineer with a history as Eclipse committer and project lead.
In recent years, he devoted himself to JavaScript technologies and helped pulling off …
If you still find yourself defining methods like this
public String[] getParameters();
in an interface, you should think again. Arrays are not just old-fashioned, there are good reasons to avoid exposing them. In this article, I’ll try to summarize the main drawbacks of using arrays in Java APIs.
Let me start with perhaps the most unexpected thing:
Update: It has been pointed out in the comments that iterating over an ArrayList is significantly slower than iterating over an array. I was surprised to find that the extra costs of the list iterator (mainly caused by checks for concurrent modification) can outweigh the savings I’ve explained here. This counters my point that arrays inhibit performance, so I’ve adjusted the title. Thanks go to Peter Drake for bringing in this aspect. However, the benchmark presented below is still valid. It’s true that interfaces with arrays are not necessarily faster. Depending on the consumers of an interface, lists may result in better overall performance.
You may think that working with arrays is the fastest option possible, because arrays are the low-level data structure used in most Collection implementations. How can using a plain array be slower than using an object that contains an array?
Let’s start with this common idiom that certainly looks familiar to you:
public String[] getNames() {
  return namesList.toArray( new String[ namesList.size() ] );
}
This method creates an array from a modifiable collection used to keep the data internally. It tries to optimize the array creation by providing an array of the correct size. Interestingly, this “optimization” makes it slower than the simpler version below (see the green vs. the orange bar in the chart):
public String[] getNames() {
  return namesList.toArray( new String[ 0 ] );
}
However, if the method returns a List, creating the defensive copy is even faster (the red bar):
public List<String> getNames() {
  return new ArrayList<String>( namesList );
}
The difference is that an ArrayList stores its items in an Object[] array, so it can use the untyped toArray method, which is a lot faster (the blue bar) than the typed one. This is typesafe since the untyped array is wrapped in the generic type ArrayList<T>, which is checked by the compiler.
This chart shows a benchmark with n = 5 on Java 7. However, the picture does not change much with more items or another VM. The CPU overhead might not seem drastic, but it adds up. Chances are that consumers of an array have to convert it into a collection in order to do anything with it, then convert the result back to an array to feed it into another interface method etc.
Using a simple ArrayList instead of an array improves performance without adding much footprint. ArrayList adds a constant overhead of 32 bytes to the wrapped array. For example, an array with ten objects requires 104 bytes, an ArrayList 136 bytes.
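To make the round trip mentioned above concrete, here is a minimal sketch of a consumer that removes duplicates from a list of names. The class and method names (ArrayRoundTrip, withoutDuplicates) are made up for this illustration and are not part of the benchmark. With an array-based interface, the data has to be wrapped into a collection and copied back into an array, while the list-based variant stays within the Collection API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical consumer code, written only to illustrate the conversions.
final class ArrayRoundTrip {

  // Array in, array out: wrap into a collection, then copy back into an array.
  static String[] withoutDuplicates( String[] names ) {
    Set<String> unique = new LinkedHashSet<String>( Arrays.asList( names ) );
    return unique.toArray( new String[ 0 ] );
  }

  // List in, list out: no array conversion at either end.
  static List<String> withoutDuplicates( List<String> names ) {
    Set<String> unique = new LinkedHashSet<String>( names );
    return new ArrayList<String>( unique );
  }

  public static void main( String[] args ) {
    String[] array = { "foo", "bar", "foo" };
    System.out.println( Arrays.toString( withoutDuplicates( array ) ) );
    // -> [foo, bar]
    System.out.println( withoutDuplicates( Arrays.asList( array ) ) );
    // -> [foo, bar]
  }
}
```

The LinkedHashSet is just one possible choice here. It keeps the first occurrence of each name in its original order.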
With Collections, you may even decide to return an unmodifiable version of the internal list:
public List<String> getNames() {
  return Collections.unmodifiableList( namesList );
}
This operation runs in constant time, so it’s much faster than any of the above (yellow bar). However, this is not the same as a defensive copy. An unmodifiable list is only a view, so it will change when your internal data changes. If this happens, clients can run into a ConcurrentModificationException while iterating over the items. It can also be considered bad design that an interface provides methods that throw an UnsupportedOperationException at runtime. However, at least for internal use, this method can be a high-performance alternative to a defensive copy, something that is not possible with arrays.
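The difference between a defensive copy and an unmodifiable view can be demonstrated with a small sketch. The class ViewVersusCopy and its methods are hypothetical names chosen for this example. Each method builds its own list that stands in for the internal namesList:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; each method uses a fresh stand-in for namesList.
final class ViewVersusCopy {

  static List<String> newNames() {
    return new ArrayList<String>( Arrays.asList( "foo", "bar" ) );
  }

  // A defensive copy is a snapshot: later internal changes are not visible.
  static int copySizeAfterInternalAdd() {
    List<String> namesList = newNames();
    List<String> copy = new ArrayList<String>( namesList );
    namesList.add( "baz" );
    return copy.size();
  }

  // An unmodifiable list is a view: internal changes show through.
  static int viewSizeAfterInternalAdd() {
    List<String> namesList = newNames();
    List<String> view = Collections.unmodifiableList( namesList );
    namesList.add( "baz" );
    return view.size();
  }

  // Clients cannot write through the view.
  static boolean viewRejectsWrites() {
    List<String> view = Collections.unmodifiableList( newNames() );
    try {
      view.add( "baz" );
      return false;
    } catch( UnsupportedOperationException e ) {
      return true;
    }
  }

  // Changing the internal list while a client iterates over the view fails.
  static boolean iterationFailsOnInternalChange() {
    List<String> namesList = newNames();
    List<String> view = Collections.unmodifiableList( namesList );
    try {
      for( String name : view ) {
        namesList.add( name + "2" ); // internal modification during iteration
      }
      return false;
    } catch( ConcurrentModificationException e ) {
      return true;
    }
  }

  public static void main( String[] args ) {
    System.out.println( copySizeAfterInternalAdd() );       // -> 2
    System.out.println( viewSizeAfterInternalAdd() );       // -> 3
    System.out.println( viewRejectsWrites() );              // -> true
    System.out.println( iterationFailsOnInternalChange() ); // -> true
  }
}
```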
Java is an object oriented language. The central idea of object orientation is that objects provide a set of methods to access and manipulate their data fields instead of manipulating the data fields directly. These methods make up an interface that explains what you can do with the object.
Because Java has been designed for performance, primitive types and arrays have been mixed into the type system. Objects use arrays internally to store data efficiently. However, even though arrays represent a modifiable collection of elements, they do not provide any methods to access and manipulate these elements. In fact, there’s not much you can do with an array except accessing and replacing its elements directly. Arrays don’t even implement toString and equals in a meaningful way, while collections do:
String[] array = { "foo", "bar" };
List<String> list = Arrays.asList( array );
System.out.println( list );
// -> [foo, bar]
System.out.println( array );
// -> [Ljava.lang.String;@6f548414
list.equals( Arrays.asList( "foo", "bar" ) )
// -> true
array.equals( new String[] { "foo", "bar" } )
// -> false
In contrast to arrays, the Collection API provides many useful methods to access the elements. Users can check for contained elements, extract sub lists or compute intersections. Collections can add certain features to the data layer, such as thread-safety, while keeping the implementation internal.
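As a brief illustration of the operations mentioned above, the following sketch uses only standard java.util methods (contains, subList, retainAll). The class CollectionOperations and its helper methods are hypothetical names for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class showing a few standard Collection operations.
final class CollectionOperations {

  // Check for a contained element.
  static boolean hasName( List<String> names, String name ) {
    return names.contains( name );
  }

  // Extract a sub list; copied so that the result is independent of the source.
  static List<String> firstTwo( List<String> names ) {
    return new ArrayList<String>( names.subList( 0, 2 ) );
  }

  // Compute the intersection without modifying either input list.
  static List<String> intersection( List<String> a, List<String> b ) {
    List<String> result = new ArrayList<String>( a );
    result.retainAll( b );
    return result;
  }

  public static void main( String[] args ) {
    List<String> names = Arrays.asList( "foo", "bar", "baz" );
    System.out.println( hasName( names, "bar" ) );
    // -> true
    System.out.println( firstTwo( names ) );
    // -> [foo, bar]
    System.out.println( intersection( names, Arrays.asList( "baz", "foo", "qux" ) ) );
    // -> [foo, baz]
  }
}
```

With a plain String[], each of these operations would require a hand-written loop or a conversion to a collection first.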
By using an array, you define where the data is stored in memory. By using a Collection, you define what users can do with the data.
If you rely on compiler-checked type safety, be careful with object arrays. The following code crashes at runtime, but the compiler cannot find the problem:
Number[] numbers = new Integer[10];
numbers[0] = Long.valueOf( 0 ); // throws ArrayStoreException
The reason is that arrays are “covariant”, i.e. if T is a subtype of S, then T[] is a subtype of S[]. Joshua Bloch covers all the theory in his great book Effective Java, a must-read for every Java developer.
Because of this behavior, interfaces that expose typed arrays may allow for implementations that return a subtype of the declared array type, leading to weird runtime exceptions.
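Here is a minimal sketch of how this can happen. The interface NumberSource and the class IntegerSource are hypothetical and exist only for this illustration. The implementation compiles because Integer[] is a subtype of Number[], yet a client that stores another Number in the returned array fails at runtime:

```java
// Hypothetical interface and implementation, for illustration only.
final class CovariantArrayPitfall {

  interface NumberSource {
    Number[] getNumbers();
  }

  // Compiles without warnings, because Integer[] is a subtype of Number[].
  static final class IntegerSource implements NumberSource {
    @Override
    public Number[] getNumbers() {
      return new Integer[] { 1, 2, 3 };
    }
  }

  // A client that only knows the declared type Number[].
  static boolean storeFails( NumberSource source ) {
    Number[] numbers = source.getNumbers();
    try {
      numbers[ 0 ] = Double.valueOf( 0.5 ); // legal according to the declared type
      return false;
    } catch( ArrayStoreException e ) {
      return true; // the runtime type of the array is Integer[]
    }
  }

  public static void main( String[] args ) {
    System.out.println( storeFails( new IntegerSource() ) );
    // -> true
  }
}
```

If the interface declared List<Number> instead, an implementation could not return a List<Integer>, since that assignment is rejected by the compiler.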
Bloch also explains that arrays are incompatible with generic types. Since arrays enforce their element type at runtime, while generics are checked only at compile time, Java does not allow creating arrays of generic types such as List<String>[].
Generally speaking, arrays and generics don’t mix well. If you find yourself mixing them and getting compile-time errors or warnings, your first impulse should be to replace the arrays with lists.
- Joshua Bloch, Effective Java (2nd ed.), Item 29
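As a small sketch of that advice, consider a structure that holds several lists of strings. The class GenericBuckets and the method createBuckets are hypothetical names for this example. Creating an array of List<String> is rejected by the compiler, whereas a list of lists works without warnings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: a list of lists instead of an array of lists.
final class GenericBuckets {

  // The array variant does not compile ("generic array creation"):
  // List<String>[] buckets = new List<String>[ 3 ];

  static List<List<String>> createBuckets( int count ) {
    List<List<String>> buckets = new ArrayList<List<String>>();
    for( int i = 0; i < count; i++ ) {
      buckets.add( new ArrayList<String>() );
    }
    return buckets;
  }

  public static void main( String[] args ) {
    List<List<String>> buckets = createBuckets( 3 );
    buckets.get( 1 ).add( "foo" );
    System.out.println( buckets );
    // -> [[], [foo], []]
  }
}
```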
Arrays are a low-level language construct. They should be used in implementations, but they should not be exposed to other classes. Using arrays in interface methods counters object orientation, leads to inconvenient APIs, and may weaken type safety and performance.