3 Good Reasons to Avoid Arrays in Java Interfaces

If you still find yourself defining methods like this

public String[] getParameters();

in an interface, you should think again. Arrays are not just old-fashioned; there are also good reasons to avoid exposing them. In this article, I’ll try to summarize the main drawbacks of using arrays in Java APIs.


Let me start with perhaps the most unexpected thing:

Arrays can lead to poor performance


Update: It has been pointed out in the comments that iterating over an ArrayList is significantly slower than iterating over an array. I was surprised to find that the extra costs of the list iterator (mainly caused by checks for concurrent modification) can outweigh the savings I’ve explained here. This counters my point that arrays inhibit performance, so I’ve adjusted the title. Thanks go to Peter Drake for bringing in this aspect.
However, the benchmark presented below is still valid. It’s true that interfaces with arrays are not necessarily faster. Depending on the consumers of an interface, lists may result in better overall performance.

You may think that working with arrays is the fastest possible because arrays are the low-level data structure used in most Collection implementations. How can using a plain array be slower than using an object that contains an array?

Let’s start with this common idiom that certainly looks familiar to you:

public String[] getNames() {
  return namesList.toArray( new String[ namesList.size() ] );
}

This method creates an array from a modifiable collection used to keep the data internally. It tries to optimize the array creation by providing an array of the correct size. Interestingly, this “optimization” makes it slower than the simpler version below (see the green vs. the orange bar in the chart):

public String[] getNames() {
  return namesList.toArray( new String[ 0 ] );
}

However, if the method returns a List, creating the defensive copy is even faster (the red bar):

public List<String> getNames() {
  return new ArrayList<>( namesList );
}

The difference is that an ArrayList stores its items in an Object[] array and uses the untyped toArray method, which is a lot faster (the blue bar) than the typed one. This is type-safe because the untyped array is wrapped in the generic type ArrayList<T>, which is checked by the compiler.

[Chart: toArray benchmark results]

This chart shows a benchmark with n = 5 on Java 7. However, the picture does not change much with more items or another VM. The CPU overhead might not seem drastic, but it adds up. Chances are that consumers of an array have to convert it into a collection in order to do anything with it, then convert the result back to an array to feed it into another interface method etc.
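To illustrate the round trip described above, here is a minimal sketch. The class RoundTrip and its withoutEmpty methods are hypothetical names made up for this example; they are not part of the benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only
class RoundTrip {

  // Array-based API: copy into a list, modify it, copy back into an array
  static String[] withoutEmpty( String[] names ) {
    List<String> list = new ArrayList<>( Arrays.asList( names ) );
    list.removeAll( Collections.singleton( "" ) );
    return list.toArray( new String[ 0 ] );
  }

  // List-based API: a single copy, no conversion back
  static List<String> withoutEmpty( List<String> names ) {
    List<String> result = new ArrayList<>( names );
    result.removeAll( Collections.singleton( "" ) );
    return result;
  }
}
```

The array variant copies the data twice for a single operation, and every further array-based method in the chain repeats this.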

Using a simple ArrayList instead of an array improves performance, without adding much footprint. ArrayList adds a constant overhead of 32 bytes to the wrapped array. For example, an array with ten objects requires 104 bytes, an ArrayList 136 bytes.

With Collections, you may even decide to return an unmodifiable version of the internal list:

public List<String> getNames() {
  return Collections.unmodifiableList( namesList );
}

This operation runs in constant time, so it’s much faster than any of the above (yellow bar). However, it is not the same as a defensive copy. The unmodifiable list is only a view, so its contents change when your internal data changes. If this happens while a client iterates over the items, the client can run into a ConcurrentModificationException. In addition, the returned list still has modifying methods such as add, and it can be considered bad design that an interface provides methods that throw an UnsupportedOperationException at runtime. Still, at least for internal use, this method can be a high-performance alternative to a defensive copy, something that is not possible with arrays.
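The difference between a view and a copy can be sketched as follows. NameRegistry and its method names are assumptions made up for this example; only the java.util calls are real API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, for illustration only
class NameRegistry {

  private final List<String> namesList = new ArrayList<>();

  void addName( String name ) {
    namesList.add( name );
  }

  // Read-only view: constant time, but reflects later changes to namesList
  List<String> getNamesView() {
    return Collections.unmodifiableList( namesList );
  }

  // Defensive copy: linear time, but independent of later changes
  List<String> getNamesCopy() {
    return new ArrayList<>( namesList );
  }

  public static void main( String[] args ) {
    NameRegistry registry = new NameRegistry();
    registry.addName( "foo" );
    List<String> view = registry.getNamesView();
    List<String> copy = registry.getNamesCopy();
    registry.addName( "bar" );
    System.out.println( view.size() ); // 2, the view sees the new element
    System.out.println( copy.size() ); // 1, the copy does not
    try {
      view.add( "baz" );
    } catch( UnsupportedOperationException e ) {
      System.out.println( "the view cannot be modified" );
    }
  }
}
```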

Arrays define a structure, not an interface

Java is an object oriented language. The central idea of object orientation is that objects provide a set of methods to access and manipulate their data fields instead of manipulating the data fields directly. These methods make up an interface that explains what you can do with the object.

Because Java has been designed for performance, primitive types and arrays have been mixed into the type system. Objects use arrays internally to store data efficiently. However, even though arrays represent a modifiable collection of elements, they do not provide any methods to access and manipulate these elements. In fact, there’s not much you can do with an array except accessing and replacing its elements directly. Arrays don’t even implement toString and equals in a meaningful way, while collections do:

String[] array = { "foo", "bar" };
List<String> list = Arrays.asList( array );

System.out.println( list );
// -> [foo, bar]
System.out.println( array );
// -> [Ljava.lang.String;@6f548414

list.equals( Arrays.asList( "foo", "bar" ) )
// -> true
array.equals( new String[] { "foo", "bar" } )
// -> false


In contrast to arrays, the Collection API provides many useful methods to access the elements. Users can check for contained elements, extract sub lists or compute intersections. Collections can add certain features to the data layer, such as thread-safety, while keeping the implementation internal.
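As a small sketch of the operations mentioned above, the following uses contains, subList and retainAll from the standard List interface. The class CollectionOps and its method names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only
class CollectionOps {

  // Check for a contained element
  static boolean hasName( List<String> names, String name ) {
    return names.contains( name );
  }

  // Extract a sub list (assumes the list has at least two elements)
  static List<String> firstTwo( List<String> names ) {
    return new ArrayList<>( names.subList( 0, 2 ) );
  }

  // Compute the intersection, keeping the order of the first list
  static List<String> intersection( List<String> a, List<String> b ) {
    List<String> result = new ArrayList<>( a );
    result.retainAll( b );
    return result;
  }

  public static void main( String[] args ) {
    List<String> names = Arrays.asList( "foo", "bar", "baz" );
    System.out.println( hasName( names, "bar" ) ); // true
    System.out.println( firstTwo( names ) ); // [foo, bar]
    System.out.println( intersection( names, Arrays.asList( "baz", "foo", "qux" ) ) ); // [foo, baz]
  }
}
```

With a plain String[], each of these operations would need a hand-written loop or a conversion into a collection first.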

By using an array, you define where the data is stored in memory. By using a Collection, you define what users can do with the data.

Arrays are not typesafe

If you rely on compiler-checked type safety, be careful with object arrays. The following code crashes at runtime, but the compiler cannot find the problem:

Number[] numbers = new Integer[10];
numbers[0] = Long.valueOf( 0 ); // throws ArrayStoreException

The reason is that arrays are “covariant”, i.e. if T is a subtype of S, then T[] is a subtype of S[]. Joshua Bloch covers all the theory in his great book Effective Java, a must-read for every Java developer.

Because of this behavior, interfaces that expose typed arrays may allow for implementations that return a subtype of the declared array type, leading to weird runtime exceptions.
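Here is a minimal sketch of how this can happen. The interface NumberSource and the classes IntegerSource and CovarianceDemo are hypothetical names invented for this example.

```java
// Hypothetical interface that exposes a typed array
interface NumberSource {
  Number[] getNumbers();
}

// Hypothetical implementation that returns a subtype array
class IntegerSource implements NumberSource {
  @Override
  public Number[] getNumbers() {
    // Compiles, because Integer[] is a subtype of Number[]
    return new Integer[] { 1, 2, 3 };
  }
}

class CovarianceDemo {

  // Returns true if storing a Double into the returned array fails at runtime
  static boolean storeFails( NumberSource source ) {
    Number[] numbers = source.getNumbers();
    try {
      numbers[ 0 ] = Double.valueOf( 0.5 ); // legal for a Number[] at compile time
      return false;
    } catch( ArrayStoreException e ) {
      return true;
    }
  }

  public static void main( String[] args ) {
    System.out.println( storeFails( new IntegerSource() ) ); // true
  }
}
```

If the interface declared List<Number> instead, an implementation could not return a List<Integer>, because the compiler rejects that assignment.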

Bloch also explains that arrays are incompatible with generic types. Arrays enforce their element type at runtime, while generics are checked at compile time and erased at runtime. As a result, you cannot create arrays of generic types.

Generally speaking, arrays and generics don’t mix well. If you find yourself mixing them and getting compile-time errors or warnings, your first impulse should be to replace the arrays with lists.

- Joshua Bloch, Effective Java (2nd ed.), Item 25
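As a small sketch of this advice, the following replaces an array of lists with a list of lists. The class Groups and the method createGroups are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only
class Groups {

  // The array version is rejected by the compiler ("generic array creation"):
  // List<String>[] groups = new List<String>[ 2 ];

  // The list version compiles without errors or unchecked warnings
  static List<List<String>> createGroups( int count ) {
    List<List<String>> groups = new ArrayList<>();
    for( int i = 0; i < count; i++ ) {
      groups.add( new ArrayList<String>() );
    }
    return groups;
  }
}
```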

Summary

Arrays are a low-level language construct. They should be used in implementations, but they should not be exposed to other classes. Using arrays in interface methods runs counter to object orientation, leads to inconvenient APIs, and may weaken type safety and performance.

7 Responses to “3 Good Reasons to Avoid Arrays in Java Interfaces”

  1. Yann says:

    Please provide the performance testing code you used.

  2. Peter Drake says:

    Very interesting!

    What about iteration? Is it faster to iterate over an array (which involves no object creation or method calls) or an ArrayList? How much?

  3. Peter Drake says:

    Very interesting!

    Two counterpoints:

    - Iterating through an ArrayList takes much longer than iterating through an array.

    - You can’t put fast, small primitives in an ArrayList.

  4. Ian Bull says:

    @Peter
    Do you have any metrics on iteration through an array vs a collection? This would certainly be valuable to see. Another advantage is that you could potentially back the iterator with some other datastream (say a DB) and instead of loading all the results up front, you could page them. I’m not sure I would design an API around this hypothetical case though.

    As for primitives, yes, collections and primitives are really awkward. I’ve found with Java 8, things can get even more subtle (and not in a good way).

  5. Michal Chmielarz says:

    Interesting chart you’ve put. Could you provide code of the performance tests, please?

  6. Ralf Sternberg says:

    I’ve uploaded the benchmark code to [1]. It’s based on caliper 0.5. The chart has been created with d3 [2] using a little tool that I’ll share after some polishing in the next days.

    [1] https://gist.github.com/ralfstx/10641850#file-arrayvslistbenchmark
    [2] http://d3js.org/

  7. Ralf Sternberg says:

    @Peter, thanks for this hint. I’ve run some benchmarks for array vs. list iteration and I was surprised how big the difference is. I guess that’s the price for the concurrent modification checks. This kind of counters my point that arrays lead to poor performance. I’ll run some more tests and add an update to the post.

Published: Apr 11th, 2014. Categories: EclipseSource News, Editors choice