A Fast and Minimal JSON Parser for Java

In the RAP project, reading and writing JSON are performance-critical operations: the server parses and generates JSON messages for a large number of clients at a high rate, so we need a fast parser for this job. When we switched to JSON, we included the org.json parser, which is reasonably small but not known for its performance.

There are many better JSON libraries out there, but most of them do much more than we need. We really only need a bare-bones parser that can read JSON into a simple Java representation and generate JSON from Java. As we like to keep the core library self-contained, we don’t want a dependency on an external JSON library.

One winter Sunday, I started to write a JSON parser just for the fun of it, and was quickly surprised by how simple it is to parse JSON.

Why is JSON parsing so easy? The first character of every token uniquely determines its type ('[' for an array, '"' for a string, 't' or 'f' for a boolean, and so forth), so there’s no backtracking involved. It went so well that I decided to continue and create a JSON parser tailored to our needs, which are:

  • Fast – we read and create so much JSON that the parser directly affects the server performance
  • Lightweight – it should use memory sparingly, as we handle lots of messages
  • Minimal – the less code the better, as we have to maintain it
  • Simple to use – we’ll expose the API for custom component developers, so it should be simple and clear
  • No dependencies – only Java 5
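
To illustrate why a first-character dispatch is enough, here is a minimal recursive-descent sketch in Java. It is not minimal-json's code: the class TinyJsonParser is hypothetical, it reads into plain Java collections (LinkedHashMap, ArrayList, String, Double, Boolean, null), its number handling is deliberately lenient, and it omits the four-hex-digit \u escapes for brevity. The switch in readValue() is the point: one look at the current character selects the branch, and the parser never has to back up and try another interpretation.

```java
import java.util.*;

// Hypothetical sketch, not minimal-json's code: a recursive-descent reader that
// picks a branch from the first character of each value and never backtracks.
public class TinyJsonParser {
  private final String text;
  private int pos;
  private TinyJsonParser(String text) { this.text = text; }

  public static Object parse(String text) {
    TinyJsonParser parser = new TinyJsonParser(text);
    Object value = parser.readValue();
    parser.skipWhitespace();
    if (parser.pos != text.length()) throw parser.error("Unexpected trailing input");
    return value;
  }

  private Object readValue() {
    skipWhitespace();
    char c = peek();
    switch (c) { // one character is enough to choose the branch
      case '{': return readObject();
      case '[': return readArray();
      case '"': return readString();
      case 't': return readLiteral("true", Boolean.TRUE);
      case 'f': return readLiteral("false", Boolean.FALSE);
      case 'n': return readLiteral("null", null);
      default:
        if (c == '-' || (c >= '0' && c <= '9')) return readNumber();
        throw error("Unexpected character");
    }
  }

  private Map<String, Object> readObject() {
    Map<String, Object> members = new LinkedHashMap<String, Object>();
    pos++; // skip '{'
    if (consume('}')) return members;
    do {
      skipWhitespace();
      if (peek() != '"') throw error("Expected member name");
      String name = readString();
      if (!consume(':')) throw error("Expected ':'");
      members.put(name, readValue());
    } while (consume(','));
    if (!consume('}')) throw error("Expected ',' or '}'");
    return members;
  }

  private List<Object> readArray() {
    List<Object> elements = new ArrayList<Object>();
    pos++; // skip '['
    if (consume(']')) return elements;
    do { elements.add(readValue()); } while (consume(','));
    if (!consume(']')) throw error("Expected ',' or ']'");
    return elements;
  }

  private String readString() {
    pos++; // skip the opening quote
    StringBuilder sb = new StringBuilder();
    while (true) {
      char c = next();
      if (c == '"') return sb.toString();
      if (c != '\\') { sb.append(c); continue; }
      int simple = "\"\\/bfnrt".indexOf(next()); // single-character escapes only
      if (simple < 0) throw error("Unsupported escape");
      sb.append("\"\\/\b\f\n\r\t".charAt(simple));
    }
  }

  private Double readNumber() { // lenient: Double.valueOf rejects malformed numbers
    int start = pos;
    while (pos < text.length() && "+-.eE0123456789".indexOf(text.charAt(pos)) >= 0) pos++;
    return Double.valueOf(text.substring(start, pos));
  }
  private Object readLiteral(String word, Object value) {
    if (!text.startsWith(word, pos)) throw error("Expected " + word);
    pos += word.length();
    return value;
  }
  private void skipWhitespace() {
    while (pos < text.length() && " \t\n\r".indexOf(text.charAt(pos)) >= 0) pos++;
  }
  private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }
  private char next() {
    if (pos >= text.length()) throw error("Unexpected end of input");
    return text.charAt(pos++);
  }
  private boolean consume(char expected) { // skips whitespace, then takes expected if present
    skipWhitespace();
    if (peek() != expected) return false;
    pos++;
    return true;
  }
  private IllegalArgumentException error(String message) {
    return new IllegalArgumentException(message + " at offset " + pos);
  }
}
```

Parsing the text [1, true, "x"] with this sketch yields a list holding 1.0, Boolean.TRUE and "x"; a truncated input such as [1, 2 fails with an IllegalArgumentException.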

The result is called minimal-json, and it’s already included in RAP. It is fast and lightweight, consists of only 10 classes, and, I hope, is simple to use.

Usage

You can read a JSON object or array from a Reader or from a String:

JsonObject jsonObject = JsonObject.readFrom( reader );
JsonArray jsonArray = JsonArray.readFrom( string );

Once you have a JsonObject, you can access its contents using the get() method:

String name = jsonObject.get( "name" ).asString();
int age = jsonObject.get( "age" ).asInt(); // asLong(), asDouble(), ...

The elements of a JSON array can be accessed in a similar way:

String name = jsonArray.get( 0 ).asString();
int age = jsonArray.get( 1 ).asInt(); // asLong(), asDouble(), ...

As you can see, the get() method always returns an instance of JsonValue, which can then be converted to the target type using asString(), asInt(), asDouble(), etc. There’s no automatic conversion to Java types and no need for instanceof checks. If you’re not sure about the type of a value, you can check it using isString(), isNumber(), etc.
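
One way to get typed accessors without instanceof is to give a base class failing default implementations of every asXxx() and isXxx() method and let each subclass override only what it supports. The classes below (TypedValueSketch, Value, StringValue, NumberValue) are hypothetical and only illustrate the pattern; they are not minimal-json's source. Keeping the number as its original text and converting on demand is likewise an assumption of this sketch.

```java
// Hypothetical sketch of the asXxx()/isXxx() accessor pattern; not minimal-json's source.
public class TypedValueSketch {

  public abstract static class Value {
    public boolean isString() { return false; }
    public boolean isNumber() { return false; }

    // By default every conversion fails; subclasses override only what they support.
    public String asString() { throw new UnsupportedOperationException("Not a string: " + this); }
    public int asInt() { throw new UnsupportedOperationException("Not a number: " + this); }
    public double asDouble() { throw new UnsupportedOperationException("Not a number: " + this); }
  }

  public static final class StringValue extends Value {
    private final String string;
    public StringValue(String string) { this.string = string; }
    @Override public boolean isString() { return true; }
    @Override public String asString() { return string; }
    @Override public String toString() { return '"' + string + '"'; }
  }

  public static final class NumberValue extends Value {
    private final String literal; // keeps the source text; converted only when asked
    public NumberValue(String literal) { this.literal = literal; }
    @Override public boolean isNumber() { return true; }
    @Override public int asInt() { return Integer.parseInt(literal); }
    @Override public double asDouble() { return Double.parseDouble(literal); }
    @Override public String toString() { return literal; }
  }
}
```

A caller can then write value.asInt() directly, or test value.isNumber() first when unsure; a mismatch fails immediately with an UnsupportedOperationException that names the offending value, rather than requiring an instanceof check and a cast.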

Nested arrays and objects can be accessed using asArray() and asObject():

JsonArray nestedArray = jsonObject.get( "items" ).asArray();

You can also iterate over the elements of a JsonArray and the names of a JsonObject, e.g.:

for( String name : jsonObject.names() ) {
  JsonValue value = jsonObject.get( name );
  ...
}

Writing JSON

A JsonObject or JsonArray can write JSON to a Writer, or return it as a string via the toString() method. The output is currently not pretty-printed; formatting support might be added later.

jsonObject.writeTo( writer );
String json = jsonArray.toString();
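
A common way to offer both outputs is to implement writing once against java.io.Writer and let toString() delegate to a StringWriter. The sketch below applies that idea to a single JSON string value, including the escaping the JSON RFC requires (quotation mark, backslash, and control characters below U+0020). The class JsonStringWriterSketch and its methods are hypothetical and not taken from minimal-json.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: write once against Writer, derive the String form from it.
public class JsonStringWriterSketch {

  // Writes value as a JSON string literal, escaping what the RFC requires.
  public static void writeString(String value, Writer writer) throws IOException {
    writer.write('"');
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '"': writer.write("\\\""); break;
        case '\\': writer.write("\\\\"); break;
        case '\n': writer.write("\\n"); break;
        case '\r': writer.write("\\r"); break;
        case '\t': writer.write("\\t"); break;
        default:
          if (c < 0x20) {
            // Other control characters (below U+0020) get a numeric escape.
            writer.write(String.format("\\u%04x", (int) c));
          } else {
            writer.write(c);
          }
      }
    }
    writer.write('"');
  }

  // toString()-style helper: reuse the Writer-based code with an in-memory StringWriter.
  public static String toJson(String value) {
    StringWriter out = new StringWriter();
    try {
      writeString(value, out);
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected: StringWriter does not throw IOException
    }
    return out.toString();
  }
}
```

Because writeString() only needs a Writer, the same code can stream directly into a servlet response or socket writer without building an intermediate String first.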

To create a JsonObject or a JsonArray, use the add() methods that exist for the relevant types. These methods return the object instance to allow method chaining:

jsonObject = new JsonObject().add( "name", "John" ).add( "age", 23 );
jsonArray = new JsonArray().add( "John" ).add( 23 );

You may have noticed that the object, too, has an add() method instead of a put() or set(). That’s because JsonObject stores and writes its members in the order they are added. This lets you define the output order, and it even allows you to add the same key twice, which is discouraged but not forbidden by the JSON RFC.

To replace an element in an array or object, you first have to remove() the old value and then add() the new one. A replace() method may be added later. Keep in mind, however, that JsonArray and JsonObject are designed to be containers for reading and writing JSON, not general-purpose data structures.
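
The ordering behavior (insertion order, duplicate keys, and remove() followed by add() moving a member to the end) can be modeled with two parallel lists, the layout described in the Performance section. This is a hypothetical sketch: the class name OrderedMembers, its method set, and the choice that remove() drops the first matching member are assumptions, and values are limited to strings and numbers written without escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an insertion-ordered JSON object backed by two parallel lists.
public class OrderedMembers {
  private final List<String> names = new ArrayList<String>();
  private final List<Object> values = new ArrayList<Object>();

  // Appends a member; earlier members with the same name are kept (duplicates allowed).
  public OrderedMembers add(String name, Object value) {
    names.add(name);
    values.add(value);
    return this; // returning this enables method chaining
  }

  // Removes the first member with the given name, if there is one.
  public OrderedMembers remove(String name) {
    int index = names.indexOf(name);
    if (index != -1) {
      names.remove(index);
      values.remove(index);
    }
    return this;
  }

  // Writes members in insertion order. Only Strings and Numbers are handled here,
  // and strings are assumed not to contain characters that need escaping.
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder("{");
    for (int i = 0; i < names.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append('"').append(names.get(i)).append("\":");
      Object value = values.get(i);
      if (value instanceof String) {
        sb.append('"').append(value).append('"');
      } else {
        sb.append(value);
      }
    }
    return sb.append('}').toString();
  }
}
```

For instance, starting from {"name":"John","age":23}, calling remove("name") and then add("name", "Jane") produces {"age":23,"name":"Jane"}: the replaced member ends up last, which is the reordering a replace() method would avoid.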

Performance

I’ve compared the time required to read and write a typical RAP message with other popular parser implementations, namely org.json (20091211), Gson (2.2.2), Jackson (1.9.12), and JSON.simple (1.1). Disclaimer: this benchmark is restricted to our use case and to my limited knowledge of the other libraries. It may be unfair, as it ignores other use cases and perhaps better ways to use these libraries. Still, I think these results show that minimal-json can hold its own against state-of-the-art parsers.

[Chart: Overall performance]

I also ran a number of micro-benchmarks using Google Caliper while optimizing the implementation. One interesting detail was the choice of data structure for JsonObject. Since JSON objects are key-value maps, a HashMap seems like the obvious choice. However, after some experiments, I ended up with two separate ArrayLists for names and values.

Of course, looking up a key in a list requires a linear search, while a hash lookup is much quicker. However, creating a HashMap and adding elements to it has considerable overhead. It turns out that for very small numbers of items (fewer than 10), this overhead outweighs the benefits of a HashMap. Since JSON messages in RAP typically contain many objects with only a few items each, a HashMap would even hurt overall performance. Moreover, the two ArrayLists need less than a third of the memory footprint of a HashMap.


[Chart: Lookup performance]

I was able to improve lookup performance for small item counts by adding a very small hash structure to the names list. This structure consists of a 32-element byte array. It does not handle collisions but falls back to indexOf() in case of a miss. This version (shown in the middle of the chart above) seems to be a good compromise between HashMap and plain ArrayList. Optimizations for bigger JSON objects are possible, but not needed at the moment.
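
The small hash structure can be sketched as follows: a 32-slot byte array maps a name's hash to its list index plus one (0 meaning empty), a newer name simply overwrites the slot on collision, and an empty or mismatched slot falls back to a linear search. The class name IndexedNames, the masking, the "last occurrence wins" lookup, and the handling of indexes too large for a byte are assumptions of this sketch, not a description of minimal-json's code. Removal is left out, because it would shift indexes and require rebuilding the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a list of member names with a tiny 32-slot hash index in front.
// Append-only; removing a name would shift indexes and require rebuilding the table.
public class IndexedNames {
  private final List<String> names = new ArrayList<String>();
  // Each slot holds (index + 1) as an unsigned byte; 0 means "no entry".
  private final byte[] hashTable = new byte[32];

  private int slotFor(String name) {
    // 32 is a power of two, so the mask yields 0..31 even for negative hash codes.
    return name.hashCode() & (hashTable.length - 1);
  }

  public void add(String name) {
    int index = names.size();
    names.add(name);
    // Collisions are not resolved: the newest name simply overwrites the slot.
    // An index too large for an unsigned byte clears the slot, forcing a linear search.
    hashTable[slotFor(name)] = index < 255 ? (byte) (index + 1) : 0;
  }

  // Returns the index of the last member with this name, or -1 if there is none.
  public int indexOf(String name) {
    int candidate = (hashTable[slotFor(name)] & 0xff) - 1;
    if (candidate != -1 && name.equals(names.get(candidate))) {
      return candidate; // hit: no scan needed
    }
    return names.lastIndexOf(name); // empty slot or collision: fall back to a linear search
  }
}
```

The & 0xff turns the signed byte back into the range 0..255. Because unrelated names can share a slot, every fast-path hit is confirmed with equals() before it is trusted, which is what makes ignoring collisions safe.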

How to Use It

If you’re looking for a bare-bones JSON parser with zero dependencies, you are welcome to use minimal-json. It goes without saying that it was developed test-driven and complies with the RFC. The code lives on GitHub and is EPL-licensed. It is also included in RAP.

I didn’t set up a build. If you would like to use the code, I suggest that you simply copy these 10 Java files into your project.

Let me know if you find this useful, fork it on GitHub, and feel free to open an issue if you find a problem.

12 Comments
  • Matthias
    Posted at 2:57 pm, April 18, 2013

    Sounds awesome for parsing small JSON stuff. Thanks.
    I suppose converting to/from custom objects would break the “minimal” description.

  • Jillles van Gurp
    Posted at 3:43 pm, April 18, 2013

    Interesting findings. I’ve been doing my own json project on github on top of a json simple content handler.

    I currently use a LinkedHashMap for my object representation. Additionally, I’ve experimented with some memory efficient ways of storing strings. Basically, I store the utf-8 bytearray instead of the 16 bit java chars that a String holds. Additionally, I reuse instances for object keys. There’s a price for looking them up but it really cuts down on memory for scenarios where you only have a handful of different keys. I have some use cases where I cache millions of json objects in memory and cutting down on memory really helps reducing footprint.

    Your approach of using two array lists seems very interesting and I might give that a try. Also, your findings with different parsers suggest I might want to try something other than JSON.simple.

    If you are interested, the project is here: https://github.com/jillesvangurp/jsonj

  • Cowtowncoder
    Posted at 6:07 pm, April 18, 2013

    It looks like this library cannot read the usual InputStream (or even byte[]) input. This means that users will have to detect the encoding by external means, as well as take a bit of an additional performance hit, which is not accounted for by the performance tests.
    The same is true for writing JSON, assuming that one can only use a Writer or produce a String; most real use cases need to deal with byte streams.

    This is a common oversight, but I hope it can be resolved to make this library more useful. Even if it just means convenience methods for reading/writing UTF-8 (default encoding for JSON) encoded JSON.

    On using two lists: that is one option, as would be the use of immutable Maps for a small number of entries (functional style). But for real storage savings this is not nearly as compact as using POJOs and actual databinding. So while it is good to advertise a compact _Tree_ representation, it would be good to mention that Trees are a rather inefficient model for JSON content, similar to how Java POJOs consume much less memory (and are much faster to operate on) than java.util.Maps and Lists.

  • Aaron Digulla
    Posted at 11:11 am, April 29, 2013

    Suggestion for an improvement: I hate String ids. In my own code, I do this:

    StringOption NAME = new StringOption( "name" );
    String name = jsonObject.get( NAME );
    IntOption AGE = new IntOption( "age" );
    int age = jsonObject.get( AGE );

    This works with a mix of generics and method overrides. In your case, method overrides should be enough. It also allows type checking when setting values.

    Since most IDs are constants, I can define them in a central place. As an additional bonus, I can easily find all places in the code where I’m accessing certain values.

  • Aaron Digulla
    Posted at 9:38 am, May 2, 2013

    Re keeping the parser minimal: That makes *your* life easier and makes it more horrible for thousands of people … 🙂 Creating an easy-to-use API is more important than having a technically perfect API, since consumers never care about perfection. They don’t feel pain if your code is a mess; they only feel the pains of their own code.

    That said, separating this API out will encourage ignorant people to ignore it. Why bother creating accessor objects when you can use a String literal?

    Re primitives: I have two getters; one throws a MissingKeyException, the other returns the supplied default value.

    I also do this for reference types (like String). Returning null when a value is missing is convenient until someone refactors the code and suddenly, null values creep out of the local scope.

  • Libor Jelínek
    Posted at 3:02 pm, June 4, 2013

    Great RAP side-effect 🙂 Hungry to incorporate it in my next project! Thanks Ralp!

  • parttimenerd
    Posted at 10:02 pm, June 14, 2013

    Cool library. I like your approach of keeping it simple because I need a simple JSON library for a pet project of mine that I’m able to modify easily, as I’m going to use different Java classes for integers and strings.

  • Simon Mayerhofer
    Posted at 6:23 pm, July 12, 2013

    Hey,

    looks nice, and I’m going to use it for my current project.
    But I really miss a ‘change’ method, because I need to change the value of an object that has to stay in the same position as before instead of being moved to the end of the array.
    So I would really love this library if you could add this method.
    In the meantime, I’m trying to solve my problem another way.

    regards