Those Leaky Networks
In previous posts I’ve written about ECF’s upcoming implementation of RFC 119.
In this post, I would like to step back from describing RFC 119 itself and talk about how our implementation of RFC 119 and ECF remote services fit together, since that implementation is layered on top of the ECF remote services API.
I think of remote procedure call as a leaky abstraction. For those who haven’t read Joel Spolsky’s The Law of Leaky Abstractions, I highly recommend it. The reason I say it’s a leaky abstraction is that although transparent RPC *looks* like a local method call, in some situations it clearly is not:
1. The remote call is very slow (or blocks) because of a network problem
2. The remote call cannot complete because the network fails/goes down
3. A parameter to the remote method call cannot be serialized, e.g. trying to make a remote call like this: serviceProxy.setWorkspace(workspace), where workspace is an IWorkspace (why would passing a workspace to a method call be a problem?)
There are others, but I think these are the most compelling.
Note that all three of the above are runtime issues, i.e. they can happen at runtime for lots of reasons that have nothing to do with the semantics of the RPC itself. Cases 1 and 2 cannot be prevented, and they are likely to happen for non-trivial procedures.
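To make the "runtime issue" point concrete, here is a minimal sketch in plain Java. It is not ECF or RFC 119 code: WorkspaceService, LocalWorkspace and LeakyProxyDemo are hypothetical names made up for illustration, and the "remote" proxy only pretends to be remote by serializing its arguments the way a real transport would have to. The call compiles and looks like any local call, and the problem only shows up when it runs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical service interface, used only for illustration.
interface WorkspaceService {
    String describe(Object workspace);
}

// Stand-in for an object that holds local-only state and is not Serializable.
class LocalWorkspace {
    final Thread owner = Thread.currentThread();
}

class LeakyProxyDemo {
    // Builds a proxy that pretends to be remote: it serializes every argument,
    // as a real transport would have to do before sending the call.
    static WorkspaceService newFakeRemoteProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (args != null) {
                for (Object arg : args) {
                    try (ObjectOutputStream out =
                            new ObjectOutputStream(new ByteArrayOutputStream())) {
                        // Fails with NotSerializableException for LocalWorkspace.
                        out.writeObject(arg);
                    } catch (IOException e) {
                        throw new IllegalStateException(
                                "cannot marshal argument for " + method.getName(), e);
                    }
                }
            }
            return "described remotely";
        };
        return (WorkspaceService) Proxy.newProxyInstance(
                WorkspaceService.class.getClassLoader(),
                new Class<?>[] { WorkspaceService.class },
                handler);
    }

    public static void main(String[] args) {
        WorkspaceService service = newFakeRemoteProxy();
        // A serializable argument goes through without trouble.
        System.out.println(service.describe("a plain string"));
        try {
            // Compiles fine, looks local, fails only when actually invoked.
            service.describe(new LocalWorkspace());
        } catch (IllegalStateException e) {
            System.out.println("leak: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Nothing in the WorkspaceService interface tells the caller that the second call can fail this way, which is exactly the leak.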
Note also that since OSGi services are method calls on some service interface (a POJO), remote services will also have the above issues. It doesn’t matter what your remoting implementation is; all of them are subject to such problems. Unfortunately, we (the distribution system implementers) can’t prevent them.
So, to me this means remote procedure call is a leaky abstraction…because even though it looks like a normal/local/in-memory method call, there are occasions where the ’truth’ about networks leaks out.
So what to do? Well, I think there are several things to do, both from the service designer’s viewpoint (i.e. those defining the service to be remoted) and the distribution system implementer’s viewpoint (i.e. people that implement distribution infrastructures…like me).
From the service designer’s viewpoint you could design all of your services to be prepared for 1, 2, 3 above…and/or document them as having these properties. This can and does definitely help. But it is a major pain, and you can end up with services that are more complex (especially if they are used locally as well as remotely).
From the distribution system implementer’s viewpoint I feel one thing to do is what Joel describes in his article as what TCP did for IP: layering.
That is the approach we’ve taken with implementing RFC 119…the transparent remoting specified in RFC 119 is implemented on top of a non-transparent/explicit remoting API (ECF remote services). I think this is nice, because it supports more use cases:
- “I want to create a simple remote service, as easily as I can and have it work” (use RFC 119)
- “I want to create a remote service that knows about or at least responds properly to the remoting leaks (1, 2, 3, etc above), and not simply crash/block/fail when the service is used” (use ECF remote services)
So, we’ve implemented RFC 119 itself using ECF remote services…and layered transparent remoting on top of a non-transparent remoting runtime API. This gives both service designers and service consumers a choice about how much they want/need to care about the network leaking into remote OSGi services. That is, if they care about the leaks they can do something about them, but if they don’t care about such leaks they still have a standard way of publishing, discovering, and receiving remote OSGi services.
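As a rough illustration of the layering idea, here is a minimal sketch in plain Java. It deliberately does not use the real ECF remote services interfaces: ExplicitRemote, Greeter, LayeringSketch and the 100 ms timeout are all hypothetical and stand in for "some explicit remoting API" and "some service". The explicit layer exposes the call as a future, so a consumer who cares about the leaks can time out and fall back; the transparent layer is just a proxy built on top of it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical explicit (non-transparent) remoting API; NOT the ECF interfaces.
interface ExplicitRemote {
    CompletableFuture<Object> callAsync(String method, Object[] args);
}

// Hypothetical service interface.
interface Greeter {
    String greet(String name);
}

class LayeringSketch {
    // Transparent layer: a proxy that blocks on the explicit API with a fixed timeout.
    static <T> T transparentProxy(Class<T> type, ExplicitRemote remote, long timeoutMillis) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return remote.callAsync(method.getName(), args)
                        .get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // The leak surfaces here as an unchecked exception.
                throw new IllegalStateException("remote call failed: " + method.getName(), e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("remote call interrupted", e);
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler));
    }

    // Explicit layer used directly: the consumer decides what a slow network means.
    static String greetOrFallback(ExplicitRemote remote, String name) {
        try {
            return (String) remote.callAsync("greet", new Object[] { name })
                    .get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return "(offline) hello " + name;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "(interrupted) hello " + name;
        }
    }

    // Fake transport that answers immediately.
    static ExplicitRemote healthy() {
        return (method, args) -> CompletableFuture.completedFuture("hello " + args[0]);
    }

    // Fake transport that never answers, simulating a stalled network.
    static ExplicitRemote stalled() {
        return (method, args) -> new CompletableFuture<>();
    }

    public static void main(String[] args) {
        Greeter simple = transparentProxy(Greeter.class, healthy(), 100);
        System.out.println(simple.greet("world"));
        System.out.println(greetOrFallback(stalled(), "world"));
    }
}
```

The point of the sketch is only the shape: the easy path and the leak-aware path share one runtime, and the consumer picks the level they need.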