Façade-Based Compare/Merge: Partial Façades
This post is an instalment in a series about a new initiative in the EMF Compare project: Façade-based compare/merge. The purpose of this effort is to enable extenders to plug in façade models that provide an alternative view on the models that a user is comparing and merging, especially for models described by domain-specific modeling languages (DSMLs) such as might be implemented with UML profiles. The goal is to remove the need for, or at least to simplify, the various extensions that are required to customize the compare/merge experience for DSMLs by having the compare/merge framework operate on a façade that more precisely implements the domain-specific semantics for comparison.
This work is led by the author, in collaboration with EclipseSource.
Having developed the integration of façade models for DSMLs based on underlying profiled UML models already through two major use cases:
- a façade for the entire UML model content
- a mix of façade models for domain-specific content and UML for generic design specification content
it was time to tackle one of the most interesting that has been driving this whole effort in prototyping façade-based compare/merge:
- partial façades: weaving a leaner representation, customized specifically for compare and merge, of only a subset of the underlying model into the content trees being compared
The results as presented to the user are demonstrated in a short video, here:
As is usual for a development activity that is proceeding at an exciting and fun pace, questions of performance had always been in the back of my mind (and actually on my to-do list) but not interesting enough yet to actually investigate. The main concern was for the cost of the various hooks that the façade integration has inserted into the compare and merge process to call out to extensions for façade-specific behaviour:
- in the matching phase, obtaining façades objects to replace the root objects of the input resources for resources that are deemed to have content covered by a façade
- in the merging phase, obtaining copiers that understand façade objects in order to
- ensure that a copied object is wrapped in a dynamic façade proxy if the original was (for façade models that do not natively implement the FacadeObject interface)
- copy the XMI IDs of underlying model elements to the corresponding elements in the merge result, because the mergers only operate on the façades and so have no visibility of the elements that actually are persisted, that are manipulated indirectly via the merger’s actions on the façades
None of this seemed liked it would have a serious impact as for the most part the number of call-outs to extensions would be fairly limited or were interceding in processes that already did a lot of work.
However, the partial façade use case felt different. We had to examine not just the roots of resources to find façades for them, but every object in the input to a comparison operation. And in the initial prototype, this called out to the façade provider extensions for every object. Not only that, but with these partial façades the matching and comparison phases needed a deeper referential integrity in the dynamic façade proxies, which introduced more Java reflection overhead. How much will all of this new overhead for extensibility cost?
Referential Integrity in the Façade
First, a word about the referential integrity requirement mentioned above in the matching and comparison phases. Up to this point, the façade integration has provided façades for model content at the root level of the resources being compared. For a resource that supports façades, all of its roots are presented to the compare/merge process in terms of the façades. For façade models that are generated specifically for EMF Compare, the root extends class serves to FacadeObjectImpl link the façade objects to the underlying resources so that functionality such as grouping by resource in the compare UI works correctly. For façade models that, for whatever reason, are not generated to implement the FacadeObject root interface, the FacadeProxy API is used to provide a dynamic proxy adding this interface to the object. So, this dynamic proxy implemented the same eResource and eDirectResource semantics as the FacadeObjectImpl to satisfy the compare framework’s requirements. But that is all it did, apart of course from the additional API defined by FacadeObject for access to the underlying model.
This is fine as long as façades are only obtained for the entire resource, because these façades are roots of the match tree and the entire content of the comparison input is provided by them.
However, the partial façades use case complicates this picture. Now, façades can be injected at any level of the model tree, and so the matching process needs a lot more guarantees about the references between façades and other objects:
- in order to place a façade’s Match in the correct place in the match tree, it needs to report a previously matched object as its eContainer. If the logical container is an element of the underlying model, then that must be supplied as is. Otherwise, if it is another façade object, then it must be the dynamic proxy for that object that is returned because that is what will be referenced by the match tree, not the actual façade implementation object
- likewise for cross-references. The matching, comparison, and merge processes all need to be able to find the Match tree nodes for referenced objects and if those referenced objects are façades, then the dynamic proxy must supply the corresponding dynamic proxies for those referenced objects
In short, a dynamic proxy for a façade must provide all of its references, by whatever API a client might use: the model’s own API and also the EMF run-time’s reflective eGet API. Also, because the merging process modifies references in the merge result, it will pass in dynamic proxies to these APIs (the model’s own API and also the EMF run-time’s reflective eSet API), so the dynamic proxy must unwrap arguments to incoming method calls to get the real façade implementations. This is summarized in the following graphic:
So, here dynamic proxies have to present other proxies as their containers, contents, and specific references (such as the body reference) whereas in the actual EMF-generated implementation these references are all in terms of the real EObject instances. There is a special case for the container of the OpaqueExpression façade proxy: it cannot actually reference the UML Constraint object as its container because there is no practical way to have that object pretend that the dynamic proxy is contained within it (we do not want to create dynamic proxies for all of the underlying model elements, also; only for the façades). However, EMF Compare needs to see this relationship in order to build the correct Match tree structure. Happily, the matching process relies on the IdentifierEObjectMatcher::getParentEObject(EObject) API to get the containing object for which to find the parent Match instance under which to add a new match. The FacadeMatchEngine creates a custom object matcher that overrides this API.
All of this dynamic delegation of the cross-referencing APIs with wrapping and unwrapping of dynamic proxies, including in custom list implementations for multi-valued references, would be assumed to incur a measurable performance penalty. But how much?
To first of all measure the performance of façade-based compare and merge against the same operation on the raw UML model, I created some JUnit test cases that
- generate a large-ish model comprising several thousand constraints with OpaqueExpressions having each two languages with two corresponding bodies
- generate a revision of this model that adds a new body (with a new language) to every constraint
- compare and merge these models left-to-right
- using the partial façade for opaque expressions
- and again not using façades (just the raw UML)
- raw-UML scenario has two variants: with and without the post-processors and mergers that implement the current (as of Oxygen release) UML semantics
- compare the relative duration of the compare and merge with and without façades
- the same again, except using a façade provider that wraps the façades injected into the matching phase in dynamic proxies
The idea for the longer term in these tests is to actually assert that the façade-based comparison does not take longer than some fixed multiple of the time it takes to do the raw comparison. Thus the test could detect in future some catastrophic performance regression in the façade integration.
So, in each test running ten iterations of the compare-and-merge experiment, I got these results, in which the first four runs are with dynamic proxies for the façades and the last four are with a façade provider that does not wrap the façades, and within each quartet the first pair contrast façade performance with the raw model with UML-specific post-processors and mergers and the second pair contrast façade performance with the raw model excluding these UML-specific hooks:
It would seem that the performance hit is staggering. Upwards of 250x the execution time with the dynamic proxies, and not much better (125x) with just the simple façades. This appears to be a compounding of overhead: something in the injection of façades cripples the performance, and then the cost of the reflective dynamic proxies just doubles whatever that overhead is.
So, what is it about the façade injection that is slowing the process down so much?
It turns out from a little work in a profiler that the most significant cost of the façade-based compare and merge was not actually the injection of the façades and other extensibility hooks related to them, but to structural matching during the initial object matching phase of the comparison.
The problem in this use case was that, whereas it is expected that for the most part a façade will present a simpler structure to EMF Compare than the underlying UML model, in which each façade object maps to one or more UML model elements, here we actually have the reverse case: multiple façade objects trace to the same UML element: an expression in the façade has child objects encapsulating language/body pairs, but these are not distinct objects in the UML. They are values taken out of attributes of the UML OpaqueExpression.
So, whereas for the OpaqueExpression façade, the FacadeMatchEngine was able to present the XMI IDs of the corresponding UML model elements as the unique identifiers for the façade objects, enabling an identifier-based match, the contained BodyEntry objects had no identity of their own. This triggered structural matching for these objects, which is much more computationally expensive than the identifier-based strategy. Orders of magnitude more expensive.
The solution was to add to the pluggable ICopier extension API an operation for computation of a façade object’s unique identifier. For this example opaque-expression façade, that computation for a BodyEntry simply combines the identifier of the OpaqueExpression façade (which is derived from the underlying UML model) with the entry’s language, which is expected to be unique (a refinement would somehow handle clashes, but it’s not necessary for the purposes of these tests). The ICopier makes some sense as the owner of this responsibility because it already handles the copying of XMI IDs in the underlying model when façades are merged, which is a related concept, but this may not be the best long-term design.
In any case, closing this gap in the matching phase so that now all façade elements have unique identifiers on which basis to match them, the performance profile looks more like this:
Altogether more satisfactory. Façade-based compare and merge performance is now in the same ballpark as raw UML compare/merge, and the factor contributed by dynamic proxies is comparable to what it was before. A bart chart presents more clearly the contrasting performance profiles: