Wednesday, March 28, 2007

XSLT via Smooks - Comparisons

As you may already know, JBoss ESB uses Smooks to perform message transformation. One of the design goals of Smooks is to support fragment based transformations i.e. targeting transformation logic at a fragment of an XML message/document as opposed to the whole message. The good thing about this approach is that it not only means you can perform more componentization of your transformation logic, but it also means (often more importantly) that you can easily mix and match different types of transformation logic (XSLT, Java, Groovy, StringTemplate etc) within the context of a single message transformation. This is a very powerful feature. From a purely XSLT perspective, this also means you can have functionality equivalent to XSLT Extensions (Xalan Extensions), but maintain portability across XSL Processors.

I recently performed some performance profiling of Smooks based XSL transforms, comparing them against standalone XSLT. This was done in order to get a better idea of the overhead incurred by using Smooks to perform XSL based transforms. I'll attempt to answer the question of why you'd bother applying XSLT via Smooks in a future blog post.

The comparisons were run on a Dell Latitude D820 (2GHz Dual Core), 2 GB of RAM, running Windows XP, running jdk1.5.0_10.

For this round of comparisons (I hope to do more some time in the future):
  1. I selected a very simple message format (an with multiple ), as well as a very simple set of XML to XML transformations to be performed on that message. I wanted to keep it simple in order to keep the resulting XSLTs as simple as possible. This way I hope I can get a "worst case scenario" re the overhead incurred while using Smooks to perform XSL transforms. The tests are available here.
  2. I compared the number of messages transformed in a 6hr time period.
  3. I compared the results produced by using the following XSLT processors: Sun XSLTC (JDK 1.5), Xalan 2.7.0, Saxon 8.7.
  4. I compared the results produced by a DOM Source and Result for XSLT Standalone versus a DOM Source and Result on Smooks.
  5. I compared the results produced by streaming both the Source and Result for XSLT Standalone versus a DOM Source and Result on Smooks (Smooks only supports DOM based XSLT).
  6. I tested against 10 input messages ranging in size from 20K to 200K. That is, 10 threads running concurrently, continually transforming, each pausing for 10ms between iterations.
  7. The message bytes were read and buffered in memory so as to avoid File IO variabilities.
  8. During the timed test runs, I didn't compare the transformation output against the expected output. However, I did perform preliminary test runs on all processors and verified that the output produced across the board was consistent. The one exception to this was Xalan 2.7.0, which has a concurrency bug that results in the transformed output not being consistent when the XSL is not applied in a synchronized fashion. I still ran the tests against Xalan unsynchronized, so as to get an idea of how it would perform, assuming the absence of this bug.
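
The setup described in the list above can be sketched roughly as follows. This is only an illustration and not the actual test code: the class name `ThroughputHarness` and its `run` method are made up, it uses the standard JAXP API rather than Smooks, and it assumes that "total bytes" means the number of input bytes pushed through the transformer. One worker thread is started per in-memory message, and each worker pauses for 10ms between iterations.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical harness: one thread per buffered message, counting input bytes transformed. */
public class ThroughputHarness {

    public static long run(Templates templates, List<byte[]> messages, long durationMillis)
            throws Exception {
        AtomicLong totalBytes = new AtomicLong();
        long deadline = System.currentTimeMillis() + durationMillis;
        ExecutorService pool = Executors.newFixedThreadPool(messages.size());
        List<Future<?>> workers = new ArrayList<>();

        for (byte[] message : messages) {
            workers.add(pool.submit(() -> {
                while (System.currentTimeMillis() < deadline) {
                    // Templates may be shared between threads; a Transformer may not,
                    // so each iteration gets its own instance.
                    Transformer transformer = templates.newTransformer();
                    transformer.transform(
                            new StreamSource(new ByteArrayInputStream(message)),
                            new StreamResult(new ByteArrayOutputStream()));
                    totalBytes.addAndGet(message.length);
                    Thread.sleep(10); // pause between iterations
                }
                return null;
            }));
        }

        pool.shutdown();
        for (Future<?> worker : workers) {
            worker.get(); // rethrows any failure from a worker thread
        }
        return totalBytes.get();
    }
}
```

Sharing one compiled `Templates` object and creating a `Transformer` per use is the thread-safe pattern documented by JAXP; whether a given processor honours it under load is exactly the kind of thing the Xalan issue above calls into question.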

The test results (listed below) clearly show that streaming XML into an XSL processor makes a huge difference in terms of throughput. The fact that streaming is faster did not surprise me, but the extent of the difference did. In some cases the whole process (taking the input stream, transforming it and producing the output stream) was up to 4 times faster than the poorest performing alternative (taking the input stream, parsing it to a DOM, transforming the DOM to a DOM using the XSL Transformer and finally serializing the resulting DOM to the output stream).
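
To make the two pipelines concrete, here is a minimal sketch of each using the standard JAXP API. The class and method names (`XsltPipelines`, `streamToStream`, `domToDom`) are mine, and this is not the code used in the tests; it only shows where the extra parse and serialize steps come from in the DOM based path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

/** Illustrative comparison of the streamed and DOM based XSLT pipelines. */
public class XsltPipelines {

    /** Stream in, stream out: the processor chooses its own internal model. */
    public static byte[] streamToStream(Templates templates, byte[] input) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        templates.newTransformer().transform(
                new StreamSource(new ByteArrayInputStream(input)),
                new StreamResult(out));
        return out.toByteArray();
    }

    /** Parse to a DOM, transform DOM to DOM, then serialize the resulting DOM. */
    public static byte[] domToDom(Templates templates, byte[] input) throws Exception {
        // Step 1: parse the input stream to a DOM.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document source = factory.newDocumentBuilder().parse(new ByteArrayInputStream(input));

        // Step 2: transform the DOM to a new DOM.
        DOMResult result = new DOMResult();
        templates.newTransformer().transform(new DOMSource(source), result);

        // Step 3: serialize the result DOM with an identity transformer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.transform(new DOMSource(result.getNode()), new StreamResult(out));
        return out.toByteArray();
    }
}
```

Both methods should produce equivalent XML for the same stylesheet and input, although the XML declaration can differ between them because the serializer in step 3 does not see the stylesheet's output settings.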

In summary, the performance figures are as follows....

DOM based XSLT Vs Smooks based XSLT (also DOM based):

Processor      Total Bytes - XSLT    Total Bytes - Smooks/XSLT    Smooks Relative Performance
Sun XSLTC      70953094656           59952608928                  84.50%
Saxon 8.7      31988047248           29874631032                  93.39%
Xalan 2.7.0    40090038120           38571310512                  96.21%

(full details)

Stream based XSLT Vs Smooks based XSLT (DOM based):

Processor      Total Bytes - XSLT    Total Bytes - Smooks/XSLT    Smooks Relative Performance
Sun XSLTC      223776189216          59952608928                  26.79%
Saxon 8.7      146482894440          29874631032                  20.39%
Xalan 2.7.0    86316614136           38571310512                  44.69%

(full details)
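
For anyone who wants to check the percentages, the "Smooks Relative Performance" column appears to be the Smooks byte total divided by the standalone XSLT byte total. That reading is my assumption, but it reproduces the Sun XSLTC figures (59952608928 bytes via Smooks, against 70953094656 bytes for DOM based XSLT and 223776189216 bytes for stream based XSLT). The class name below is made up for the example.

```java
import java.util.Locale;

/** Illustrative calculation of Smooks throughput relative to standalone XSLT. */
public class RelativePerformance {

    /** Smooks bytes as a percentage of standalone XSLT bytes, to two decimal places. */
    public static String percent(long smooksBytes, long xsltBytes) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * smooksBytes / xsltBytes);
    }

    public static void main(String[] args) {
        // Sun XSLTC, Smooks versus DOM based standalone XSLT.
        System.out.println(percent(59952608928L, 70953094656L));  // prints 84.50%
        // Sun XSLTC, Smooks versus stream based standalone XSLT.
        System.out.println(percent(59952608928L, 223776189216L)); // prints 26.79%
    }
}
```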

To understand what this means for JBoss ESB Transformation, you first need to understand what it means for Smooks. At the top of this blog I mentioned how Smooks implements a fragment based approach to message transformations (including some of the advantages of that approach). The downside to this fragment based approach is that Smooks currently supports it via a DOM based model.

So what does all this mean for JBoss ESB Transformation? Well, because JBoss ESB Transformation relies on Smooks to manage and apply transformation logic (including XSL transforms), it means that JBoss ESB cannot currently avail of the performance advantages offered by streaming messages into an XSL Processor. So, for a transformation where you don't need the features offered by Smooks (fragment based transforms etc) and you want to implement the transformation in XSLT, the performance hit is quite significant.

Luckily, this is not all that difficult to work around for JBoss ESB because we can simply implement a native XSLT ActionProcessor that can apply XSLTs using streaming. This is just a couple of lines of code - no big deal. From a Smooks perspective, there's a little bit more involved, but I don't think all that much. In short, Smooks could simply check the resources targeted at the message exchange in question, and if there's only one, which is targeted at the "docroot" fragment, and it's a stream supporting resource, then stream it - no need to DOM'ify. This would enable JBoss ESB to have the best of both worlds again... stream based XSLT + management and exchange based selection of transformation resources (via Smooks). We'll keep this for another blog :-)
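
The check described above could look something like the sketch below. To be clear, none of this is Smooks API: `Resource`, `resource(...)` and `canStream(...)` are hypothetical names invented to illustrate the rule "exactly one resource, targeted at docroot, and stream capable".

```java
import java.util.List;

/** Hypothetical sketch of the "can we skip the DOM?" decision; not part of Smooks. */
public class StreamingDecision {

    /** Made-up view of a transformation resource targeted at a message exchange. */
    public interface Resource {
        String targetFragment();     // e.g. "docroot" or the name of an inner element
        boolean supportsStreaming(); // true for something like a plain XSLT
    }

    /** Convenience factory for the example. */
    public static Resource resource(final String targetFragment, final boolean supportsStreaming) {
        return new Resource() {
            public String targetFragment() { return targetFragment; }
            public boolean supportsStreaming() { return supportsStreaming; }
        };
    }

    /** Stream only when a single, stream capable resource targets the whole document. */
    public static boolean canStream(List<? extends Resource> resources) {
        if (resources.size() != 1) {
            return false; // several resources means fragment based processing, so build the DOM
        }
        Resource only = resources.get(0);
        return "docroot".equals(only.targetFragment()) && only.supportsStreaming();
    }
}
```

Anything that fails the check would simply fall back to the existing DOM based processing, so the fragment based features would be unaffected.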

Saturday, March 24, 2007

JBossESB 4.2 Milestone Release 1 is out!

We've just released MR1 for the 4.2 version of JBossESB. It's surprising how much work we've managed to cram into the time between the 4.0 GA release and this! We've increased the developer community size yet again (we're averaging 1 new contributor a week) and Bill Burke has begun to get involved too. Welcome Bill!

There have been quite a few changes between 4.0 GA and 4.2 MR1, including:

  • jBPM integration: you can now invoke a service, use an ESB action to start a new process and signal a process, all from within the ESB.

  • Groovy scripts can be embedded within the ESB via the action framework.

  • Configure your ESB graphically, via a community donated editor.

  • Scoped deployments within the same ESB server; see Kurt's earlier posting.

  • JBoss Messaging 1.2.0GA is now the recommended JMS implementation, both for standalone and embedded operation.

  • We've improved the performance of the ESB, particularly if you use JMS as your transport.

  • There's now a Dead Letter Queue for the CBR; if it can't route your message then at least it can be persisted for later offline management.

There's more to come too as we move towards JBossESB 4.2 and 5.0. Hopefully this will give you a flavour of what's to come. Over the next few days, members of the team will post a few entries here on different topics of interest around our ESB. If there's anything specific you'd like to see an entry on, just let us know and we'll try to accommodate you.

Thursday, March 22, 2007

The esb archive does not end in ar.

After jar, war, ear, sar and har we now have a new type of archive called esb! An esb archive is one of the new features of our upcoming release and it packages up both your configuration and your custom code (action classes) in one neat package. The structure of the esb archive looks like:

├─── META-INF
│    ├─── MANIFEST.MF
│    └─── jboss-esb.xml
├─── classes
└─── queue-service.xml

What used to be called jbossesb.xml is now called jboss-esb.xml and it lives in the META-INF directory of the archive together with the MANIFEST.MF. Your custom action classes go in the root of the archive and optionally you can add a queue-service.xml definition to bring up any Queues or Topics that are specific to this ESB package.

All the ESB libraries and property files are now consolidated into one jbossesb.sar archive and you can simply deploy the esb archives to the deploy directory. Yes archives (plural), you can deploy multiples of them!

Monday, March 5, 2007

W3C Workshop Day 2

The second day of the workshop was even more interesting than the first. The theme of the day was "Separate or Together? (i.e. one Web or two?)". "Mashup" technology has demonstrated that you can offer up a Service on the Web and people can use it in their applications just fine. Think of the Google Maps - Craigslist integration alone. What does this mean for WebServices? Should the W3C create a REST Description language (e.g. standardize WADL)? Mark Baker went as far as saying we don't really need WebServices at all, but I think the general consensus was that some form of integration of the two worlds would be a better alternative. Both technologies have their own strengths, but it would be nice if WebService technology would still 'work' on the Web. In other words, a WebService should at least support HTTP GET, and the URI should be the identifying address.

This brought up a nice discussion on EPRs, which was interesting for me, since JBossESB is using EPRs. EPRs look a lot like URIs, but they can contain more information, and they are not identifiers like URIs are. In fact an EPR should be able to "fall back" to a URI in its simplest form, in a manner that is consistent with REST. Let's learn from the Web and try to make the Web and Web Services live in harmony. Both architectures are in use and the marketplace will determine where these technologies develop. Maybe the URI part of the EPR could be used as the identifier. The Web has taught us the power of the URI. Let's see what WADL can do for REST.

Finally I wanted to mention a presentation by David Booth on the Resource Description Framework (RDF). His point is that (automated) service interaction is hindered by inconsistent data naming ('babelonization'), and that RDF provides a framework to describe a piece of data in a standardized manner such that data integration can be automated. Such a framework would reduce the complexity of service integration dramatically. Maybe it's something our Smooks engine can take advantage of by generating transformations on the fly.

It looks like time and the marketplace will tell if and how the two Webs will converge.

Friday, March 2, 2007

W3C Workshop Day 1

Well last Tuesday I was at a workshop of the W3C. A first for me, and I enjoyed it a lot, after I got the acronyms down that is. I loved having the wikipedia at my fingertips I have to say!

The first day was titled "What's missing from the picture - new stuff to consider". The gist of that day was that some people fully embraced Web Services with SOAP and WSDL. The speed of development is great, and there are a great number of extension standards to make Web Services into first class Services. I actually presented a paper by Mark to push towards a general context standard in Web Services.

However, other voices were calling to "Stop making standards and let's make things work". This quote is from Paul Downey from BT and it resonated with a lot of other people in the meeting. Basically vendors are still working out some interoperability issues and there are no 'best practices' available on how to really use SOAP with all its WS-* extensions. They were calling on the W3C to provide some kind of test suite and knowledge base. In my humble opinion I see a huge opportunity for the Open Source community to jump in here, and for the JBossESB in particular. I think we should lead the way to bring WS-* to the developer community. JBoss has always been very much in touch with the developers, and I think there is an opportunity for the JBossESB team to show how these standards can be used the right way.

Well I'd love to hear what you think. Stay tuned for what happened on day 2 of the workshop!