I recently did some performance profiling of Smooks-based XSL transforms, comparing them against standalone XSLT. The goal was to get a better idea of the overhead incurred by using Smooks to perform XSL-based transforms. I'll attempt to answer the question of why you'd bother applying XSLT via Smooks at all in a future blog post.
The comparisons were run on a Dell Latitude D820 (2 GHz dual core) with 2 GB of RAM, running Windows XP and JDK 1.5.0_10.
For this round of comparisons (I hope to do more some time in the future):
- I selected a very simple message format (an <…> with multiple <…>s), as well as a very simple set of XML-to-XML transformations to be performed on that message. I wanted to keep it simple so that the resulting XSLTs would be as simple as possible. This way I hope to get a "worst-case scenario" for the overhead incurred when using Smooks to perform XSL transforms. The tests are available here.
- I compared the number of messages transformed in a 6-hour period.
- I compared the results produced by using the following XSLT processors: Sun XSLTC (JDK 1.5), Xalan 2.7.0, Saxon 8.7.
- I compared standalone XSLT using a DOM Source and Result against Smooks using a DOM Source and Result.
- I compared standalone XSLT streaming both the Source and Result against Smooks using a DOM Source and Result (Smooks only supports DOM-based XSLT).
- I tested against 10 input messages ranging in size from 20 KB to 200 KB, one per thread: 10 threads running concurrently, each continually transforming its message and pausing for 10 ms between iterations.
- The message bytes were read and buffered in memory so as to avoid file I/O variability.
- During the timed test runs, I didn't compare the transformation output against the expected output. However, I did perform preliminary test runs on all processors and verified that the output was consistent across the board. The one exception was Xalan 2.7.0, which has a concurrency bug that makes the transformed output inconsistent when the XSL is not applied in a synchronized fashion. I still ran the tests against Xalan unsynchronized, to get an idea of how it would perform in the absence of this bug.
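The test setup above (one thread per buffered message, a short pause between iterations, bytes summed over a fixed period, optional synchronization for the Xalan case) can be sketched roughly as follows. This is a minimal illustration written for a current JDK rather than the JDK 1.5 used for these runs, and it is not the actual test code: `ThroughputHarness` and its `MessageTransform` hook are hypothetical names, and whatever you plug into the hook would be the standalone XSLT or Smooks call under test.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAdder;

public class ThroughputHarness {

    /** Hypothetical hook: applies one transformation to an in-memory message. */
    @FunctionalInterface
    public interface MessageTransform {
        void transform(byte[] message) throws Exception;
    }

    // Shared lock used only when calls must be serialized (e.g. the Xalan 2.7.0 case).
    private static final Object GLOBAL_LOCK = new Object();

    /**
     * Runs one thread per message until durationMillis elapses, pausing
     * pauseMillis between iterations, and returns the total input bytes
     * transformed across all threads.
     */
    public static long run(List<byte[]> messages, MessageTransform transform,
                           long durationMillis, long pauseMillis, boolean serialize)
            throws InterruptedException {
        LongAdder totalBytes = new LongAdder();          // low-contention shared counter
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        CountDownLatch finished = new CountDownLatch(messages.size());

        for (byte[] message : messages) {
            new Thread(() -> {
                try {
                    while (System.nanoTime() - deadline < 0) {
                        if (serialize) {
                            synchronized (GLOBAL_LOCK) {
                                transform.transform(message);
                            }
                        } else {
                            transform.transform(message);
                        }
                        totalBytes.add(message.length);
                        Thread.sleep(pauseMillis);
                    }
                } catch (Exception e) {
                    // A failing transform ends this worker; the error goes to the
                    // thread's uncaught-exception handler (stderr by default).
                    throw new RuntimeException(e);
                } finally {
                    finished.countDown();
                }
            }).start();
        }
        finished.await();
        return totalBytes.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        // Dummy 20 KB and 200 KB buffers with a no-op transform, just to exercise the loop.
        List<byte[]> messages = List.of(new byte[20_000], new byte[200_000]);
        long bytes = run(messages, message -> { }, 1_000, 10, false);
        System.out.println("bytes transformed: " + bytes);
    }
}
```

Passing `serialize = true` funnels every call through a single shared lock, which is one way to get consistent output from a processor with a concurrency bug, at the cost of throughput.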
In summary, the performance figures are as follows:
DOM-based XSLT vs. Smooks-based XSLT (also DOM-based):

| Processor | Total Bytes (Standalone XSLT, DOM) | Total Bytes (Smooks/XSLT) | Smooks Relative Performance |
|---|---|---|---|
| Sun XSLTC | 70953094656 | 59952608928 | 84.50% |
| Saxon 8.7 | 31988047248 | 29874631032 | 93.39% |
| Xalan 2.7.0 | 40090038120 | 38571310512 | 96.21% |

(full details)
Stream-based XSLT vs. Smooks-based XSLT (DOM-based):

| Processor | Total Bytes (Standalone XSLT, Streamed) | Total Bytes (Smooks/XSLT) | Smooks Relative Performance |
|---|---|---|---|
| Sun XSLTC | 223776189216 | 59952608928 | 26.79% |
| Saxon 8.7 | 146482894440 | 29874631032 | 20.39% |
| Xalan 2.7.0 | 86316614136 | 38571310512 | 44.69% |

(full details)
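As a sanity check on the tables, the "Smooks Relative Performance" column is consistent with simply dividing the Smooks byte total by the standalone byte total. The sketch below recomputes it from the figures above. `RelativePerformance` is an illustrative name, and the throughput helper assumes the byte totals cover the full 6-hour run, which is an assumption rather than something the tables state.

```java
import java.util.Locale;

public class RelativePerformance {

    /** Assumed run length: the 6-hour test period, in seconds. */
    public static final long RUN_SECONDS = 6L * 60 * 60;

    /** Smooks total as a percentage of the standalone total. */
    public static double relativePercent(long smooksBytes, long standaloneBytes) {
        return 100.0 * smooksBytes / standaloneBytes;
    }

    public static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    /** Average throughput in MB/s (10^6 bytes), assuming totals span the whole run. */
    public static double megabytesPerSecond(long totalBytes) {
        return totalBytes / (double) RUN_SECONDS / 1_000_000.0;
    }

    public static void main(String[] args) {
        String[] names  = {"Sun XSLTC", "Saxon 8.7", "Xalan 2.7.0"};
        long[] dom      = {70953094656L, 31988047248L, 40090038120L};
        long[] streamed = {223776189216L, 146482894440L, 86316614136L};
        long[] smooks   = {59952608928L, 29874631032L, 38571310512L};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-12s vs DOM: %s  vs stream: %s  (stream: %.2f MB/s)%n",
                    names[i],
                    format(relativePercent(smooks[i], dom[i])),
                    format(relativePercent(smooks[i], streamed[i])),
                    megabytesPerSecond(streamed[i]));
        }
    }
}
```

The percentages it prints match the "Smooks Relative Performance" columns of both tables.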
To understand what this means for JBoss ESB Transformation, you first need to understand what it means for Smooks. At the top of this blog I mentioned how Smooks implements a fragment-based approach to message transformations (including some of the advantages of that approach). The downside of this fragment-based approach is that Smooks currently supports it via a DOM-based model, so every message is parsed into an in-memory DOM before any XSL is applied, rather than being streamed straight into the XSL processor.
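To make the DOM-versus-stream distinction concrete, here is a minimal JAXP sketch (plain `javax.xml.transform`, not Smooks code) of the two ways the same compiled stylesheet can be applied. With a `DOMSource`, the whole message is first parsed into a W3C DOM, the processor then has to walk that tree, and the output is built as a second DOM; with a `StreamSource`/`StreamResult`, the processor parses the bytes directly into whatever internal model it prefers and serializes its output directly. The class name `DomVsStream` is illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomVsStream {

    /** Compiled stylesheet; Templates is thread-safe and can be shared. */
    private final Templates templates;

    public DomVsStream(String xslt) throws Exception {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    /** DOM path: parse the whole message into a DOM first, then build the output as a DOM. */
    public Node applyViaDom(byte[] message) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document input = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(message));
        DOMResult result = new DOMResult();          // no node given: a Document is created
        templates.newTransformer().transform(new DOMSource(input), result);
        return result.getNode();
    }

    /** Stream path: bytes in, bytes out; the processor picks its own internal model. */
    public byte[] applyViaStream(byte[] message) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer transformer = templates.newTransformer();   // not thread-safe: one per call
        transformer.transform(new StreamSource(new ByteArrayInputStream(message)),
                new StreamResult(out));
        return out.toByteArray();
    }
}
```

Sharing one `Templates` while creating a fresh `Transformer` per call is the usual JAXP pattern for concurrent use, since `Transformer` instances must not be used by several threads at once.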
So what does all this mean for JBoss ESB Transformation? Well, because JBoss ESB Transformation relies on Smooks to manage and apply transformation logic (including XSL transforms), JBoss ESB cannot currently take advantage of the performance benefits of streaming messages into an XSL processor. So, for a transformation where you don't need the features offered by Smooks (fragment-based transforms etc.) and you want to implement the transformation in XSLT, the performance hit is quite significant: in these tests, Smooks achieved only about 20% to 45% of the throughput of streamed standalone XSLT.
Luckily, this is not all that difficult to work around for JBoss ESB, because we can simply implement a native XSLT ActionProcessor that applies XSLTs using streaming. This is just a couple of lines of code, no big deal. From a Smooks perspective there's a little more involved, but not all that much, I think. In short, Smooks could check the resources targeted at the message exchange in question, and if there's only one, it's targeted at the "docroot" fragment, and it's a stream-supporting resource, then stream it; there's no need to DOM'ify. This would give JBoss ESB the best of both worlds again: stream-based XSLT plus management and exchange-based selection of transformation resources (via Smooks). We'll keep this for another blog :-)
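The selection rule described above could look something like the following. This is only a sketch under assumptions: `Resource`, `Mode` and `select` are hypothetical stand-ins rather than the Smooks API, and only the "docroot" target comes from the text. It uses a `record`, so it needs Java 16 or later.

```java
import java.util.List;

public class ProcessingModeSelector {

    /** Target name for the whole-document fragment, as described in the text. */
    public static final String DOCROOT = "docroot";

    /** Hypothetical view of one transformation resource configured for an exchange. */
    public record Resource(String targetFragment, boolean supportsStreaming) { }

    public enum Mode { STREAM, DOM }

    /** Stream only when a single, stream-capable resource targets the document root. */
    public static Mode select(List<Resource> resourcesForExchange) {
        if (resourcesForExchange.size() == 1) {
            Resource only = resourcesForExchange.get(0);
            if (DOCROOT.equals(only.targetFragment()) && only.supportsStreaming()) {
                return Mode.STREAM;   // one whole-message transform: no need to DOM'ify
            }
        }
        return Mode.DOM;              // fragment-based processing still needs the DOM
    }
}
```

The rule is deliberately conservative: any second resource, or any resource aimed at a sub-fragment, falls back to the existing DOM-based path.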
2 comments:
Ha! I beat Mark to leaving the first comment. Nice read, Tom. I really like the fragment approach, having written some huge XSLTs in the past. I think one of the most attractive benefits for me is re-using the fragments in other transformations. That could have saved me a lot of time. I'd like to see a "best-case scenario" test where the XSLT gets really hairy, i.e. due to lots of normalization in the XML, which forces you to loop and jump around, as that would be much easier to implement (and probably better performing) using Smooks. Maybe we can do this for a real-world scenario, if any of our users are interested in some transformation optimization.
Another thought I had: might fragment processing lend itself better to utilizing the power of multiprocessor machines?
Sure thing Kurt.... comparisons based on a more complex transformation scenario are something I do plan on doing.
As I said, I purposely chose the message format and transformation so as to favor XSLT. Any XSLT processor should be able to perform that transform in a single pass.
As the transformation gets more hairy, I'd expect the advantages of streaming the message to fall off - I've yet to prove that more formally however :-)
The streaming approach also has the downside of being a one-shot transformation solution: you need to be able to do all of your transformation within a single application of the XSLT.