Processing large XML files in the SOA Suite by Emiel Paasschens


Read large XML files in chunks


On my current project, end users upload XML files to be processed in Oracle SOA Suite. The XML files contain information about employers and their employees. Because an employer can have hundreds or even thousands of employees, these XML files can be quite large.
Processing such large XML files consumes a lot of memory and can become a bottleneck, especially when multiple end users upload large XML files at the same time. It can even crash a server with an OutOfMemory error.
The best way to solve this is to read and process the large XML files in chunks: read and process XML fragments instead of the full XML file.
My colleague Aldo Schaap has already done this for CSV files and describes it in his blog “Processing large files through SOA Suite using Synchronous File Read”. I gratefully used his blog to do the same for XML processing. However, a few things are slightly different when reading XML instead of CSV, which is the reason for this blog.
Another reason is that I ran into a second problem, which I will describe later in this blog. To solve it, I have to ‘pre transform’ the XML file, meaning the XML file needs to be transformed before it is read by the SOA Suite. To achieve this I used the pre-processing features of the file adapter with a custom (Java) valve. This pre and post processing is described in the blog “SOA Suite File Adapter Pre and Post processing using Valves and Pipelines” by Lucas Jellema.
The combination of these two blogs gave me the solution to my problem.
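
To make the idea of reading fragments instead of the full file concrete, here is a minimal sketch in plain Java using the standard StAX API. It is not the file adapter mechanism used in this blog, only an illustration of the same principle outside the SOA Suite. The class name XmlChunkReader, the method readInChunks and the element names Employee and Name are assumptions made for this example; the real XSD may differ.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of SOA Suite: streams employees in small batches.
public class XmlChunkReader {

    /**
     * Streams the XML and passes the names of the Employee elements to the
     * handler in batches of at most chunkSize. Returns the number of batches.
     */
    public static int readInChunks(Reader xml, int chunkSize, Consumer<List<String>> handler)
            throws XMLStreamException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> chunk = new ArrayList<>();
        int chunkCount = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                // Assumed element name: every employee is an <Employee> element.
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Employee".equals(reader.getLocalName())) {
                    chunk.add(readEmployeeName(reader));
                    if (chunk.size() == chunkSize) {
                        handler.accept(chunk);      // process a full chunk
                        chunk = new ArrayList<>();  // start a new one
                        chunkCount++;
                    }
                }
            }
        } finally {
            reader.close();
        }
        if (!chunk.isEmpty()) {                     // last, partially filled chunk
            handler.accept(chunk);
            chunkCount++;
        }
        return chunkCount;
    }

    // Reads up to the closing </Employee> tag and returns the text of its
    // <Name> child (assumed element name), or null when there is none.
    private static String readEmployeeName(XMLStreamReader reader) throws XMLStreamException {
        String name = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "Name".equals(reader.getLocalName())) {
                name = reader.getElementText();
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "Employee".equals(reader.getLocalName())) {
                break;
            }
        }
        return name;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Message><Employer><Name>Acme</Name>"
                + "<Employee><Name>Ann</Name></Employee>"
                + "<Employee><Name>Bob</Name></Employee>"
                + "<Employee><Name>Cas</Name></Employee>"
                + "</Employer></Message>";
        int chunks = readInChunks(new StringReader(xml), 2,
                chunk -> System.out.println("Processing chunk: " + chunk));
        System.out.println("Chunks processed: " + chunks);
    }
}
```

In this sketch the parser only ever holds the current chunk of employee names. With a real upload the Reader would wrap the file itself rather than an in-memory string, so the full document never has to be loaded at once.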

Problem Description

Back to my problem. The large XML files that have to be parsed contain one ‘Message’ element as root. This root element contains one or more employers with some basic employer information, and each employer can contain multiple employee elements, up to thousands, each with employee and employment information. In the real use case the XML structure contains Dutch element names and is very specific to the business problem. For the purpose of this blog, I’ve reduced the problem to a basic XML structure with English names and some basic sample data. XSD source: see the complete article.

SOA & BPM Partner Community

For regular information on Oracle SOA Suite, become a member of the SOA & BPM Partner Community. Registration requires an OPN account. If you need support with your account, please contact the Oracle Partner Business Center.


About Jürgen Kress
As a middleware expert Jürgen works at Oracle EMEA Alliances and Channels, responsible for Oracle’s EMEA Fusion Middleware partner business. He is the founder of the Oracle SOA & BPM and the WebLogic Partner Communities and the global Oracle Partner Advisory Councils. With more than 5000 members from all over the world the Middleware Partner Community is the most successful and active community at Oracle. Jürgen manages the community with monthly newsletters, webcasts and conferences. He hosts his annual Fusion Middleware Partner Community Forums and the Fusion Middleware Summer Camps, where more than 200 partners get product updates, roadmap insights and hands-on training. He supplements these with many web 2.0 tools such as Twitter, discussion forums, online communities, blogs and wikis. For the SOA & Cloud Symposium by Thomas Erl, Jürgen is a member of the steering board. He is also a frequent speaker at conferences like the SOA & BPM Integration Days, JAX, UKOUG, OUGN, or OOP.
