This example shows how to deal with big XML files in Camel.
The XPath tokenizer will load the entire XML content into memory, so it’s not well suited for very big XML payloads. Instead, you can use the StAX or XML tokenizers to efficiently iterate the XML payload in a streamed fashion. For more information please read the official documentation.
There are 2 tests:
-
StaxTokenizerTest
: requires using JAXB and process messages using a SAX ContentHandler -
XmlTokenizerTest
: easier to use but can’t handle complex XML structures (i.e. nested naming clash)
The test XML contains a simple collection of records.
<?xml version="1.0" encoding="UTF-8"?>
<records xmlns="http://fvaleri.it/records">
<record>
<key>0</key>
<value>The quick brown fox jumps over the lazy dog</value>
</record>
</records>
You can customize numOfRecords and maxWaitTime to do performance tests
with different payloads.
Max JVM heap is restricted to 20 MB to show that it works with a very
limited amount of memory (see pom.xml
).
There are also a number of optional runtime settings: - no cache enabled - no parallel processing - no mock endpoints with in-memory exchange store - enabled Throughput Logging for DEBUG level - disabled JMX instrumentation
Tested on MacBook Pro 2,8 GHz Intel Core i7; 16 GB 2133 MHz LPDDR3; Java 1.8.0_181.
tokenizer | numOfRecords | maxWaitTime (ms) | XML size (kB) | time (ms) |
---|---|---|---|---|
StAX |
40000 |
5000 |
3543 |
3052 |
XML |
40000 |
5000 |
3543 |
2756 |
StAX |
1000000 |
20000 |
89735 |
11740 |
XML |
1000000 |
20000 |
89735 |
11137 |
StAX |
15000000 |
200000 |
1366102 |
132176 |
XML |
15000000 |
200000 |
1366102 |
132549 |
If you hit any problem using Camel or have some feedback, then please let us know.
We also love contributors, so get involved :-)
The Camel riders!