XML Load
From Apache OpenOffice Wiki
ODF documents are zipped archives of XML files, along with some assorted metadata and pictures. Reading and parsing the XML accounts for a chunk of the time spent opening documents. Below is an analysis of XML processing.
We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.
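As a rough illustration of the container layout described above, the following Python sketch pulls content.xml out of a zip archive and feeds it to expat. The helper names `build_sample_odf` and `count_elements` are hypothetical, and the tiny in-memory archive is only a stand-in for a real test document; it is not a complete or valid ODF file.

```python
import io
import zipfile
import xml.parsers.expat


def build_sample_odf() -> bytes:
    """Build a tiny ODF-like zip in memory (illustrative stand-in only)."""
    content = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<office:document-content '
        b'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        b'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        b'<office:body><office:text>'
        b'<text:p>Hello</text:p><text:p>World</text:p>'
        b'</office:text></office:body>'
        b'</office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content)
    return buf.getvalue()


def count_elements(odf_bytes: bytes) -> dict:
    """Extract content.xml from the archive and count start tags by name."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as zf:
        xml_bytes = zf.read("content.xml")

    counts = {}

    def on_start(name, attrs):
        # Without namespace processing, expat reports prefixed names as-is.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return counts
```

Calling `count_elements(build_sample_odf())` returns a dictionary keyed by prefixed tag name, such as `text:p`. An XML-only test would skip the zip step and read the extracted content.xml directly.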
Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around instead of strings. This saves string allocation time and provides a speedup.
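The tokenizing idea can be sketched as follows. This is not the FastXML implementation; the `TOKENS` table, the `UNKNOWN` value, and the `tokenize` function are hypothetical names chosen for illustration. In Python the parser still allocates the tag-name string before the lookup, so the sketch shows only the shape of the interface (integers passed downstream), not the allocation saving itself.

```python
import xml.parsers.expat

# Hypothetical token table; a real one would cover the known ODF vocabulary.
TOKENS = {"office:body": 1, "office:text": 2, "text:p": 3}
UNKNOWN = 0  # token used for any tag that is not in the table


def tokenize(xml_bytes: bytes) -> list:
    """Return the sequence of start tags as integer tokens."""
    out = []

    def on_start(name, attrs):
        # Downstream consumers compare small integers instead of strings.
        out.append(TOKENS.get(name, UNKNOWN))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return out
```

A consumer can then dispatch on the integer token, for example with a lookup table of handlers, rather than comparing tag names character by character.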
Comparisons
- Compare OpenOffice TestXML and TestFastXML on the sample documents
- Compare different XML parsers & APIs in terms of processing content.xml
- Compare time spent in XML parsing, building document model, and rendering
Methodology
- Time is measured the same way for each test, based on Time::GetSystemTicks.
- File handling and parsing are done as similarly as possible, within the allowances of API differences.
- Only C and C++ parsers are considered; Java-based parsers and wrappers are excluded.
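A minimal sketch of this kind of harness is shown below, assuming Python's `time.perf_counter` as a stand-in for Time::GetSystemTicks and using two standard-library parser front ends purely as examples. The function names (`time_ms`, `parse_expat`, `parse_sax`, `compare`) are hypothetical, and the parsers are not the ones measured on this page.

```python
import time
import xml.parsers.expat
import xml.sax


def time_ms(parse, data: bytes, repeats: int = 5) -> float:
    """Time one parser callable; return the best run in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(data)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def parse_expat(data: bytes) -> None:
    """Parse with expat directly, using a no-op start-element handler."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: None
    parser.Parse(data, True)


def parse_sax(data: bytes) -> None:
    """Parse through the xml.sax layer with a default (no-op) handler."""
    xml.sax.parseString(data, xml.sax.ContentHandler())


def compare(data: bytes) -> dict:
    """Measure every parser with the same timer and the same input."""
    return {
        "expat": time_ms(parse_expat, data),
        "xml.sax": time_ms(parse_sax, data),
    }
```

Every parser goes through the same `time_ms` function with the same input bytes, so any difference in the numbers comes from the parser and its API layer rather than from the measurement.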
Results
- FastXML provides a good speedup across the test suite.
- libxml2 (SAX API) is the fastest from a pure parsing point of view.
- libxml2 (Reader/processNode) is slower, but comparable to expat.
- expat is faster than Xerces (both SAX and SAX2) and faster than the OO.o parser.
- The OO.o parser has some UNO interface overhead (still to be measured).
Ongoing Work
- Performance counters to measure the proportion of time spent in:
  - Opening and uncompressing container files
  - XML parser setup
  - Actual parsing
  - String allocation
  - Building the document model
  - Rendering
- Replace expat with libxml2 to compare performance inside the office suite. This currently does not work for all files.
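One possible shape for a per-phase performance counter is sketched below. The `PhaseCounter` class and its phase names are hypothetical and are not taken from the OpenOffice code base; the sketch only shows how accumulated wall-clock time per phase can be turned into proportions.

```python
import time
from contextlib import contextmanager


class PhaseCounter:
    """Accumulate wall-clock time per named phase."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded for the same phase.
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def proportions(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.seconds.values())
        if total == 0:
            return {name: 0.0 for name in self.seconds}
        return {name: s / total for name, s in self.seconds.items()}
```

Each stage of document loading would be wrapped in `with counter.phase("parsing"):` or similar. Phases that nest inside one another would be counted twice by this simple version, so it assumes the measured phases do not overlap.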