Difference between revisions of "XML Load"
From Apache OpenOffice Wiki
Line 30:
  *# Building Doc Model
  *# Rendering
+ *# replace expat with [[libxml2]] to compare performance in office, currently not working for all files
  == Data ==
  The [[Image:Xml-load-compare.ods|spreadsheet]].
Revision as of 14:54, 9 March 2006
An ODF document is a zipped archive of XML files together with some assorted metadata and pictures. Reading and parsing the XML accounts for a noticeable share of the time spent opening a document. Below is an analysis of XML processing.
We use a suite of 60 Performance Related Test Documents. The content.xml for each has been extracted for XML-only tests.
Niklas and Florian have prototyped a test component (FastXML) that tokenizes XML tags and passes the tokens around. This saves string allocation time and provides a speedup.
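The idea of passing tokens rather than tag strings can be sketched as follows. This is a minimal illustration and not the FastXML prototype itself: `TokenTable`, `intern`, and `tokenizeStartTags` are hypothetical names, and the scanner is a naive stand-in for a real XML parser (it ignores comments, CDATA, and attributes).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token table: maps each distinct tag name to a small integer.
class TokenTable {
public:
    // Returns the token for `name`, assigning a new one on first sight.
    // Precondition: `name` is not empty.
    int intern(std::string_view name) {
        assert(!name.empty());
        // std::less<> allows lookup by string_view without building a std::string.
        const auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(ids_.size());
        ids_.emplace(std::string(name), id);  // one allocation per distinct name
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::map<std::string, int, std::less<>> ids_;
};

// Naive scan for start tags; returns one token per start tag, in document order.
std::vector<int> tokenizeStartTags(std::string_view xml, TokenTable& table) {
    std::vector<int> tokens;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string_view::npos) {
        ++pos;  // step past '<'
        if (pos >= xml.size()) break;
        const char c = xml[pos];
        if (c == '/' || c == '?' || c == '!') continue;  // end tag, PI, or declaration
        const std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
        const std::size_t stop = (end == std::string_view::npos) ? xml.size() : end;
        if (stop > pos) tokens.push_back(table.intern(xml.substr(pos, stop - pos)));
        pos = stop;
    }
    return tokens;
}
```

With this sketch, a tag name is copied into a string only the first time it is seen; every later occurrence is an integer that consumers can compare or switch on.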
Comparisons
- Compare OpenOffice TestXML and TestFastXML on the sample documents
- Compare different XML parsers & APIs in terms of processing content.xml
- Compare time spent in XML parsing, building document model, and rendering
Methodology
- Time is measured in the same way for each test, based on Time::GetSystemTicks.
- File handling and parsing are done as similarly as possible, within the allowances of API differences.
- Only C and C++ parsers are considered; Java-based parsers and wrappers are excluded.
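A uniform timing harness along these lines might look like the sketch below. It is an assumption-laden illustration: `std::chrono::steady_clock` stands in for Time::GetSystemTicks, and `ParserCase`, `Timing`, and `timeAll` are hypothetical names. Each parser under test is wrapped in a callable that receives the same in-memory XML, so the measurement code is identical for every candidate.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One parser under test: a label plus a callable that parses the given XML.
struct ParserCase {
    std::string name;
    std::function<void(const std::string&)> parse;
};

// Elapsed wall-clock time for one parser run.
struct Timing {
    std::string name;
    double milliseconds;
};

// Runs every parser once on the same input and times each run the same way.
std::vector<Timing> timeAll(const std::vector<ParserCase>& cases,
                            const std::string& xml) {
    using Clock = std::chrono::steady_clock;  // stand-in for Time::GetSystemTicks
    std::vector<Timing> results;
    results.reserve(cases.size());
    for (const ParserCase& c : cases) {
        assert(c.parse && "every case needs a callable");
        const Clock::time_point start = Clock::now();
        c.parse(xml);
        const Clock::time_point stop = Clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        results.push_back({c.name, elapsed.count()});
    }
    return results;
}
```

In a real comparison each `parse` callable would wrap one parser API (for example expat or libxml2 SAX); those wrappers are left out here because they depend on the installed libraries.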
Results
- FastXML provides a good speedup across the test suite.
- libxml2 (SAX API) is the fastest from a pure parsing point of view.
- libxml2 (Reader/processNode) is slower, but comparable to expat.
- expat is faster than both xerces (SAX & SAX2) and the OO.o parser.
- The OO.o parser has some UNO interface overhead (to be measured).
Ongoing Work
- Performance counter to measure the proportion of time spent in:
  - Opening & uncompressing container files
  - XML Parser setup
  - Actual Parsing
  - String Allocation
  - Building Doc Model
  - Rendering
- Replace expat with libxml2 to compare performance inside the office; this currently does not work for all files.
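The performance counter mentioned in the list above could take roughly the following shape. This is only a sketch under stated assumptions: `PhaseCounter` and `ScopedPhase` are hypothetical names, `std::chrono::steady_clock` is used in place of the office's own tick source, and the phase labels are free-form strings.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates elapsed time per named phase and reports each phase's share.
class PhaseCounter {
public:
    using Duration = std::chrono::steady_clock::duration;

    // Adds `elapsed` to the running total of `phase`.
    void add(const std::string& phase, Duration elapsed) {
        assert(elapsed >= Duration::zero());
        totals_[phase] += elapsed;
        grandTotal_ += elapsed;
    }

    // Fraction (0..1) of all recorded time spent in `phase`.
    double share(const std::string& phase) const {
        if (grandTotal_ == Duration::zero()) return 0.0;
        const auto it = totals_.find(phase);
        if (it == totals_.end()) return 0.0;
        const std::chrono::duration<double> part = it->second;
        const std::chrono::duration<double> whole = grandTotal_;
        return part.count() / whole.count();
    }

private:
    std::map<std::string, Duration> totals_;
    Duration grandTotal_ = Duration::zero();
};

// RAII helper: charges the lifetime of the object to one phase.
class ScopedPhase {
public:
    ScopedPhase(PhaseCounter& counter, std::string phase)
        : counter_(counter),
          phase_(std::move(phase)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedPhase() {
        counter_.add(phase_, std::chrono::steady_clock::now() - start_);
    }

    ScopedPhase(const ScopedPhase&) = delete;
    ScopedPhase& operator=(const ScopedPhase&) = delete;

private:
    PhaseCounter& counter_;
    std::string phase_;
    std::chrono::steady_clock::time_point start_;
};
```

A load path would place one `ScopedPhase` around each stage (for example `ScopedPhase p(counter, "Actual Parsing");`) and query `share()` for each phase once the document is open.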