SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially. A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported (it does, however, keep some things, for example a list of all elements that have not been closed yet, in order to catch later errors such as end-tags in the wrong order).
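As an illustration of this event-by-event model, the following minimal sketch uses Python's standard `xml.sax` module. The `EventLogger` class and the sample document are hypothetical names invented for this example. The handler stores the events in a list only so that they can be inspected afterwards; a typical application would act on each event and then discard it, keeping little more than the stack of open elements.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler: logs each parsing event as it is reported."""

    def __init__(self):
        super().__init__()
        self.events = []         # kept only so the example can be inspected
        self.open_elements = []  # elements that have not been closed yet

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.events.append(("start", name))

    def endElement(self, name):
        self.open_elements.pop()
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several pieces; whitespace-only pieces are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))


handler = EventLogger()
xml.sax.parseString(b"<doc><title>Hello</title><p>World</p></doc>", handler)
for event in handler.events:
    print(event)
```

Once parsing finishes, `open_elements` is empty again, which reflects how little state has to persist between events.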
Other tasks, such as sorting, rearranging sections, getting from a link to its target, looking up information on one element to help process a later one, and the like, require accessing the document structure in complex orders and will be much faster with DOM than with multiple SAX passes.
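A task of this kind, following a link to its target, might be sketched with Python's standard `xml.dom.minidom` module as follows. The element names (`link`, `section`, `title`), the `target` and `id` attributes, and the helper `link_target_title` are assumptions made up for the example. Because the whole tree is in memory, the link can be resolved even though its target appears later in the document.

```python
from xml.dom import minidom

# Hypothetical sample document: the link precedes the section it points to.
SAMPLE = """<doc>
  <p>See <link target="s2"/> for details.</p>
  <section id="s1"><title>Intro</title></section>
  <section id="s2"><title>Details</title></section>
</doc>"""

dom = minidom.parseString(SAMPLE)

# Build an id -> element table once; every node is already loaded.
by_id = {
    section.getAttribute("id"): section
    for section in dom.getElementsByTagName("section")
}


def link_target_title(link):
    """Return the title text of the section that a <link> element points to."""
    target = by_id[link.getAttribute("target")]
    title = target.getElementsByTagName("title")[0]
    return title.firstChild.data


titles = [link_target_title(link) for link in dom.getElementsByTagName("link")]
print(titles)  # ['Details']
```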
Some implementations do not neatly fit either category: a DOM approach can keep its persistent data on disk, cleverly organized for speed (editors such as SoftQuad Author/Editor and large-document browser/indexers such as DynaText do this); while a SAX approach can cleverly cache information for later use (any validating SAX parser keeps more information than described above).
Such implementations blur the DOM/SAX tradeoffs, but are often very effective in practice.
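The caching idea can be sketched in a single SAX pass, again with Python's standard `xml.sax` module. The `LinkResolver` class and the document vocabulary (`link`, `section`, `title`) are hypothetical, and the sketch assumes that sections are not nested. The handler remembers only section titles and link targets, then resolves the links when the document ends, so forward references work without building a tree.

```python
import xml.sax


class LinkResolver(xml.sax.ContentHandler):
    """Hypothetical single-pass handler that caches titles and pending links."""

    def __init__(self):
        super().__init__()
        self.titles = {}     # cache: section id -> title text
        self.pending = []    # link targets, in document order
        self.resolved = []   # (target, title) pairs, filled at end of document
        self._section_id = None
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "section":
            self._section_id = attrs.get("id")
        elif name == "title" and self._section_id is not None:
            self._in_title = True
            self._buffer = []
        elif name == "link":
            self.pending.append(attrs.get("target"))

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)  # text may arrive in pieces

    def endElement(self, name):
        if name == "title" and self._in_title:
            self.titles[self._section_id] = "".join(self._buffer)
            self._in_title = False
        elif name == "section":
            self._section_id = None

    def endDocument(self):
        # Resolve every link against the cache built during the pass.
        self.resolved = [(t, self.titles.get(t)) for t in self.pending]


SAMPLE = (b'<doc><p>See <link target="s2"/> for details.</p>'
          b'<section id="s1"><title>Intro</title></section>'
          b'<section id="s2"><title>Details</title></section></doc>')

resolver = LinkResolver()
xml.sax.parseString(SAMPLE, resolver)
print(resolver.resolved)
```

The memory used grows with the number of titles and links rather than with the size of the document, which is the tradeoff such hybrid designs aim for.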
Due to the nature of DOM, streamed reading from disk requires techniques such as lazy evaluation, caches, virtual memory, or persistent data structures (one such technique is disclosed in ).
Processing XML documents larger than main memory is sometimes thought impossible because some DOM parsers do not allow it.
However, it is no less possible than sorting a dataset larger than main memory: in both cases, disk space can stand in for memory to sidestep the limitation.
The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.
Building a tree for the whole document in memory, as a DOM parser must, takes considerable time and space for large documents (memory allocation and data-structure construction take time).
The compensating advantage, of course, is that once loaded any part of the document can be accessed in any order.
Because SAX is event-driven, processing a document with a SAX parser is generally far faster than with a DOM-style parser, so long as the processing can be done in a single start-to-end pass.
Many tasks, such as indexing, conversion to other formats, very simple formatting, and the like, can be done that way.
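A simple indexing task of this kind can be sketched with the incremental interface of Python's standard `xml.sax` parser. The names `ElementCounter`, `count_elements`, and `generate_document` are hypothetical, as is the generated log document. The input is fed to the parser in chunks, so neither the full document text nor a tree of it is held in memory at any point.

```python
import xml.sax
from collections import Counter


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_elements(chunks):
    """Feed an iterable of byte chunks to a SAX parser in one pass."""
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # events fire as soon as enough input has arrived
    parser.close()          # signals end of input and reports any final errors
    return handler.counts


def generate_document(n):
    """Yield a synthetic document piece by piece instead of building it whole."""
    yield b"<log>"
    for i in range(n):
        yield b"<entry><msg>item %d</msg></entry>" % i
    yield b"</log>"


counts = count_elements(generate_document(1000))
print(counts["entry"], counts["msg"], counts["log"])  # 1000 1000 1
```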