One thing that often helps is to use a lightweight low-overhead memory pool. If you combine this with "frame" allocation methods (ignoring any delete/free until you're all done with the data), you can get something that's ridiculously fast.
We did this for an embedded system recently, mostly for performance reasons, but it saved a lot of memory as well.
The trick was basically to allocate one big block, slightly bigger than we'd need (you could allocate a chain of blocks if you like), and just keep returning a "current" pointer, bumping it up each time by the allocation size rounded up to our maximum alignment requirement (4 bytes in our case). This cut our per-allocation overhead from roughly 52-60 bytes down to at most 3 bytes, which is just the rounding padding. We also ignored "free" calls until we were done parsing, and then freed the whole block at once.
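A minimal sketch of that bump-pointer scheme in C might look like the following. It assumes a single block obtained from `malloc` (no chaining) and a 4-byte alignment requirement. The names `Arena`, `arena_init`, `arena_alloc`, `arena_free` and `arena_destroy` are hypothetical, not from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Maximum alignment requirement; 4 bytes on the target described above (assumption). */
#define ARENA_ALIGN 4u

/* Hypothetical arena: one big block plus an offset acting as the "current" pointer. */
typedef struct {
    unsigned char *base;  /* start of the big block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* offset of the next free byte */
} Arena;

/* Grab the big block up front. Returns 0 on success, -1 if malloc fails. */
static int arena_init(Arena *a, size_t size) {
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Round n up to the next multiple of ARENA_ALIGN (a power of two). */
static size_t arena_round_up(size_t n) {
    return (n + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1);
}

/* Return the current pointer and bump it. No per-allocation header is stored,
   so the only overhead is the 0-3 bytes of rounding padding. */
static void *arena_alloc(Arena *a, size_t n) {
    assert(a->base != NULL);              /* allocating from an uninitialized arena is a bug */
    size_t rounded = arena_round_up(n);
    if (rounded < n || rounded > a->size - a->used)
        return NULL;                      /* overflow, or block exhausted (a chained version would grab another block here) */
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Individual frees are deliberately ignored. */
static void arena_free(Arena *a, void *p) {
    (void)a;
    (void)p;
}

/* Release everything in one call once parsing is finished. */
static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Because `malloc` returns memory aligned for any standard type, keeping every offset a multiple of 4 keeps every returned pointer 4-byte aligned as well.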
If you're clever enough with your frame allocation, you can save a lot of space and time. It might not get you all the way to your 15 GiB, but it would be worth looking at how much space overhead you really have. My experience with DOM-based systems is that they use tons of small allocs, each with a relatively high overhead.
(If you have virtual memory, a large "block" might not even hurt that much, if your access at any given time is local to a page or three anyway...)
Obviously you have to keep the memory you actually need in the long run, but the parser's "scratch memory" becomes a lot more efficient this way.
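To illustrate the split between long-lived data and scratch memory, here is a small hypothetical sketch in C. The "parse" step (`longest_field`) is invented for the example: it copies each comma-separated field into scratch memory, copies only the result it wants to keep into ordinary heap memory, and then discards all scratch allocations at once by resetting the offset. `Scratch`, `scratch_alloc` and `keep_string` are likewise illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch arena over a caller-supplied buffer. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} Scratch;

static void *scratch_alloc(Scratch *s, size_t n) {
    size_t rounded = (n + 3) & ~(size_t)3;   /* round sizes up to a multiple of 4 */
    if (rounded < n || rounded > s->size - s->used)
        return NULL;
    void *p = s->base + s->used;
    s->used += rounded;
    return p;
}

/* Long-lived data goes to the ordinary heap so it survives the scratch reset. */
static char *keep_string(const char *src, size_t len) {
    char *dst = malloc(len + 1);
    if (dst) {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}

/* Hypothetical parse step: temporary copies of every field live in scratch
   memory; only the longest field is copied out before the scratch is reset. */
static char *longest_field(const char *line, Scratch *s) {
    const char *best = NULL;
    size_t best_len = 0;
    while (*line) {
        size_t len = strcspn(line, ",");
        char *field = scratch_alloc(s, len + 1);
        if (!field) {
            s->used = 0;                     /* out of scratch space: drop everything */
            return NULL;
        }
        memcpy(field, line, len);
        field[len] = '\0';
        if (!best || len > best_len) {
            best = field;
            best_len = len;
        }
        line += len;
        if (*line == ',')
            line++;
    }
    char *kept = best ? keep_string(best, best_len) : NULL;
    s->used = 0;                             /* free all scratch allocations in one step */
    return kept;
}
```

The caller owns the returned string and releases it with `free`; the scratch buffer can be reused immediately for the next parse.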