My two cents . . .
I'd be worried if memory use grew exponentially (or at least faster than linearly) with the size of the XML document, e.g. a 1 MB XML file settles at 10 MB, a 2 MB file flattens out at 30 MB, etc.
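One rough way to tell "grows faster than the file" apart from "a big but constant multiplier" is to look at the ratio of settled memory to file size across a few measurements. This is a minimal sketch that assumes you have already collected (file size, settled memory) pairs yourself. The helper names and the 10% tolerance are hypothetical choices, and the sample numbers are just the illustrative figures from the paragraph above.

```python
def memory_ratios(samples):
    """Return settled-memory / file-size ratios, ordered by file size.

    samples: list of (file_mb, settled_mb) pairs you measured yourself.
    """
    return [mem / size for size, mem in sorted(samples)]


def grows_superlinearly(samples, tolerance=0.10):
    """Hypothetical check: True if the memory-per-MB ratio climbs by more
    than `tolerance` between every pair of successive samples.

    A roughly constant ratio means memory is linear in file size (expected);
    a steadily climbing ratio is the worrying case.
    """
    ratios = memory_ratios(samples)
    if len(ratios) < 2:
        return False  # one sample says nothing about growth
    return all(later > earlier * (1 + tolerance)
               for earlier, later in zip(ratios, ratios[1:]))


# Illustrative numbers from the text: 1 MB -> 10 MB, 2 MB -> 30 MB.
print(memory_ratios([(1, 10), (2, 30)]))        # [10.0, 15.0]
print(grows_superlinearly([(1, 10), (2, 30)]))  # True
print(grows_superlinearly([(1, 10), (2, 20)]))  # False
```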
Also, think about the cost of the XML file not so much in terms of byte size, but in terms of the cost of each node. If your 5 MB XML doc had, say, two data nodes, then the in-memory representation of the document wouldn't be much greater than 5 MB (it could actually be far less, considering that binary data encoded as text in XML can take up to double the space it does in memory)*.
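For a feel of how much larger binary data gets once it is written into XML as text, you can compare the two common encodings (`xs:hexBinary` and `xs:base64Binary`) directly. This small sketch uses only the Python standard library. It measures encoded length only, not what any particular .NET XML API allocates.

```python
import base64
import os


def encoded_sizes(raw: bytes) -> dict:
    """Byte counts for the same payload: raw, hex-encoded, base64-encoded."""
    return {
        "raw": len(raw),
        "hex": len(raw.hex()),                 # 2 text chars per byte
        "base64": len(base64.b64encode(raw)),  # 4 chars per 3 bytes, padded
    }


payload = os.urandom(3_000_000)  # 3 MB of arbitrary binary data
print(encoded_sizes(payload))
# {'raw': 3000000, 'hex': 6000000, 'base64': 4000000}
```

Hex encoding doubles the payload, while base64 inflates it by about a third. The "double" case therefore applies to hex-encoded data, and both shrink back once decoded to raw bytes.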
If your XML doc is UTF-8 and you have two large text nodes, then the in-memory representation could be 10 MB: the text would likely be stored in .NET strings, which are UTF-16 and therefore twice the width of plain English-language UTF-8 text.
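.NET `System.String` holds UTF-16 code units, so you can check the doubling by encoding the same text both ways. In this Python sketch, `utf-16-le` stands in for the in-memory UTF-16 layout. It counts no byte-order mark and ignores per-object overhead.

```python
def utf8_vs_utf16(text: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for the same text.

    'utf-16-le' is used so no byte-order mark is counted, which is closer
    to how the characters sit inside a .NET string than plain 'utf-16'.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(utf8_vs_utf16("plain English text"))  # (18, 36): ASCII-range text doubles
print(utf8_vs_utf16("日本語"))               # (9, 6): CJK text actually shrinks
```

The doubling holds for English-language text. Scripts that need three bytes per character in UTF-8 come out smaller in UTF-16.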
If the XML document is made up of lots of discrete string values, then every node is an object, every node name is an object, and every node value is an object. So, assuming references are 4 bytes (as in a 32-bit process; they are 8 bytes in a 64-bit one), that's (at least) an extra 12 bytes per node.
Now, assuming you have lots of nodes and an average node name+value length of 20 characters, a 5 MB file holds about 250,000 nodes, so the reference overhead is 250,000 × 12 bytes = 3 MB. Add a possible extra 100% for the UTF-8 to UTF-16 conversion, and it takes 5 MB + 5 MB + 3 MB = 13 MB (at least) of RAM to store a 5 MB XML file . . . and that's not counting bytes lost to memory alignment, or the extra bytes used to store the size of each string object**.
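Here is the back-of-the-envelope estimate above, written out as a function so the assumptions can be varied. It follows the rough model from the text: 3 references per node, a fixed average name+value length, and 1 byte per character in UTF-8 doubling to 2 in memory. It deliberately ignores alignment, object headers and string length fields, so treat the result as a lower bound. The function and parameter names are hypothetical, and sizes use decimal megabytes to match the text's round numbers.

```python
def estimate_dom_bytes(file_bytes: int,
                       avg_chars_per_node: int = 20,
                       refs_per_node: int = 3,
                       ref_size: int = 4,
                       bytes_per_char_in_memory: int = 2) -> dict:
    """Lower-bound memory estimate for an XML file loaded as a node tree.

    Assumes ASCII-range UTF-8 input (1 byte per character), so the
    character count equals the file size. Each node is charged
    `refs_per_node` references (node, name, value); ref_size is 4 bytes
    in a 32-bit process and 8 in a 64-bit one.
    """
    nodes = file_bytes // avg_chars_per_node
    text_bytes = file_bytes * bytes_per_char_in_memory  # UTF-8 -> UTF-16
    ref_bytes = nodes * refs_per_node * ref_size
    return {
        "nodes": nodes,
        "text_bytes": text_bytes,
        "ref_bytes": ref_bytes,
        "total_bytes": text_bytes + ref_bytes,
    }


five_mb = 5_000_000
print(estimate_dom_bytes(five_mb))
# {'nodes': 250000, 'text_bytes': 10000000, 'ref_bytes': 3000000, 'total_bytes': 13000000}
print(estimate_dom_bytes(five_mb, ref_size=8)["total_bytes"])  # 16000000
```

Switching `ref_size` to 8 shows that the same file costs at least 16 MB in a 64-bit process under this model.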
Also consider that because you're caching the XML document, all those objects quickly end up as generation 2 objects (they survive the first few collections and get promoted), which basically means the GC will be very lazy about walking that considerable heap to see what it can collect.
See Rico Mariani's "When to call GC.Collect()" for the situations where it's not only OK to call GC.Collect(), but necessary.
Hope this helps, sorry if I'm preaching to the choir on the memory size thing.
* I've no idea if this is actually the case, but would be surprised if it isn't.
** I'm assuming .NET strings store the length of the string before or after its characters. That would increase the in-memory representation by an extra 4-8 bytes per node, giving roughly 20 bytes of overhead (12 for references plus up to 8 for lengths) per 20 bytes of node name/value, which effectively makes the overhead as large as the data itself.
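Here is the footnote's arithmetic, spelled out. Take the 12 bytes of references per node from the main estimate and add a hypothetical 8 bytes of per-node length bookkeeping (the upper end of the 4-8 byte guess). The overhead per node then equals the 20 bytes of name/value text. This tiny sketch uses the footnote's assumed numbers, not measured .NET object sizes.

```python
def overhead_ratio(avg_text_bytes: int = 20,
                   ref_bytes_per_node: int = 12,
                   length_bytes_per_node: int = 8) -> float:
    """Per-node overhead (references + assumed string length fields)
    divided by the per-node name/value bytes it accompanies."""
    overhead = ref_bytes_per_node + length_bytes_per_node
    return overhead / avg_text_bytes


print(overhead_ratio())                         # 1.0: overhead == data
print(overhead_ratio(length_bytes_per_node=4))  # 0.8
```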