I'm trying to understand how Word files are rebuilt when opened by Microsoft Word, and in what format they are serialized when edits are saved and the file is closed. Any information you have would be very useful to me. Thanks
All .doc files are stored in a binary format. Opening and manipulating these is an exercise in PAIN.
All .docx files are actually a collection of XML files stored in ZIP format. That's right, just change the extension of a .docx, .xlsx, or .pptx file to .zip and you can open it like any other ZIP file. The format family is called Office Open XML (OOXML), and MS even has an API for it, the Open XML SDK. Personally, I think the OOXML APIs have a pretty steep learning curve, so when I need to make Word files or otherwise manipulate them, I just make a sample file, unzip it, and manipulate its innards. IMO the basics of the OOXML files are simple enough to use without a big old API...
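To illustrate the "unzip it and look at the innards" approach, here is a minimal Python sketch using only the standard library. It assumes the main body of the document lives in the part named word/document.xml and uses the usual WordprocessingML namespace; the function names list_parts and docx_paragraphs are made up for this example, and real documents contain plenty of things (tables, fields, tracked changes) that this ignores.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(source):
    """Return the names of all files stored inside the .docx ZIP container."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def docx_paragraphs(source):
    """Return the plain text of each paragraph (<w:p>) in word/document.xml.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more <w:t> text runs.
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

Calling list_parts("sample.docx") on a file you saved from Word (the file name is just a placeholder) shows the XML parts inside the container, and docx_paragraphs("sample.docx") gives you the paragraph text without any API beyond zipfile and ElementTree.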
Are all MS Word documents serialized in an XML readable format?
Short answer: no.
Long answer: Every few releases, MS changed the format for Word documents. Word 6.0 and 95 use one format, Word 97 to 2002 (a.k.a. XP) another, 2003 another, and 2007 yet another.
Of course, each version can save and open documents in older formats (although newer features normally can't be saved in those older formats).
The formats up to 2003 (.doc) are incremental upgrades of the previous ones, and are binary based.
The format introduced with Office 2007 (.docx) is XML-based, and was forced through as an ISO standard, ISO/IEC 29500:2008 "Office Open XML", although Word itself is not fully compliant with that standard. Note that Word 2007 can still save (and open) documents in the older binary formats.
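Since the file extension doesn't reliably tell you which of these formats you have, one rough way to check is to look at the first bytes of the file. The sketch below assumes the binary .doc files are OLE2 compound files (signature D0 CF 11 E0 A1 B1 1A E1) and that .docx files start with the usual ZIP local-file header (PK\x03\x04). The function names are made up for this example, and a match only identifies the container: other binary Office files and ordinary ZIP archives carry the same signatures.

```python
# Signature at the start of an OLE2 compound file (container of binary .doc).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature at the start of a non-empty ZIP archive (container of .docx).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

If sniff_file returns "zip", you can go on to check that the archive really contains a word/document.xml part before treating it as a .docx.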
Hope this helps.