My guess is that the XML isn't properly UTF-8 encoded. Please show the bytes within the <shortest> element in the raw file... I suspect you'll find they're not a validly encoded character.
If you could show a short but complete program which generates this XML from valid input, that would be very helpful. (Preferably saying which platform it is, too :)
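In case it helps with getting at those bytes, here is a minimal Python sketch that pulls the raw content of an element out of a file without decoding it. The helper names `element_bytes` and `hex_dump` and the sample document are made up for illustration, and it assumes the opening tag has no attributes and the element isn't nested inside itself.

```python
import re


def element_bytes(raw: bytes, tag: str) -> bytes:
    """Return the undecoded bytes between <tag> and </tag> (first match only)."""
    name = re.escape(tag.encode("ascii"))
    # Work on bytes throughout so that no decoding step can hide the problem.
    match = re.search(rb"<" + name + rb">(.*?)</" + name + rb">", raw, re.DOTALL)
    if match is None:
        raise ValueError(f"no <{tag}> element found")
    return match.group(1)


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Hypothetical sample standing in for the real file contents.
sample = b"<names><shorter>\xc3\x96rwic</shorter><shortest>\xef\xbf\xbd.</shortest></names>"
print(hex_dump(element_bytes(sample, "shortest")))  # EF BF BD 2E
```

For a real file you would read it in binary mode (`open(path, "rb").read()`) and pass the result as `raw`.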
EDIT: Something very odd is going on in this file. Here are the hex values for the "shorter" and "shortest" values:
Shorter: C3 96 72 77 69 63
Shortest: EF BF BD 2E
Now "C3 96" is the valid UTF-8 encoding for U+00D6 which is "Latin capital letter O with diaeresis" as you want.
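However, EF BF BD is the UTF-8 encoding for U+FFFD which is "replacement character" - definitely not what you want. (The 2E is just the ASCII dot.) U+FFFD is what a decoder typically substitutes when it hits bytes it can't decode, so the original character was most likely lost before this file was ever written.
So, this is actually valid UTF-8 - but it doesn't contain the characters you want. Again, you should examine what created the file...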
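The byte values above can be checked mechanically. The following Python sketch decodes each hex string strictly as UTF-8 and names the resulting code points; `describe` is a helper name invented here. The last part shows one way such bytes could arise, purely as an assumption about the cause: if the original text was ISO-8859-1 (where Ö is the single byte D6) and something decoded it as UTF-8 with replacement before re-encoding, you'd get exactly this output.

```python
import unicodedata


def describe(hex_bytes: str) -> list:
    """Strictly decode space-separated hex as UTF-8 and name each code point."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")  # raises on invalid UTF-8
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


print(describe("C3 96 72 77 69 63")[0])  # U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
print(describe("EF BF BD 2E"))
# ['U+FFFD REPLACEMENT CHARACTER', 'U+002E FULL STOP']

# Assumed scenario, not confirmed by the question: Latin-1 bytes for "Ö."
# are wrongly decoded as UTF-8 with replacement, then re-encoded as UTF-8.
latin1_bytes = "Ö.".encode("iso-8859-1")               # b'\xd6.'
damaged = latin1_bytes.decode("utf-8", errors="replace")
print(damaged.encode("utf-8").hex(" ").upper())        # EF BF BD 2E
```

If that is what happened, the fix belongs in whatever reads the original data: it needs to be told the correct source encoding rather than assuming UTF-8.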