The HTML document which I am parsing contains some ASCII control codes. I noticed that PHP's DOMDocument parser truncates text nodes when it finds ASCII control characters within the node, such as
Device Control 0x13
End of Medium 0x19
File Separator 0x1C
Group Separator 0x1D
Is this a bug or a feature? Is there any way to have DOMDocument act otherwise? I resorted to remove this characters before DOM processing, but I wonder if that's the right solution.