I have an XML file that was exported from a database. I'm using the Java SAX parser to parse the XML and output it in a different format. The XML contains some invalid characters, and the parser throws errors like 'Invalid Unicode character (0x5)'.
Is there a good way to strip these characters out other than pre-processing the file line by line and replacing them? So far I've run into three different invalid characters (0x5, 0x6 and 0x7). It's a ~4 GB database dump and we're going to process it many times, so waiting an extra 30 minutes to run a pre-processor every time we get a new dump is going to be a pain. This also isn't the first time I've run into this issue.
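
To make the question concrete, here is a rough sketch of the kind of in-stream filtering I'm hoping is possible, so the bad characters are dropped while the parser reads instead of in a separate pass. The class name XmlCharFilterReader is made up for illustration. It assumes the offending characters appear literally in the file (not as character references such as &#5;) and that surrogate pairs in the input are well formed.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: a Reader wrapper that silently drops characters
// outside the XML 1.0 Char range before the SAX parser sees them.
class XmlCharFilterReader extends FilterReader {

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    // XML 1.0 allows tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD and
    // supplementary characters. Surrogates are passed through on the
    // assumption that they form valid pairs.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || Character.isSurrogate(c);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // -1 signals end of stream
            }
            // Compact the allowed characters to the front of the chunk.
            int kept = 0;
            for (int i = off; i < off + n; i++) {
                if (isAllowed(buf[i])) {
                    buf[off + kept++] = buf[i];
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was filtered out: read again instead of returning 0.
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real dump, with 0x5, 0x6 and 0x7 embedded.
        String xml = "<row>ab\u0005cd\u0006\u0007</row>";
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                System.out.print(new String(ch, start, length));
            }
        };
        parser.parse(new InputSource(new XmlCharFilterReader(new StringReader(xml))), handler);
        System.out.println();
    }
}
```

For the real file, the StringReader would presumably be replaced by a BufferedReader over an InputStreamReader with the dump's actual encoding, which I'd have to specify explicitly because the parser no longer detects it when handed a character stream. I don't know whether this is the recommended approach or whether the parser has a built-in option that does the same thing.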