I have large XML files of 100s of MB.
Are there any utilities that can parse XML files and escape special characters in strings without loading the entire file into memory at once?
Thanks
In Java, don't use the DOM, since it builds the whole document tree in memory. Use SAX or StAX instead. Outside Java, you can still use SAX, either through MSXML or with Expat.
The following C++ program copies a file byte by byte and uses very little memory. The output stream buffers writes internally, so there is no need to flush after every byte; flushing that often only slows the copy down.
// copy a file one byte at a time
#include <fstream>
int main() {
    std::ifstream infile("original.xml", std::ios::binary);
    std::ofstream outfile("copy.xml", std::ios::binary);
    if (!infile || !outfile) return 1;  // could not open one of the files
    char ch;
    // get() reads raw bytes (including whitespace) and fails cleanly at end of file,
    // unlike "while (!infile.eof()) infile >> ch;", which skips whitespace and
    // writes the last byte twice
    while (infile.get(ch)) {
        outfile.put(ch);
    }
    return 0;  // both streams are flushed and closed by their destructors
}
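The same byte-by-byte loop can do the escaping itself. Here is a rough sketch, assuming the "special characters" are the five that XML predefines entities for (&, <, >, " and '). The names escape_xml_stream and escape_xml_string are made up for this example. Note that it escapes everything it sees, so it is meant for the text values that go into the XML, not for a file that already contains markup you want to keep.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out` one character at a time,
// replacing the five XML special characters with their predefined entities.
// Only the stream buffers are held in memory, never the whole input.
void escape_xml_stream(std::istream& in, std::ostream& out) {
    char ch;
    while (in.get(ch)) {
        switch (ch) {
            case '&':  out << "&amp;";  break;
            case '<':  out << "&lt;";   break;
            case '>':  out << "&gt;";   break;
            case '"':  out << "&quot;"; break;
            case '\'': out << "&apos;"; break;
            default:   out.put(ch);     break;
        }
    }
}

// Convenience wrapper for short in-memory strings (also hypothetical).
std::string escape_xml_string(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    escape_xml_stream(in, out);
    return out.str();
}
```

For large files, open a std::ifstream and a std::ofstream in binary mode and pass them to escape_xml_stream instead of going through strings.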
If you want a Unix tool, I guess you could use sed, which processes its input line by line.
SAX and StAX may work if what you need to do is very simple; otherwise, VTD-XML is the best bet.