What is the most efficient (performance-wise) and clean way to take up to 500 XML files, each up to 50 MB, and make a single String out of them? All files are XML and need to keep their formatting, etc.

I'm currently reading the files with an XMLEventReader and writing them with an XMLEventWriter, one event at a time, using a StringBuilder to concatenate all the String results and turning it into a String at the end of the method. But this crashes with a Java heap space error in the IDE long before it reaches 500 files...

Thanks!

A: 

Do you really need to care that they are XML? Can't you read each file in turn, line by line, with a BufferedReader and just write all the lines out to a PrintWriter?
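A minimal sketch of this line-by-line approach, assuming the files are UTF-8 and that the combined result can be written to an output file instead of being held in a String. The class and method names are illustrative. Only one line is in memory at a time, so heap use stays small no matter how many files there are. Note that `readLine()` strips each line terminator and `println()` writes the platform line separator, so line endings may be normalized.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ConcatLines {

    // Copies every input file, line by line, into one output file.
    static void concatenate(List<Path> inputs, Path output) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
            for (Path input : inputs) {
                // Assumption: the input files are UTF-8 encoded.
                try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // readLine() drops the terminator; println() adds the platform separator.
                        out.println(line);
                    }
                }
            }
            // PrintWriter swallows IOExceptions, so check for write errors explicitly.
            if (out.checkError()) {
                throw new IOException("Error while writing " + output);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("xml-concat");
        Path a = Files.writeString(dir.resolve("a.xml"), "<a>\n  <x/>\n</a>\n");
        Path b = Files.writeString(dir.resolve("b.xml"), "<b/>\n");
        Path out = dir.resolve("all.xml");
        concatenate(List.of(a, b), out);
        System.out.print(Files.readString(out));
    }
}
```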

Qwerky
I need to preserve their "XMLness", for lack of a better word, as this is what the caller method expects... I'll check out the approach you suggested and see whether it works in this case.
akapulko2020
Note that the resulting concatenated file will not be well-formed XML, as it won't have a single root element. However, it seems improbable that you would ever use such a large file as XML; parsing it would almost certainly run you out of memory.
Qwerky
Yep, that's exactly the error the caller method reported, since it does parse the input as XML... :(
akapulko2020
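The single-root problem can be reproduced with the JDK's built-in DOM parser. A minimal sketch (the class and method names are illustrative) that reports whether a string parses as one well-formed XML document:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class RootCheck {

    // Returns true if the text parses as a single well-formed XML document.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a/>"));                  // true
        System.out.println(isWellFormed("<a/><b/>"));              // false: two root elements
        System.out.println(isWellFormed("<root><a/><b/></root>")); // true
    }
}
```

Wrapping the concatenated content in an extra root element removes the multiple-roots error, but only if each file's own `<?xml ...?>` declaration is stripped first, since that declaration is allowed only at the very start of a document.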
A:

This String object will have a size of up to 50 GB (50 MB * 500 * 2). You're aware of that, aren't you?

Since you are starting from input files and want to keep the serialized XML in a String, you don't need to parse the files at all; you can simply append each file's contents to your StringBuilder.

Assuming all files are in a single folder, and with a little help from Commons IO, this should do it (... though not on my machine with 4 GB of RAM, by the way):

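 import java.io.File;
 import org.apache.commons.io.FileUtils;

 File[] files = parentFolder.listFiles(); // listFiles() gives File[]; list() gives String[] names
 StringBuilder veryVeryBigBuilder = new StringBuilder();
 for (File file : files) {
   if (isXmlFile(file)) { // isXmlFile(): your own filter, e.g. by ".xml" extension
     veryVeryBigBuilder.append(FileUtils.readFileToString(file, encoding));
   }
 }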
Andreas_D
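For comparison, a sketch of the same "append the raw file contents" idea using only the JDK (Java 11+ for `Files.readString`), with the hypothetical `isXmlFile` check replaced by a `*.xml` file-name glob. It still needs enough heap for the combined text, so it does not remove the size problem above.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConcatToString {

    // Appends the text of every *.xml file in parentFolder to one StringBuilder,
    // without parsing, so each file's formatting is kept as-is.
    static String concatenate(Path parentFolder, Charset encoding) throws IOException {
        // Assumption: "is an XML file" means the file name ends in ".xml".
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parentFolder, "*.xml")) {
            for (Path file : stream) {
                files.add(file);
            }
        }
        // Directory iteration order is unspecified; sort by path for a predictable result.
        Collections.sort(files);

        StringBuilder veryVeryBigBuilder = new StringBuilder();
        for (Path file : files) {
            veryVeryBigBuilder.append(Files.readString(file, encoding));
        }
        return veryVeryBigBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(concatenate(Path.of(args[0]), StandardCharsets.UTF_8));
    }
}
```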
The math looks scary :), thanks. Why *2, by the way?
akapulko2020
You mean I should just read the file (as a byte[]?) and append it to the StringBuilder?
akapulko2020
A `char` in Java is 16 bits (2 bytes), and a String is backed by a `char[]` (up to Java 8; since Java 9, compact strings store Latin-1-only text in one byte per char).
Andreas_D
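The 50 GB estimate can be checked with a few lines of arithmetic. This sketch assumes decimal megabytes and one char per input byte (true for ASCII/Latin-1 content; multi-byte UTF-8 yields fewer chars), and uses the 2-byte `char[]` representation for the ×2. It also shows a harder limit: a String's length is an `int`, so 25 billion chars cannot fit in a single String at all.

```java
public class SizeEstimate {

    // Number of chars produced, assuming one char per input byte.
    static long totalChars(long bytesPerFile, long fileCount) {
        return bytesPerFile * fileCount;
    }

    // Bytes needed to hold that many chars in a char[] (Character.BYTES == 2).
    static long charArrayBytes(long chars) {
        return chars * Character.BYTES;
    }

    public static void main(String[] args) {
        long chars = totalChars(50_000_000L, 500);
        System.out.println(chars);                     // 25000000000
        System.out.println(charArrayBytes(chars));     // 50000000000 (about 50 GB)
        // String and StringBuilder lengths are ints, capped at Integer.MAX_VALUE.
        System.out.println(chars > Integer.MAX_VALUE); // true
    }
}
```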