Hi All,

My program will receive an XML file of 8 GB to 10 GB with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
<header>
<datasource>Name</datasource>
<feedtype>incremental</feedtype>
</header>
<group>
<record url="" action="add" mimetype="text/html" >
<content><![CDATA[ <html> <body><<br></body></html>  ]]>
</content>
</record>
<record url="" action="add" mimetype="text/html" >
<content><![CDATA[ <html> <body><<br></body></html>  ]]>
</content>
</record>
<record url="" action="add" mimetype="text/html" >
<content><![CDATA[ <html> <body><<br></body></html>  ]]>
</content>
</record>
</group>
</gsafeed>

Now I have to split this XML file into pieces of approximately 1 GB each, retaining the same structure, i.e. each split file should have the same header and footer and differ only in containing fewer <record> nodes.

I have to do this on JDK 1.4.

Please suggest.

Thank You

+2  A: 

Do you know the footer in advance? If so, you just need a streaming API such as StAX or SAX - StAX will probably make this significantly easier than SAX, but it isn't built into Java 1.4 so you'd need an extra dependency.

Basically you'll need to do something like this:

  1. Read the header and remember it
  2. Read a record element. If there aren't any more, go to step 7.
  3. Do you currently have a file open? If not, open one and write the header to it.
  4. Write the record element to the current file.
  5. Have you reached the size limit for the file? If so, write the footer and close it.
  6. Go back to step 2.
  7. (Finished reading.) If you have an open file, write the footer and close it.
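
As a rough illustration, the steps above could be sketched with SAX, which ships with Java 1.4. This is only a sketch under several assumptions: the class name `FeedSplitter`, the `part-N.xml` file names and the char-based size limit are made up for the example, the prolog and footer are hard-coded from the sample feed, and the source avoids post-1.4 language features but has not been tried on a 1.4 runtime. CDATA sections come out as escaped text (equivalent to an XML parser), and comments or processing instructions are not copied.

```java
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class FeedSplitter extends DefaultHandler {
    // Prolog and footer are assumed known in advance (copied from the sample feed).
    private static final String PROLOG = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE gsafeed PUBLIC \"-//Google//DTD GSA Feeds//EN\" \"\">\n";
    private static final String FOOTER = "</group>\n</gsafeed>\n";

    private final File outDir;
    private final long maxChars;      // approximate limit per part, counted in chars
    private final StringBuffer header = new StringBuffer(PROLOG);
    private boolean inHeader = true;  // true until the first <record> starts
    private int recordDepth = 0;      // > 0 while inside a <record>
    private int partCount = 0;
    private long written = 0;         // chars written to the current part
    private Writer out;               // null when no part file is open

    private FeedSplitter(File outDir, long maxChars) {
        this.outDir = outDir;
        this.maxChars = maxChars;
    }

    /** Writes outDir/part-N.xml files and returns how many parts were produced. */
    public static int split(InputSource in, File outDir, long maxChars)
            throws IOException, SAXException, ParserConfigurationException {
        FeedSplitter h = new FeedSplitter(outDir, maxChars);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, h);
        } finally {
            if (h.out != null) h.out.close();  // still open only if parsing failed
        }
        return h.partCount;
    }

    // Supply an empty DTD so the parser does not try to fetch the external subset.
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }

    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (recordDepth > 0) {
            recordDepth++;
        } else if ("record".equals(qName)) {
            inHeader = false;             // step 1 done: the header is complete
            recordDepth = 1;
            if (out == null) openPart();  // step 3
        }
        StringBuffer tag = new StringBuffer("<" + qName);
        for (int i = 0; i < atts.getLength(); i++)
            tag.append(" " + atts.getQName(i) + "=\"" + escape(atts.getValue(i)) + "\"");
        emit(tag.append(">").toString());
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        emit(escape(new String(ch, start, length)));
    }

    public void endElement(String uri, String local, String qName) throws SAXException {
        emit("</" + qName + ">");
        if (recordDepth > 0 && --recordDepth == 0) {  // a whole record was written (step 4)
            write("\n");
            if (written >= maxChars) closePart();     // step 5
        }
    }
    public void endDocument() throws SAXException { if (out != null) closePart(); }  // step 7

    // Header content is remembered, record content is copied, the original footer is dropped.
    private void emit(String s) throws SAXException {
        if (inHeader) header.append(s);
        else if (recordDepth > 0) write(s);
    }

    private void openPart() throws SAXException {
        File f = new File(outDir, "part-" + (partCount++) + ".xml");
        try {
            out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8"));
        } catch (IOException e) { throw new SAXException(e); }
        written = 0;
        write(header.toString());
    }

    private void closePart() throws SAXException {
        write(FOOTER);
        try { out.close(); } catch (IOException e) { throw new SAXException(e); }
        out = null;
    }

    private void write(String s) throws SAXException {
        try { out.write(s); } catch (IOException e) { throw new SAXException(e); }
        written += s.length();
    }

    // Simple but slow; a char-by-char loop would be faster on multi-GB input.
    private static String escape(String s) {
        return s.replaceAll("&", "&amp;").replaceAll("<", "&lt;")
                .replaceAll(">", "&gt;").replaceAll("\"", "&quot;");
    }
}
```

A call might look like `FeedSplitter.split(new InputSource(new FileInputStream("feed.xml")), new File("out"), 1000000000L)`, where `feed.xml` and the existing directory `out` are placeholder names. Because the limit is only checked after a complete record has been written, each part can overshoot it by up to one record, which should be fine for an approximate 1 GB target.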
Jon Skeet