Hi all,
I'm working on a piece of code to split files. I want to split flat files (that part is working fine) and XML files. The split is driven by a number of output files: I have a file and I want to split it into x files (x is a parameter). I do the split by taking the size of the file and dividing it by the number of files to produce. My solution was to use a BufferedReader, like this:
while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
    // write buffer[0..n) to the current output file
}
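To make the size arithmetic concrete, here is a minimal sketch of how the target size per output file could be computed. The names `PartSize` and `targetPartSize` are made up for illustration, and rounding up is an assumption on my side so that x files always cover the whole input.

```java
/** Hypothetical helper: works out how large each output file should be. */
final class PartSize {
    /**
     * Divides the total size by the number of parts, rounding up
     * so that `parts` files of this size cover the whole input.
     */
    static long targetPartSize(long totalSize, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        return (totalSize + parts - 1) / parts;
    }
}
```

One thing to keep in mind: File.length() is in bytes while a Reader counts chars, so for UTF-8 content with non-ASCII characters the two numbers can differ.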
The main problem is that I cannot split an XML file at an arbitrary point; I have to split it on the boundary of a block delimited by a start tag and an end tag:
<start tag>
bla bla xml stuff
</end tag>
So I cannot cut a block in the middle. If I am halfway through a block when the size of my new file exceeds my maximum, I have to keep reading until the end tag, and only then start the next file.
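The rule I have in mind can be sketched as follows, assuming the offsets just past each end tag are already known (however they were found). `CutPlanner` and `chooseCuts` are hypothetical names, and this is only an illustration of the rule, not tested on my real files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides at which block ends a new output file starts. */
final class CutPlanner {
    /**
     * Walks the block-end offsets in order and cuts at the first block end
     * where the current part has reached the target size. A part can
     * therefore overshoot the target, but a block is never cut in the middle.
     */
    static List<Long> chooseCuts(List<Long> blockEnds, long targetSize) {
        List<Long> cuts = new ArrayList<>();
        long partStart = 0;
        for (long end : blockEnds) {
            if (end - partStart >= targetSize) {
                cuts.add(end);
                partStart = end;
            }
        }
        return cuts;
    }
}
```

Because each part may run past the target, this can produce fewer than x files when the blocks are large compared to the target size.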
The problem is that there are all sorts of cases, and it is a bit difficult to search for the end tag:
- the buffer read stops in the middle of the end tag
- the buffer read stops exactly at the end of the end tag, with no other character after it
- etc.
At the same time I have to keep looping and reading the next buffer. Sometimes the end tag only appears when the end of one buffer is concatenated with the start of the next. I hope you get the idea.
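One way the boundary cases above might be handled is to match the end tag character by character and keep the match state between reads, so it does not matter where a buffer stops. Below is a minimal sketch using a Knuth-Morris-Pratt style matcher. This is a technique I am swapping in for illustration, not something I already have working; `EndTagScanner`, `feed` and `cutPoints` are hypothetical names, and the end tag is assumed to be a fixed, non-empty string.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner: finds end tags even when one straddles two reads. */
final class EndTagScanner {
    private final char[] pattern;
    private final int[] failure; // KMP failure table for the end tag
    private int matched = 0;     // chars matched so far, kept across buffers

    EndTagScanner(String endTag) { // endTag is assumed to be non-empty
        pattern = endTag.toCharArray();
        failure = new int[pattern.length];
        for (int i = 1, k = 0; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) k = failure[k - 1];
            if (pattern[i] == pattern[k]) k++;
            failure[i] = k;
        }
    }

    /** Feeds one character; returns true when it completes the end tag. */
    boolean feed(char c) {
        while (matched > 0 && c != pattern[matched]) matched = failure[matched - 1];
        if (c == pattern[matched]) matched++;
        if (matched == pattern.length) {
            matched = failure[matched - 1];
            return true;
        }
        return false;
    }

    /** Returns the char offsets just past each end tag found in the stream. */
    static List<Long> cutPoints(Reader reader, String endTag, int bufferSize)
            throws IOException {
        EndTagScanner scanner = new EndTagScanner(endTag);
        List<Long> result = new ArrayList<>();
        char[] buffer = new char[bufferSize];
        long offset = 0; // number of chars consumed by earlier reads
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (scanner.feed(buffer[i])) result.add(offset + i + 1);
            }
            offset += n;
        }
        return result;
    }
}
```

Since the only state carried over is the `matched` counter, the buffer size has no influence on which end tags are found. The sketch looks for the literal text only, so an end tag written with extra whitespace, or the same text inside a comment or CDATA section, would need separate handling.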
My question is: does anyone have an algorithm that does this more accurately and handles all the special cases?
The idea is to split the file as quickly as possible. I did not want to use a library that treats the file as XML, because a block can be small or very large and I don't know whether there will be enough memory. Or is there a library that does not load everything into memory?
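One candidate I am aware of is the StAX API in the JDK (javax.xml.stream), which is a pull parser: it hands out events one at a time instead of building a tree of the whole document. A minimal sketch of walking a file that way is below. `StaxBlockCounter` and `countBlocks` are hypothetical names, it only counts blocks rather than splitting, and I have not measured how fast it is on large files.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical example: streams through the XML and counts closed blocks. */
final class StaxBlockCounter {
    /** Counts END_ELEMENT events whose local name equals blockName. */
    static int countBlocks(Reader in, String blockName) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (xml.hasNext()) {
                // next() advances to the following event; no tree is kept
                if (xml.next() == XMLStreamConstants.END_ELEMENT
                        && blockName.equals(xml.getLocalName())) {
                    count++;
                }
            }
        } finally {
            xml.close();
        }
        return count;
    }
}
```

For my sample file the block name would be endOfPeriodTradeNotification.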
Thanks a lot.
Below is an example of my XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<myTag service="toto" version="1.5.18" >
<endOfPeriodTradeNotification version="1.5.18">
.............
</endOfPeriodTradeNotification>
<endOfPeriodTradeNotification version="1.5.18">
.............
</endOfPeriodTradeNotification>
<endOfPeriodTradeNotification version="1.5.18">
.............
</endOfPeriodTradeNotification>
<inventoryDate>2009-12-31</inventoryDate>
<!-- reporting date -->
<processingDate>2010-01-29T00:00:00</processingDate>
</myTag>
I forgot one thing: my XML file could be written entirely on a single line, so I cannot assume that each line holds one tag.