The problem

When you open a very large XML file locally, it will almost certainly take an age to load - and it can often lock up your computer because the application appears to have stopped responding.

This is an issue if you serve users XML backups of the rather complex databases or systems they use - the likelihood of them being able to open large backups, let alone use them, is slim.

Is pagination possible?

I use XSLT to present readable backups to users. In the same way, would it be possible to pull in only one page of data at a time, preventing the entire file from being read in one go and thus avoiding the issues above?

I imagine the answer is simply no - but I would like to know if anyone else has seen the same issues and resolved them.

Note: This is on a local machine only; it must not require an internet connection. JavaScript can be used if it makes things easier.

+2  A: 

Pagination with XSLT is possible, but will probably not lead to the desired results: For XSLT to work, the whole XML document must be parsed into a DOM tree.

What you could do is experiment with streaming transformations: http://stx.sourceforge.net/

Or you could preprocess the large XML file to cut it up into smaller pieces before processing it with XSLT. For this I'd use a command-line tool like XMLStarlet.
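
For illustration only (the backup/record element names and the page size of 500 are invented, not from the question), XMLStarlet's sel command can copy the first N matching elements out of the big file:

    xmlstarlet sel -t -c "/backup/record[position() <= 500]" backup.xml > page1.xml

Note that the copied records would still need to be wrapped in a single root element to form a well-formed document.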

chiborg
I'm thinking it might be easier to simply cut the file up before presenting it to the user for download (as a zip), which is kind of annoying.
jakeisonline
A: 

Hi, I don't know what programming language you are using, but in C# you can use XmlReader to read the file tag by tag rather than loading the whole file. This way you can read only the first page and then stop reading. Best regards, Iordan
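
As a minimal sketch of that approach (the file name "backup.xml", the element name "record", and the page size are assumptions for illustration, not details from the question):

    // Pull only the first page of <record> elements, then stop;
    // the rest of the file is never parsed.
    using System;
    using System.Xml;

    class FirstPage
    {
        static void Main()
        {
            const int pageSize = 100; // assumed page size
            using (XmlReader reader = XmlReader.Create("backup.xml"))
            {
                int count = 0;
                while (!reader.EOF && count < pageSize)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                    {
                        // ReadOuterXml returns the element's markup and advances
                        // the reader past it, so no extra Read() is needed here.
                        Console.WriteLine(reader.ReadOuterXml());
                        count++;
                    }
                    else
                    {
                        reader.Read(); // advance past non-matching nodes
                    }
                }
            } // disposing the reader closes the file early
        }
    }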

IordanTanev
A: 

One way to alleviate this problem would be to split the large XML file into a number of smaller XML documents. Depending on the type of data, you may split or partition the file any number of ways (e.g. by day, transaction, entity, etc.).

This will introduce a number of other challenges, of course. For instance, you will have to come up with a specialized parser if you need to view the data as a whole or across partitions.
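
A rough sketch of such a splitter in C# (the element names, chunk size, and output naming are all invented for illustration; a real partitioning scheme would key on day, transaction, etc. as described above):

    // Hypothetical splitter: stream <record> elements out of backup.xml,
    // writing every 1000 of them into a fresh part file with its own root.
    using System.Xml;

    class Splitter
    {
        static void Main()
        {
            const int chunkSize = 1000; // assumed partition size
            int part = 0, inChunk = 0;
            XmlWriter writer = null;

            using (XmlReader reader = XmlReader.Create("backup.xml"))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                    {
                        if (writer == null)
                        {
                            writer = XmlWriter.Create("part" + part + ".xml");
                            writer.WriteStartElement("backup"); // wrapper root
                        }
                        // WriteNode copies the whole record and advances the reader.
                        writer.WriteNode(reader, true);
                        if (++inChunk == chunkSize)
                        {
                            writer.WriteEndElement(); // close this part's root
                            writer.Close();
                            writer = null;
                            inChunk = 0;
                            part++;
                        }
                    }
                    else
                    {
                        reader.Read(); // skip everything between records
                    }
                }
                if (writer != null) { writer.WriteEndElement(); writer.Close(); }
            }
        }
    }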

Saul Dolgin
+2  A: 

Right on, very good question!

The XSLT implementations I know of require a DOM, so they are bound to access the entire document (although perhaps that could be done in a lazy fashion).

Anyway, you should take a look at VTD-XML: http://vtd-xml.sourceforge.net/

The latest Saxon XSLT processor also offers rudimentary support for what is called "streaming XSLT". Read about it here: http://www.saxonica.com/documentation/index/intro.html
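
(For context: in what later became XSLT 3.0, a stylesheet opts into this by declaring <xsl:mode streamable="yes"/>, and a streaming processor such as Saxon-EE then makes a single forward pass over the input instead of building a tree; the trade-off is that streamable templates cannot revisit earlier parts of the document.)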

That said, database backups are probably not the right use case for XML. If you have to deal with XML database backups, I would try to get away from those as fast as possible. The same goes for logs - a linear process should work by simply appending records. I mean, it would be even better if XML allowed a forest as its top-level structure, but I think that is never going to happen.

Roland Bouman
Hey Roland, this looks promising. I was wondering, would this require the end user to have anything installed other than a browser? This needs to be viewable by geeks and non-techs alike.
jakeisonline
A: 

The XMLMax virtual XML editor will read, parse, and display a 1-gigabyte XML file in a tree view in about 30 seconds on a fast PC. Windows only. It works with XML of any size or structure.

bill seacham