Hiya All,

I need to parse and process an XML feed. Unfortunately the feed is about 110 MB in size (and I cannot do anything about that), but to be able to parse it I need to see its structure (if anyone has any other ideas, I'd love to hear them).

But for some reason I've been unable to open the file in EditPlus. I'm on a 64-bit Vista machine with 4 GB of RAM (a lot of it free), but the file crashes every program I try to open it with.

Does anyone have any ideas for how I can parse it blindly (the server runs Linux, and PHP please!), or know of a program that might be able to solve my problem?

Cheers

UPDATE

I managed to find the problem, and it was resolved by the answer I've accepted. The XML file wasn't just large, it was also all on one line, which seemed to exceed the line-length limit in most programs. The chosen answer, textpad++, detected this and broke the file across multiple lines so that it could be viewed (this might help someone in the future!).
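
For anyone hitting the same single-line problem without an editor that handles it, here is a minimal Python sketch of the same idea: stream the file in chunks and insert a newline wherever one tag directly follows another. The function name `break_lines` and the chunk size are illustrative assumptions, not something from the editor mentioned above. It also rewrites any literal `><` inside text or CDATA, so treat the output as a copy for viewing only, not as a replacement feed.

```python
def break_lines(src, dst, chunk_size=1 << 20):
    """Copy text from src to dst, putting a newline between adjacent tags.

    src and dst are text-mode file objects. Only chunk_size characters are
    held in memory at a time, so the input can be one enormous line.
    """
    prev_gt = False  # did the previous chunk end with ">"?
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Handle a "><" pair that straddles two chunks.
        if prev_gt and chunk.startswith("<"):
            dst.write("\n")
        dst.write(chunk.replace("><", ">\n<"))
        prev_gt = chunk.endswith(">")
```

One way to use it, assuming the feed is UTF-8 and saved as `feed.xml` (both assumptions): open `feed.xml` for reading and `feed_lines.xml` for writing with `encoding="utf-8"`, then call `break_lines(src, dst)`.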

+1  A: 

I've never had trouble opening very large files in TextPad: http://www.textpad.com/

Zoran Simic
Agreed, Textpad is pretty good at this. Too bad there are hardly any updates anymore to that great little text editor...
A: 

Since XML is just text, you could potentially split it into multiple smaller files and examine each section individually to determine the structure of the XML inside. I've used file splitters many times to break large files into manageable chunks for emailing and such (with 20 MB limits, etc.). I don't know of any viewers that I can guarantee will open a 100 MB+ XML file without crashing.
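
If no splitter tool is at hand, the splitting idea can be sketched in a few lines of Python. This is only an illustration: the function name `split_file`, the 10 MB default, and the `.000`, `.001` suffix scheme are assumptions. It cuts on raw byte counts, so a piece can start or end in the middle of a tag (or of a multi-byte character) and will not be well-formed XML on its own.

```python
def split_file(path, chunk_bytes=10 * 1024 * 1024):
    """Write path.000, path.001, ... each at most chunk_bytes long.

    Returns the list of part file names in order.
    """
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_bytes)
            if not data:
                break
            part = f"{path}.{index:03d}"
            with open(part, "wb") as out:
                out.write(data)
            parts.append(part)
            index += 1
    return parts
```

Each part is small enough for an ordinary editor, and concatenating the parts in order gives back the original file byte for byte.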

md5sum
The problem here is that you'd have to find a program to split it up (you don't want to do this manually, otherwise you might as well read it yourself), and then you have no way of validating the structure programmatically, which could be a problem. Or not. Something feels off about this solution. A modern-day computer should be able to do this, right?
MiRAGe
A modern-day computer is able to do this, but a lot of PROGRAMS can't. From the question, it appears that he needs to visually inspect the file, either because the structure is already known to be invalid or for some other reason. Try opening a file that large in Notepad and it will hang forever. If you open an XML file with WordPad, it handles the file just fine, but it rewrites any line breaks, which makes it difficult for a lot of applications to read the XML (including `ConfigurationManager` with .config files).
md5sum
+1  A: 

gVim can open extremely large files without trouble.

Amber
A: 

You have several options for this:

  • Notepad++ is my personal favourite for opening large files.

  • The V file viewer is pretty handy

  • Microsoft Log Parser is pretty good too. It is designed to give you SQL-like access to large text files, including XML.

    E.g. Select top 10 * from test.xml

  • You could install Cygwin then use the GNU utility 'head'

  • You could use OPENROWSET to import the XML file into a SQL Server table
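
If installing Cygwin just for `head` is overkill, the same kind of peek can be sketched in Python. Because the file in question turned out to be a single line, this reads a fixed number of bytes rather than a number of lines. The name `peek` and its defaults are assumptions for illustration.

```python
def peek(path, n_bytes=2000, encoding="utf-8"):
    """Return the first n_bytes of a file as text without loading the rest."""
    with open(path, "rb") as f:
        data = f.read(n_bytes)
    # errors="replace" avoids a crash if the cut lands inside a
    # multi-byte character.
    return data.decode(encoding, errors="replace")
```

Printing `peek("feed.xml")` (the file name is hypothetical) is usually enough to see the root element and the first few records.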

Jon Winstanley
+2  A: 

XmlReader is a pull parser. It maintains a cursor in the file and only reads in one element at a time. It's a slightly different way of working with XML than DOM, but it performs well for large files.

Of course, if you just want to peek manually into the file, use less or vim for it.
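
To show what cursor-style reading looks like in practice, here is a rough sketch in Python rather than PHP. It uses `xml.etree.ElementTree.iterparse` from the standard library, which is an incremental, event-based parser and not XmlReader itself, so take it as an analogous approach. It walks the file once and collects every distinct element path, which is one way to learn the structure without opening the file in an editor. The function name `unique_paths` is made up for this example.

```python
import xml.etree.ElementTree as ET

def unique_paths(source):
    """Return each distinct element path in first-seen order.

    source is a file name or a binary file object. Elements are cleared
    as soon as they end, so their text and children are not kept around.
    """
    paths = []
    seen = set()
    stack = []  # tags of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            path = "/" + "/".join(stack)
            if path not in seen:
                seen.add(path)
                paths.append(path)
        else:
            stack.pop()
            # Drop the element's text and children. The emptied element
            # itself stays attached to its parent until the parent ends.
            elem.clear()
    return paths
```

For a feed shaped like `<feed><item><title>…</title></item>…</feed>`, this would return `/feed`, `/feed/item`, `/feed/item/title` and so on. Namespaced tags show up in ElementTree's `{uri}name` form.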

troelskn
+1  A: 

XMLMax will open your 100 MB file in a treeview in under 5 seconds and will handle an XML file of any size or structure. It also has a number of options to split it up for you. You mentioned wanting to see the structure: if you create an index, the index file, which is a plain-text UTF-8 file, has a list at the end of all the unique paths in the XML file.

bill seacham
+1  A: 

THIS is the program you want. It's the best I've seen anywhere and I regularly use it for large XML documents. It's completely free, tiny and doesn't require an install.

Damned genius, and nobody has ever heard of it!

XML Viewer 3.1

If that link doesn't work, scroll down on this page until you find it:

http://www.mitec.cz/Data/XML/data_downloads.xml

Django Reinhardt