Hello, I have a file whose content changes frequently, and I'd like to read it before it is complete. The problem is that it is an XML file (a log), so when I read it, some tags may not be closed yet.

I would like to know if there is a way to close all open tags correctly, so that the file can be displayed in the browser (with an XSLT stylesheet). This should be done using only Python's built-in features.

Thank you.

A: 

You can use any SAX parser and feed it the data available so far. Use a SAX handler that simply reconstructs the source XML, keeps a stack of opened tags, and closes them in reverse order at the end.
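A minimal sketch of this approach, assuming Python 3 and the standard library's xml.sax. The handler subclasses xml.sax.saxutils.XMLGenerator so it re-serializes everything it receives, and it keeps a stack of element names; ClosingGenerator and repair_partial_xml are illustrative names, not library API. Processing instructions such as an xml-stylesheet line are echoed, but comments are not, and trailing text after the last complete tag may still be buffered by the parser and therefore dropped.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ClosingGenerator(XMLGenerator):
    """Echo the parsed XML and remember which elements are still open."""

    def __init__(self, out):
        super().__init__(out, encoding="utf-8")
        self.open_tags = []

    def startElement(self, name, attrs):
        super().startElement(name, attrs)
        self.open_tags.append(name)

    def endElement(self, name):
        super().endElement(name)
        self.open_tags.pop()

    def close_open_tags(self):
        # Close whatever is still open, innermost element first.
        while self.open_tags:
            super().endElement(self.open_tags.pop())


def repair_partial_xml(text):
    """Return a well-formed copy of a possibly truncated XML string."""
    out = io.StringIO()
    handler = ClosingGenerator(out)
    parser = xml.sax.make_parser()  # expat-based, supports incremental feed()
    parser.setContentHandler(handler)
    try:
        # A truncated document is not an error for feed(); the parser just
        # waits for more data. Only genuinely malformed input raises here.
        parser.feed(text)
    except xml.sax.SAXParseException:
        pass  # keep whatever was parsed before the error
    handler.close_open_tags()
    return out.getvalue()
```

Because parser.close() is never called, the parser never insists on a finished document; the stack alone decides which end tags get written.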

Denis Otkidach
A:

Some XML parsers support incremental parsing: the parser can start working on the document without needing it to be fully loaded. The XMLTreeBuilder from the xml.etree.ElementTree module in the Python standard library is one such parser (in Python 2.7 and later it is called XMLParser). See the ElementTree documentation.

As you can see in the example below you can feed data to the parser bit by bit as you read it from your input source. The appropriate hook methods in your handler class will get called when various XML "events" happen (tag started, tag data read, tag ended) allowing you to process the data as the XML document is loaded:

# XMLParser is the current name of XMLTreeBuilder.
from xml.etree.ElementTree import XMLParser

class MyHandler(object):
    def start(self, tag, attrib):
        # Called for each opening tag.
        print(tag + " started")
    def end(self, tag):
        # Called for each closing tag.
        print(tag + " ended")
    def data(self, data):
        # Called when character data is read inside a tag.
        print('"' + data + '" data read')
    def close(self):
        # Called when all data has been parsed.
        print("All data read")

handler = MyHandler()

parser = XMLParser(target=handler)

# Feed the document piece by piece, e.g. as chunks are read from the file.
parser.feed("<sometag>")
parser.feed("<sometag-child>text")
parser.feed("</sometag-child>")
parser.feed("</sometag>")
parser.close()

In this example the handler receives five events, followed by the close() call, and prints:

sometag started
sometag-child started
"text" data read
sometag-child ended
sometag ended
All data read
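Applied to the question, a target only needs start and end to know which elements are still open. A minimal sketch, assuming Python 3 and xml.etree.ElementTree.XMLParser (the current name of XMLTreeBuilder); OpenTagTracker and missing_close_tags are illustrative names, not library API. It feeds everything written so far, deliberately skips parser.close(), and reads the stack backwards.

```python
import io
from xml.etree.ElementTree import XMLParser


class OpenTagTracker:
    """Parser target that only remembers which elements are still open."""

    def __init__(self):
        self.stack = []

    def start(self, tag, attrib):
        self.stack.append(tag)

    def end(self, tag):
        self.stack.pop()


def missing_close_tags(stream, chunk_size=1024):
    """Feed a (possibly still growing) XML text stream chunk by chunk and
    return the closing tags needed to finish the document."""
    tracker = OpenTagTracker()
    parser = XMLParser(target=tracker)
    for chunk in iter(lambda: stream.read(chunk_size), ""):
        parser.feed(chunk)
    # parser.close() is not called: on an unfinished document it would
    # raise a ParseError.
    return "".join("</%s>" % tag for tag in reversed(tracker.stack))
```

The returned string can be appended to the text read so far only if that text ends on a complete tag; a half-written trailing tag (for example "<entr") has to be cut off first.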

Tendayi Mawushe
A:

If I am understanding your question correctly, you have a log file that is always being appended to so you get something like:

<root>
<entry> ... </entry>
<entry> ... </entry>
...
<entry> ... </entry
<!-- no closing root -->

In this case you DON'T want to use a DOM parser because it tries to read a complete document and would choke on the missing tag. Instead, a SAX or Pull parser would work because it reads the document like a stream of data rather than a complete tree. As Denis replied above, you could either close the missing tag at the end or ignore any incomplete tags before writing it out.

XML parsing on Wikipedia
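The pull-parser route can be sketched with xml.etree.ElementTree.XMLPullParser, assuming Python 3.4 or later (a newer API than the ones discussed above). Instead of closing tags by hand, it lets the tree builder do the work: each element is attached to its parent as soon as its start tag is parsed, so serializing the root produces a well-formed document with every open element closed. The function name close_partial_xml and its prolog parameter are hypothetical. The XML declaration and any xml-stylesheet processing instruction are not part of the element tree, which is why the sketch lets you pass them back in.

```python
import xml.etree.ElementTree as ET


def close_partial_xml(partial_text, prolog=""):
    """Return a well-formed version of an XML document that may be cut off.

    A half-written trailing tag and any trailing text that the tree builder
    has not yet attached to an element are dropped.
    """
    parser = ET.XMLPullParser(events=("start",))
    parser.feed(partial_text)  # no close(): the document is incomplete
    root = None
    for event, elem in parser.read_events():
        root = elem  # the first "start" event belongs to the root element
        break
    if root is None:
        return None  # not even the root start tag has been written yet
    # The root already holds every element parsed so far; the serializer
    # writes an end tag for each of them.
    return prolog + ET.tostring(root, encoding="unicode")
```

For example, with hypothetical file names: close_partial_xml(open("log.xml", encoding="utf-8").read(), prolog='<?xml-stylesheet type="text/xsl" href="log.xsl"?>') returns text the browser can render with the stylesheet.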

antonm
A: 

Yes, but the problem is that I don't want to wait for the full document to load. I want to close all open tags.

John
A: 

You could use BeautifulStoneSoup (the XML part of BeautifulSoup, a third-party library rather than part of the standard library).

www.crummy.com/software/BeautifulSoup

It's not ideal, but it would circumvent the problem if you cannot fix the file's output...

It is essentially a ready-made implementation of what Denis described.

You can pass whatever you have read so far into the soup, and it will do its best to fix it.

Swixel