ansaurus

Question

Python and GnuCash: Extract data from GnuCash files

Answer 1

+2 A:

Are you talking about the data files? From their wiki, it looks like they are just compressed XML files. With Python, you can decompress them with the gzip module and then parse them with any of the available XML parsers.

ElementTree Example

>>> import xml.etree.ElementTree as ET  # C accelerator is used automatically on Python 3.3+
>>> xmlStr = '''<?xml version="1.0" encoding="UTF-8" ?>
<painting>
<img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
    <date>1511</date>?<date>1512</date>.
</caption>
</painting>
'''
>>> root = ET.fromstring(xmlStr)  # use ET.parse or ET.iterparse to read directly from a file path
>>> list(root)  # child elements; getchildren() was removed in Python 3.9
[<Element 'img' at 0x115efc0>, <Element 'caption' at 0x1173090>]
>>> list(root)[1].text  # text before the first child of <caption>
'This is Raphael\'s "Foligno" Madonna, painted in\n    '
>>> list(root)[0].get('src')  # attribute lookup
'madonna.jpg'

Mark 2010-08-04 13:53:25
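
The two steps from the answer (decompress with `gzip`, then parse with ElementTree) can be combined roughly as follows. This is only a sketch: it assumes the file really is gzip-compressed XML, and the sample document, the element names, and the file name `sample.gnucash` are made-up stand-ins, not the real GnuCash schema.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def load_compressed_xml(path):
    """Decompress a gzip file and parse its contents as XML; return the root element."""
    with gzip.open(path, "rb") as fh:
        return ET.parse(fh).getroot()


# Stand-in data: a tiny made-up document, NOT the real GnuCash schema.
sample = b"<book><account><name>Checking</name></account></book>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.gnucash")  # hypothetical file name
    with gzip.open(path, "wb") as fh:
        fh.write(sample)

    root = load_compressed_xml(path)
    # iter() walks the whole tree, so nested <name> elements are found too.
    names = [el.text for el in root.iter("name")]

print(root.tag, names)
```

For a real file, only `load_compressed_xml(path)` is needed; the temporary file exists just to make the sketch runnable on its own.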

+1 Thanks! That looks good for a start. I have managed to decompress it with the `gzip` module. I tried the first XML parser I saw with an example (`Expat`), but unfortunately, I couldn't parse out the tags and contents. Can you recommend which XML parser I should use, or at least get started with?

Kit 2010-08-05 12:09:18

@Kit, my favorite in the standard library is cElementTree (http://docs.python.org/library/xml.etree.elementtree.html). Make sure to use cElementTree instead of ElementTree (the former is written in C and the latter is pure Python) for extra speed. On Python 3.3 and later, `xml.etree.ElementTree` picks up the C implementation automatically, and `cElementTree` was removed in Python 3.9. See the edits above for a quick start.

Mark 2010-08-05 13:29:21

This is great. I'm still quite stumped by how to deal with namespaces in the `{URI}tag` format. Anyway, that would be a topic for another question. Thanks for your help :)

Kit 2010-08-06 00:52:25
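
For the `{URI}tag` format mentioned in the comment above, one possible approach with the standard-library ElementTree is to pass a prefix-to-URI mapping to `find`/`findall`, so paths can be written as `prefix:tag`. The document and the namespace URI below are invented for illustration and are not taken from a real GnuCash file.

```python
import xml.etree.ElementTree as ET

# Made-up document and namespace URI, for illustration only.
doc = """<root xmlns:act="http://example.org/act">
  <act:account><act:name>Checking</act:name></act:account>
  <act:account><act:name>Savings</act:name></act:account>
</root>"""

root = ET.fromstring(doc)

# Without a prefix map, tags must be spelled in full {URI}tag form.
full = [el.text for el in root.iter("{http://example.org/act}name")]

# With a prefix map, find/findall/iterfind accept prefix:tag paths.
ns = {"act": "http://example.org/act"}
short = [el.text for el in root.findall("act:account/act:name", ns)]

# Splitting a qualified tag back into its URI and local name.
uri, local = root[0].tag[1:].split("}", 1)

print(full, short, uri, local)
```

The prefixes in the mapping are local to the query; they do not have to match the prefixes used in the file, only the URIs do.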

@Mark: I just found out that `lxml.etree` does a better job at [handling namespaces](http://stackoverflow.com/questions/3428792/xml-and-python-get-the-namespaces-declared-in-root-element/3428820#3428820).

Kit 2010-08-07 14:05:14
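
`lxml.etree` exposes the declared prefixes through an element's `nsmap` attribute. If only the standard library is available, a rough substitute (a different technique, not lxml's) is to collect the `start-ns` events from `ElementTree.iterparse`. The sample document and URIs below are made up.

```python
import io
import xml.etree.ElementTree as ET

# Made-up document with two namespace declarations on the root element.
doc = b"""<root xmlns:act="http://example.org/act" xmlns:trn="http://example.org/trn">
  <act:account/>
</root>"""

# Each "start-ns" event carries a (prefix, uri) tuple for one declaration.
declared = dict(
    item for event, item in ET.iterparse(io.BytesIO(doc), events=("start-ns",))
)

print(declared)
```

Note that this gathers declarations from the whole document, not just the root element, and a prefix declared twice keeps only the last URI seen.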
