I'm looking for information on how to read GnuCash files using Python. I have read about python-gnucash, which provides Python bindings to the GnuCash library, but getting it working takes a lot of effort at the moment (dependencies, headers, etc.). The instructions are tailored to Linux and to a rather old GnuCash version (2.0.x), whereas I am running GnuCash 2.2.9. Although I am comfortable with the Linux command line, I run GnuCash on Windows XP.

My main objective is to read (no plans to write yet) my GnuCash files so that I can create my own dynamic visual reports using matplotlib and wxPython. I'm not yet in the mood to learn Scheme.

I hope someone can point me to a good starting point. From what I know of GnuCash and Python, I expect a solution would be one of the following:

  1. More recently updated documentation than the page on the GnuCash wiki
  2. Some workaround, like exporting to a file format for which there is a more mature Python library.

You guys might have better suggestions in addition to those mentioned.

+2  A: 

Are you talking about the data files? From their wiki, it looks like they are just compressed XML files. With Python, you can decompress them with the gzip module and then parse them with any of the available XML parsers.
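To make the two steps concrete, here is a minimal sketch that decompresses and parses in one go. It assumes the file really is gzip-compressed XML, as the wiki describes; `load_gnucash_xml` is a name made up for this example, not part of any GnuCash library.

```python
import gzip
import xml.etree.ElementTree as ET


def load_gnucash_xml(path):
    """Parse a gzip-compressed XML file and return its root element.

    Hypothetical helper: assumes the file at `path` is gzip-compressed XML.
    """
    # gzip.open returns a binary file object; ET.parse accepts it directly
    # and uses the encoding declared in the XML prolog.
    with gzip.open(path, "rb") as handle:
        return ET.parse(handle).getroot()
```

You would call it as `root = load_gnucash_xml("mybook.gnucash")` (the file name is a placeholder) and then walk `root` like any other element tree.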

ElementTree Example

>>> import xml.etree.ElementTree as ET  # on Python 2, xml.etree.cElementTree is the faster C version
>>> xmlStr = '''<?xml version="1.0" encoding="UTF-8" ?>
<painting>
<img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
    <date>1511</date>-<date>1512</date>.
</caption>
</painting>
'''
>>> tree = ET.fromstring(xmlStr)  # use ET.parse or ET.iterparse to read directly from a file path
>>> list(tree)  # getchildren() was removed in Python 3.9; iterate the element instead
[<Element 'img' at 0x115efc0>, <Element 'caption' at 0x1173090>]
>>> tree[1].text  # text of <caption> up to its first child element
'This is Raphael\'s "Foligno" Madonna, painted in\n    '
>>> tree[0].get('src')
'madonna.jpg'
Mark
+1 Thanks! That looks good for a start. I have managed to decompress it with the `gzip` module. I tried the first XML parser I saw with an example (`Expat`), but unfortunately, I couldn't parse out the tags and contents. Can you recommend which XML parser I should use, or at least get started with?
Kit
@Kit, my favorite in the standard library is cElementTree (http://docs.python.org/library/xml.etree.elementtree.html). On Python 2, make sure to use cElementTree instead of ElementTree for extra speed (the former is written in C and the latter is pure Python); since Python 3.3, plain ElementTree uses the C accelerator automatically. See the edits above for a quick start.
Mark
This is great. I'm still quite stumped by how to deal with namespaces in the `{URI}tag` format. Anyway, that would be a topic for another question. Thanks for your help :)
Kit
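For anyone else puzzled by the `{URI}tag` form, here is a small sketch of how the standard library handles it. The sample document, the `ex` prefix and the `http://example.com/ns` URI are placeholders invented for illustration, not taken from a real GnuCash file, and `local_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; prefix and URI are placeholders.
SAMPLE = """<root xmlns:ex="http://example.com/ns">
  <ex:account><ex:name>Cash</ex:name></ex:account>
  <ex:account><ex:name>Bank</ex:name></ex:account>
</root>"""

# Map a short prefix of your choosing to the namespace URI.
NS = {"ex": "http://example.com/ns"}

root = ET.fromstring(SAMPLE)

# ElementTree stores each qualified tag as "{URI}localname".
first = root[0]
print(first.tag)  # {http://example.com/ns}account

# Passing the prefix map lets find/findall use the short "prefix:tag" form.
names = [el.text for el in root.findall("ex:account/ex:name", NS)]
print(names)  # ['Cash', 'Bank']


def local_name(tag):
    """Strip the leading "{URI}" from a qualified tag, if present."""
    return tag.rsplit("}", 1)[-1]
```

The prefix used in the map does not have to match the one in the file; only the URI matters.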
@Mark: I just found out that `lxml.etree` does a better job at [handling namespaces](http://stackoverflow.com/questions/3428792/xml-and-python-get-the-namespaces-declared-in-root-element/3428820#3428820).
Kit
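If installing lxml is not an option, the declared namespaces can also be collected with the standard library. This is only a sketch: `declared_namespaces` is a made-up name, and it gathers every declaration in the document, not just those on the root element.

```python
import io
import xml.etree.ElementTree as ET


def declared_namespaces(source):
    """Return {prefix: URI} for the namespace declarations in a document.

    `source` may be a file name or a binary file object.
    """
    namespaces = {}
    # Each "start-ns" event carries a (prefix, uri) tuple; the default
    # namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # A prefix redeclared later in the document overwrites the earlier one.
        namespaces[prefix] = uri
    return namespaces
```

The resulting dictionary can be passed as the `namespaces` argument of `find` and `findall`, provided the default namespace (empty prefix) is given a non-empty key first on older Python versions.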