I'm about to start writing a program that will extract data from a Google Code site so that it can be imported into another project management site. Specifically, I need to extract the full issue detail from the site (description, comments, and so on).

Unfortunately Google don't provide an API for this, nor do they have an export feature, so the only option I can see is to extract the data from the actual HTML (yuck). Does anyone have suggestions on best practice for parsing data out of HTML? I'm aware that this is less than ideal, but I don't think I have much choice. Can anyone think of a better way, or has someone already done this?

Also, I'm aware of the CSV export feature on the issue page. It does not give complete data about issues, but it could be a useful starting point.

A: 

I just finished a program called google-code-export (hosted on GitHub). It lets you export your Google Code project to an XML file, for example:

>main.py -p synergy-plus -s 1 -c 1
parse: http://code.google.com/p/synergy-plus/issues/detail?id=1
wrote: synergy-plus_google-code-export.xml

... will create a file named synergy-plus_google-code-export.xml.
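
For anyone who would rather scrape the issue pages themselves, the general idea can be sketched with Python's standard html.parser module. This is not taken from google-code-export: the issuecomment class name, the sample markup, and the names CommentExtractor and extract_comments are all assumptions made for illustration, so check them against the real page source before relying on them.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text of each <pre> found inside a <div> whose class
    list contains "issuecomment" (an assumed class name, not verified)."""

    def __init__(self):
        super().__init__()
        self.comments = []    # finished comment bodies, in page order
        self._div_depth = 0   # > 0 while inside a matching <div>
        self._in_pre = False  # True while inside a <pre> we want
        self._buffer = []     # text pieces of the current <pre>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            # Track nesting so inner <div>s do not end the comment early.
            if self._div_depth or "issuecomment" in classes:
                self._div_depth += 1
        elif tag == "pre" and self._div_depth:
            self._in_pre = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "pre" and self._in_pre:
            self._in_pre = False
            self.comments.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._in_pre:
            self._buffer.append(data)


def extract_comments(html):
    """Return the comment bodies found in one issue page's HTML."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return parser.comments


# Hypothetical markup standing in for a downloaded issue page.
SAMPLE = """
<div class="issuecomment"><pre>First comment &amp; more</pre></div>
<div class="other"><pre>ignored</pre></div>
<div class="vt issuecomment"><div><pre>Second
comment</pre></div></div>
"""

if __name__ == "__main__":
    for body in extract_comments(SAMPLE):
        print(repr(body))
```

Using a real parser and keying on element classes, rather than running regular expressions over the raw HTML, tends to survive small markup changes better. The page itself would still need to be downloaded first (for example with urllib.request), one issue id at a time.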

nbolton