I have some Ruby code that scrapes a few websites. I wrote it in Ruby because I was using Ruby on Rails for a site at the time, and it just made sense.
Now I'm trying to port it to Google App Engine, and I keep getting stuck.
I've ported Python Mechanize to work with Google App Engine, but it doesn't support DOM inspection with XPath.
I've tried the built-in ElementTree, but it choked on the first HTML blob I gave it when it ran into '&mdash'.
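For reference, here is a minimal sketch of the kind of failure I mean. It assumes a current Python 3 standard library (where the error is `ParseError`) and that the offending text is the `&mdash;` entity. The `html_blob` string and the `try_parse` helper are made-up stand-ins, not my real scraper code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a scraped HTML fragment containing an HTML-only entity.
html_blob = "<p>Ruby &mdash; then Python</p>"


def try_parse(markup):
    """Return the parsed root element, or the error message if parsing fails."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as exc:
        # ElementTree is a strict XML parser: XML predefines only the entities
        # amp, lt, gt, quot and apos, so HTML entities like mdash are rejected.
        return str(exc)


result = try_parse(html_blob)
print(result)
```

As far as I can tell, the parser is doing what an XML parser should here; the input just isn't well-formed XML.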
Do I keep trying to hack ElementTree in there, or do I try to use something else?
Thanks, Mark