I am trying to extract data from thousands of identical Excel 2007/2010 files, and I would prefer to do this using scraping techniques. Is it possible to scrape an Excel file? As far as I know, the file is basically some sort of XML format.
So, is it possible to convert an Excel file to XML or some other markup format?
...
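As a rough sketch of the idea in this question: an Excel 2007/2010 `.xlsx` file is a ZIP package of XML parts, so the standard library can read cell values directly. The part name `xl/worksheets/sheet1.xml` is an assumed default, and `read_sheet_cells` and `build_demo_workbook` are hypothetical names for illustration. Inline strings, dates, and formulas are not handled here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_cells(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_reference: value_text} for one worksheet part.

    `xlsx` is a path or a binary file-like object. `sheet_part` is an
    assumed default; the real part name depends on the workbook.
    """
    with zipfile.ZipFile(xlsx) as package:
        # Shared strings are optional; text cells refer to them by index.
        shared = []
        if "xl/sharedStrings.xml" in package.namelist():
            root = ET.fromstring(package.read("xl/sharedStrings.xml"))
            for item in root.findall("m:si", NS):
                # Join every <t> element so multi-run strings come out whole.
                runs = item.iter("{%s}t" % NS["m"])
                shared.append("".join(t.text or "" for t in runs))

        sheet = ET.fromstring(package.read(sheet_part))
        cells = {}
        for cell in sheet.iter("{%s}c" % NS["m"]):
            value = cell.find("m:v", NS)
            if value is None or value.text is None:
                continue  # cells without a <v> element are skipped in this sketch
            text = value.text
            if cell.get("t") == "s":  # "s" marks a shared-string index
                text = shared[int(text)]
            cells[cell.get("r")] = text
        return cells


def build_demo_workbook():
    """Build a tiny in-memory package with only the two parts read above.

    Hypothetical sample data, not a complete workbook that Excel would open.
    """
    shared = '<sst xmlns="%s"><si><t>Title</t></si></sst>' % NS["m"]
    sheet = (
        '<worksheet xmlns="%s"><sheetData><row r="1">'
        '<c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c>'
        "</row></sheetData></worksheet>" % NS["m"]
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/sharedStrings.xml", shared)
        package.writestr("xl/worksheets/sheet1.xml", sheet)
    buffer.seek(0)
    return buffer


cells = read_sheet_cells(build_demo_workbook())
```

For thousands of identical files, the same function could be called in a loop over the file paths, since the cell references of interest would be the same in each workbook.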
I'm trying to parse a list of video game titles from a shopping site. However, the item list is all stored inside a tag.
This section of the documentation supposedly explains how to parse only part of the document, but I can't work it out. My code:
from BeautifulSoup import BeautifulSoup
import urllib
import re
url = "Some Shopping ...
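One way to look at only part of a document is sketched below. It is not the BeautifulSoup approach the question refers to (BeautifulSoup offers `SoupStrainer` for that); it swaps in Python 3's standard `html.parser` so the example has no third-party dependency. The container tag is not named in the question, so it is a parameter here, and the sample HTML with `select` and `option` is purely hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect non-blank text that appears inside a chosen container tag."""

    def __init__(self, container):
        super().__init__()
        self.container = container
        self.depth = 0   # greater than 0 while inside the container
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.container:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.container and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside the container is ignored entirely.
        if self.depth and data.strip():
            self.texts.append(data.strip())


def texts_inside(html, container):
    """Return the text chunks found inside every `container` element."""
    parser = TagTextCollector(container)
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical sample; the real page structure is not shown in the question.
sample = (
    "<p>Ignore me</p>"
    "<select><option>Game A</option><option>Game B</option></select>"
)
titles = texts_inside(sample, "select")
```

The depth counter is there so that nested containers of the same name do not end collection early.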
The HTML Agility Pack is great for getting descendants, whole tables, etc., but how can you use it in the situation below?
...Html Code above...
<dl>
<dt>Location:</dt>
<dd>City, London</dd>
<dt style="padding-bottom:10px;">Distance:</dt>
<dd style="padding-bottom:10px;">0 miles</dd>
<dt>Date Issued:</dt>
<dd>26/10/2010</dd>
<d...
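The pairing logic this markup calls for (each `dt` label matched with the `dd` that follows it) can be sketched in Python with the standard `html.parser`. This is not HTML Agility Pack code; with that library, an XPath using the `following-sibling` axis would probably express the same pairing, though that is an untested assumption. The sample reuses only the complete pairs visible above and closes the `dl` itself, since the original markup is cut off.

```python
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Pair each <dt> label with the text of the <dd> that follows it."""

    def __init__(self):
        super().__init__()
        self.current = None   # "dt" or "dd" while inside one, else None
        self.label = None     # most recent <dt> text awaiting its <dd>
        self.buffer = []
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return  # closing tag of some other element
        text = "".join(self.buffer).strip()
        if tag == "dt":
            self.label = text.rstrip(":")  # drop the trailing colon
        elif self.label is not None:
            self.pairs[self.label] = text
            self.label = None
        self.current = None


def parse_definition_list(html):
    """Return {label: value} built from the dt/dd pairs in `html`."""
    parser = DefinitionListParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = """
<dl>
<dt>Location:</dt>
<dd>City, London</dd>
<dt style="padding-bottom:10px;">Distance:</dt>
<dd style="padding-bottom:10px;">0 miles</dd>
<dt>Date Issued:</dt>
<dd>26/10/2010</dd>
</dl>
"""
details = parse_definition_list(sample)
```

Looking up `details["Location"]` then gives the value for a single label without walking the whole list by position.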