I'm trying to extract data from http://www.phillysheriff.com/old_site/properties.html

Ideally I'd be able to get a CSV file with the address, ward, price, and square feet. Is there an easy way to do this?

A: 

Extracting information like this from web pages is known colloquially as "scraping". If it were me, I'd use Python with the "Beautiful Soup" package. However, a Google search for "screen scrape" or "web scrape" plus your favourite programming language should turn up a package that will do the hard work for you.
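
To give a rough idea of what such a script might look like, here is a minimal sketch that uses only Python's standard library (html.parser and csv) so it runs without installing anything. Beautiful Soup's find_all would likely shorten the parsing step. The sample markup, the field labels ("Ward:", "Price:", "Square feet:"), and the names RecordSplitter, to_row, and scrape_to_csv are all made up for illustration. The only assumption borrowed from the real page is that listings are separated by <hr> tags, so the field extraction would need adapting to the actual layout.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup. The addresses, values, and "Label: value"
# lines are invented; the real page's layout will differ.
SAMPLE_HTML = """
<hr>
<center>123 EXAMPLE ST</center>
Ward: 12<br>
Price: $10,000<br>
Square feet: 1,200<br>
<hr>
<center>456 SAMPLE AVE</center>
Ward: 34<br>
Price: $25,500<br>
Square feet: 980<br>
"""

FIELDS = ["address", "ward", "price", "square_feet"]


class RecordSplitter(HTMLParser):
    """Collect the visible text chunks of each <hr>-delimited block."""

    def __init__(self):
        super().__init__()
        self.records = [[]]

    def handle_starttag(self, tag, attrs):
        # Each <hr> starts a new listing (an assumption about the page).
        if tag == "hr":
            self.records.append([])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.records[-1].append(text)


def to_row(chunks):
    """Treat the first chunk as the address and the rest as 'Label: value'."""
    row = {"address": chunks[0]}
    for chunk in chunks[1:]:
        label, sep, value = chunk.partition(":")
        if sep:
            row[label.strip().lower().replace(" ", "_")] = value.strip()
    return row


def scrape_to_csv(html):
    """Return CSV text with one row per listing found in the HTML."""
    splitter = RecordSplitter()
    splitter.feed(html)
    splitter.close()
    rows = [to_row(chunks) for chunks in splitter.records if chunks]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

To run it against the live page you would fetch the HTML first (for example with urllib.request.urlopen) and pass the decoded text to scrape_to_csv in place of SAMPLE_HTML.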

Nick Fortescue
A: 

You can run the IRobotSoft web scraper, open the page in its browser window, and choose Design -> Practice HTQL from the menu. Enter the following HTQL query in the input box to transform the page into a standard HTML table:

<hr sep>2-0{
a=<center>1 &tx &trim;
b=<center>1:xx ./'nbsp'/1 &tx &trim('&; ');
c=<center>1:xx ./'nbsp'/3 ./'\n'/1 &tx &trim('&; ');
d=<center>1:xx ./'nbsp'/3 ./'Ward'~'BRT#'/1 &tx;
e=<center>1:xx ./'nbsp'/3 ./'BRT#'~'Improvements:'/1 &tx;
f=<center>1:xx ./'nbsp'/3 ./'Improvements:'/2 &tx;
g=<br sep>2. /'nbsp'/1 &tx &trim('&; ');
h=<br sep>2. /'nbsp'/3 &tx &trim('&; '); 
i=<br sep>2. /'nbsp'/5 &tx &trim('&; ');
j=<br sep>2. /'nbsp'/7 &tx &trim('&; ');
}
seagulf