Hi!
I have a small crawler/screen-scraping script that worked six months ago but no longer does. I compared the HTML and CSS values used in the regular expression against the current page source, and they are unchanged, so as far as I can tell the pattern should still match. Any guesses?
require "open-uri"
# output file
f = open 'results.csv', 'w+'
# output string
results = ""
begin
  # crawl first 20 pages
  for i in (1..20)
    open("http://www.my-hammer.de/search.php?mhFormData[allCategories]=1&mhFormData[rangeAll]=1&mhFormData[priceRangeEnd]=999999999&mhFormData[refineSearch]=1&mhFormData[searchText]=&mhFormData[searchZipcode]=&mhFormData[searchZipcodeCircumcircle]=50&mhFormData[priceRangeStart]=1&mhFormData[categories][0]=45&page=" + i.to_s) {|url|
      # check each line using regular expression
      url.each_line { |line|
        if line =~ /class=\"L1g\" onclick=\"s_objectID=\'ShowAuction_from_AuctionTitle\'\">([^<]+)<\/a><\/h3><\/li>/
          # if regular expression matches then add to results
          results += $1 + "\n"
        end
      }
    }
  end
ensure
  # write to and close file
  f.print results
  f.close
end
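
To narrow this down, here is a rough sketch that tests the pattern offline. It assumes one possible cause, which I have not confirmed: the markup may now be wrapped across several lines, so a line-by-line check never sees the whole anchor at once. The two sample strings and the helper names (titles_per_line, titles_whole_body) are made up for illustration and are not taken from the real page.

```ruby
# Same pattern as in the script above; only the redundant backslashes
# before the quote characters are dropped, which does not change what it matches.
TITLE_PATTERN = /class="L1g" onclick="s_objectID='ShowAuction_from_AuctionTitle'">([^<]+)<\/a><\/h3><\/li>/

# Titles found when the HTML is checked one line at a time, as the script does.
def titles_per_line(html)
  html.each_line.map { |line| line[TITLE_PATTERN, 1] }.compact
end

# Titles found when the whole document is scanned in one pass.
# [^<]+ also matches newlines, so a title wrapped over two lines is still found.
def titles_whole_body(html)
  html.scan(TITLE_PATTERN).flatten
end

# Hypothetical sample markup (invented for this test, not copied from the site).
# First sample: the whole list item sits on one line.
SAMPLE_ONE_LINE = %(<li><h3><a href="/auction/1" class="L1g" onclick="s_objectID='ShowAuction_from_AuctionTitle'">Paint fence</a></h3></li>\n)
# Second sample: the same item, but with a line break inside the title text.
SAMPLE_SPLIT = %(<li><h3><a href="/auction/1" class="L1g" onclick="s_objectID='ShowAuction_from_AuctionTitle'">Paint\nfence</a></h3></li>\n)
```

Running both helpers on the HTML the script actually downloads (rather than on what the browser shows under "view source") should tell the two cases apart. If titles_whole_body finds titles and titles_per_line does not, the line breaks are the problem. If both come back empty, the page served to the script probably no longer contains this markup at all.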