I am making a crawler parsing images on the Gantz manga at http://manga.bleachexile.com/gantz-chapter-1.html and on.
I had success until my crawler tried to open a image (on chapt 273):
bad URI(is not URI?): http://static.bleachexile.com/manga/gantz/273/Gantz[0273]_p001[Whatever-Illuminati].png
BUT this url is valid I guess, because I can open from Firefox.. Any thoughts?
Partial code:
img_link = nav.page.image_urls.find {|x| x.include?("manga/gantz")}
img_name = RAILS_ROOT+"/public/#{nome}/#{cap}/"+nome+((template).sub('::cap::', cap.to_s).sub('::pag::', i.to_s))
img = File.new( img_name, 'w' )
img.write( open(img_link) {|f| f.read} )
img.close