screen-scraping

HTML / Script Scraping Google Map using Hpricot (Ruby on Rails)

Hi, I am having a problem scraping code that I need in order to extract information for a web mashup I'm creating. Basically, I am trying to scrape code from: http://yellowpages.com.mt/Meranti-Ltd-In-Malta-Gozo;/Hair-Accessories;Hijjhkikke=Hiojhhfokje.aspx This is just one of the pages I will need to scrape, and hence I cannot feed the program d...

Scraping a form from an SSL site and using it on your own

If I screen scrape a form from a site secured with SSL, and put that form on my site (which is also secured by SSL), do I still get the benefits of SSL? ...

Screen Scraping in PHP with login

Looking around for a solution to this, I have found different methods. Some use regex, some use DOM scripting or something similar. I want to go to a site, log in, fill out a form and then check whether the form was sent. The logging-in part is the part I can't find anything on. Does anyone know of an easy way to do this? ...

Scraping hidden HTML (when visible = false) using Hpricot (Ruby on Rails)

Hi, I've come across an issue that I can't seem to get past. I'm also new to Ruby on Rails, hence the number of questions. I am attempting to scrape a webpage such as the following: http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo.aspx I would like to scrape the Addres...

Do Kronos time/attendance clocks have an accessible API?

I'd like to extract a few pieces of system information from a Kronos clock programmatically. I can scrape the web-based interface, but there's got to be a cleaner interface. Does anyone have experience querying a Kronos 4500 clock for status info? ...

Scrapy unknown scheduler middleware recursion problem

Dear everyone, I am using Scrapy for scraping. I decided to write my own scheduler middleware to store some requests, in order to reduce how many are held in memory. Here is my code: def enqueue_request_into_scheduler(self, spider, request): print "ENQUEUE SCHEDULER with request %s" % str(request) scrapyengine.scheduler.enqueue_reques...

How do I save a web page, programmatically?

I would like to save a web page programmatically. I don't mean merely saving the HTML. I would also like to automatically store all associated files (images, CSS files, maybe embedded SWF, etc.), and ideally rewrite the links for local browsing. The intended usage is a personal bookmarks application, in which link content is cached in c...

Scraping from wsj.com or finance.yahoo.com

I want to display on a WordPress page the total volume of shares traded on the NYSE over the last 2 weeks that it has been open. What is the best way to go about doing this? ...

PHP scraping problem with Google "I'm feeling lucky"

I'm trying to scrape using Google's "I'm Feeling Lucky" button. For a simple query like 'iteminfo.ca' it works, because it redirects me to iteminfo.ca. This is the query URL: http://www.google.com/search?hl=en&source=hp&q=iteminfo.ca&btnI=I%27m+Feeling+Lucky But for a query like '061754020164 site:iteminfo.ca' it doesn't wo...

Streaming the desktop

I want to create a cross-platform C++ application (Windows and Mac OS X) that sends the screen as a video stream to a server. The application is needed in the context of lecture capture. The end result will be a Flash-based web page that plays back the lecture (presenter video and audio + slides/desktop). I am currently exploring a few ...

How do I strip all the HTML out of database records, then create an XML file?

Hello! I'm trying to figure out a way to strip all HTML tags from records in a database, then create XML. Any ideas? Built on ASP.NET 2.0 with SQL Server. ...

CURL / screen scrape delivery tracking details from Canada Post

I need to obtain delivery tracking details from the Canada Post website, which does not offer an API. I've formulated a URL that when entered into a browser correctly returns the tracking information, but I can't get the request to function with CURL (it returns a 500 We're Sorry page). class cURL { var $headers; var $user_agent; v...

Clean HTML using C#

How do I repair malformed HTML using C#? A great answer would be an HTML Agility Pack sample! I'm scraping a site (for legitimate use). The site's HTML is OK but there are some annoying problems. One way I could go would be through regular expressions. I used Expression Web to analyse the problems and the regular expressions needed t...

Is there any way to scrape Flash in this format?

Hello, is it possible to scrape this applet? http://www.text118118.com/livefeed.aspx It's not possible to do it traditionally, as the text is within the applet; however, is it possible to do it with a macro? The feed loops after 8 questions and the text stays highlighted. ...

Scrape ASIN from Amazon URL using JavaScript

Hi, assuming I have an Amazon product URL like so: http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=0AY9N5GXRYHCADJP5P0V&pf_rd_t=101&pf_rd_p=500528151&pf_rd_i=507846 How could I scrape just the ASIN using JavaScript...
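
One possible approach to the ASIN question above, sketched in Python rather than the JavaScript the asker wants (the regular expression itself should carry over largely unchanged). It assumes, based only on the example URL, that the ASIN is a 10-character run of uppercase letters and digits directly after a /dp/ path segment; other URL layouts are not handled. The function name extract_asin is made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumption (from the example URL only): the ASIN is 10 uppercase
# alphanumeric characters that directly follow a "/dp/" path segment.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url):
    """Return the ASIN found in the URL path, or None if there is none."""
    path = urlparse(url).path  # drops the query string before matching
    match = ASIN_PATTERN.search(path)
    return match.group(1) if match else None


example_url = (
    "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation"
    "/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER"
)
print(extract_asin(example_url))
```

Matching against the parsed path instead of the whole URL keeps tracking parameters in the query string from producing a false match.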

Paid API to get incoming links to a website?

I'm working on an SEO app that (among other things) shows the number of incoming links to your site over time. There are a few ways to get this data. Scraping Google "link:yoursite.com" results gives you some (not all) of the links they know about, but they aren't too happy if you are doing lots of scraping. Similarly Yahoo has their ...

How to convert xhtml to xml after screen scraping in asp.net?

Hi, how can I convert the retrieved XHTML string to an XML file? Are there any FCL libraries to do this? ...

Best screen scraper, Simple HTML DOM or Snoopy?

Which one is better for screen scraping: Simple HTML DOM or Snoopy? I use Simple HTML DOM and find it comfortable. Does Snoopy have any advantage over Simple HTML DOM? My requirements: I want to scrape content from a page (after login). Simple HTML DOM is easy, but it takes a lot of time to print the results. ...

Win32: How to scrape HTML without regular expressions?

A recent blog entry by Jeff Atwood says that you should never parse HTML using regular expressions, yet doesn't give an alternative. I want to scrape search results, extracting values: <div class="used_result_container"> ... ... <div class="vehicleInfo"> ... ... ...

Python HTML scraping

Hey, it's not really scraping; I'm just trying to find the URLs in a web page where the class has a specific value. For example: <a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e"> I want to get the href value. Any ideas on how to do this? Maybe regex? Could you post some example code? I'm guessing html scraping libs, su...
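
For the question above, a minimal sketch that avoids regex by using the standard library's html.parser module. The names ClassHrefCollector and find_hrefs are hypothetical helpers, and the sample markup is adapted from the question; a third-party HTML library may be more forgiving with badly broken pages.

```python
from html.parser import HTMLParser


class ClassHrefCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains a name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.class_name in classes and href is not None:
            self.hrefs.append(href)


def find_hrefs(html, class_name):
    """Return all hrefs from <a> tags carrying the given class."""
    parser = ClassHrefCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = (
    '<a class="myClass" href="/url/7df028f508c4685ddf65987a0bd6f22e">one</a>'
    '<a class="other" href="/skip">two</a>'
)
print(find_hrefs(sample, "myClass"))
```

Splitting the class attribute on whitespace means a tag such as class="myClass extra" still matches, which a naive string comparison would miss.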