webscraping

How to select a <td> by its bgcolor attribute using PHP Simple HTML DOM Parser

I have to extract this particular HTML using PHP. Since there is no class or unique ID, I tried to select it by its bgcolor attribute, but without success... <td bgcolor="#F5EC97" width="154" valign="top" align="left" height="55"> <font face="Verdana, Arial, Helvetica, sans-serif" size="1"><b><font color="#CC6633">CITY</font></b><br>...

Trouble Scraping .HTM File

Hi All, I have just begun scraping basic text off web pages, and am currently using the HTMLAgilityPack C# library. I had some success with boxscores off rivals.yahoo.com (sports is my thing so why not scrape something interesting?) but I am stuck on NHL's game summary pages. I think this is kind of an interesting problem so I would p...

Need to retrieve content from specific news sources / blogs etc. Third party software, or build my own?

Hi guys, Looking for some guidance. In a nutshell, I've got a requirement to get article content from specific sources for use in data analysis. So we've got to get the latest articles and store them in our database for processing later on. I'm not really sure of the best approach. Our code for current news retrieval...

Loading a page that sometimes 'hangs' via PHP (Curl)

Hi, I'm trying to get information from a site by parsing/scraping it via PHP & Curl. But sometimes the current page doesn't finish loading, so the script runs without anything happening. It's a simple script like this... ... curl_setopt($curl, CURLOPT_URL, $url); $page = curl_exec($curl); ... Is there a way to simply retry the loa...

Is it easier to scrape data for a GAE app in dev and upload it to prod, or should you scrape in prod?

I have to run a scraping task to collect data for my App Engine (Java) app. I'm not sure which is best - scrape data in development mode and upload it to prod or scrape it while the app is running in production. Does it make a difference? Are there any difficulties with bringing large quantities of data from one environment to the ot...

Extract HTML of a Scraped Page Using PHP's DOM

Is it possible to create HTML output from the contents of an HTML snippet that has been extracted via PHP's DOM tools (e.g. $div = $dom->getElementsByTagName('table')->item(0);) such that the HTML created contains just the elements with specified tag name, and their descendants? Otherwise, are there perhaps any other ways to easily ext...

Reusing a Web::Scraper example script: a head start for a novice

Hello and good evening dear Stackoverflow friends, First of all: this is a true place for learning. I am new to programming, and I am sure that this is a superb place for all novices! I am a beginner, and I learn the most in practical situations, real-life situations... So here is one! I like Web::Scraper because it is a web scrape...

Create YQL for sites which do not have an API.

I plan to create a YQL open table for a site which does not have an XML/JSON based API. I plan to use HTML scraping to get data from the site and return it to YQL. Is this possible, and are any of the Open Tables similar in nature? ...