scrape

Scraping a non-RSS page to generate a feed

I want to scrape a page that regularly updates (adding new articles with exactly the same structure as previous ones) in order to generate an RSS feed. I can write the code to analyse the page easily, but how do I emulate a ping, i.e. when the page updates, how can my PHP script know? Does it have to be a cron job? (Probably a duplicate...

Stuck selecting classes or ids using PHP Simple HTML DOM Parser

Hi everyone, I'm trying to select either a class or an id using PHP Simple HTML DOM Parser with absolutely no luck. My example is very simple and seems to comply with the examples given in the manual (http://simplehtmldom.sourceforge.net/manual.htm), but it just won't work, and it's driving me up the wall. Other example scripts given with simple ...

How to write a Python script to search a website's HTML for matching links

I am not too familiar with Python and have to write a script to perform a host of functions. Basically, the module I still need is one that checks a website's code for matching links provided beforehand. ...
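
A minimal sketch of one way to approach this, assuming the page source is already available as a string and the links "provided beforehand" are exact href values. The names `LinkCollector`, `find_matching_links` and the sample HTML are hypothetical and not from the original question; it uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_matching_links(html, wanted):
    """Return the hrefs found in `html` that also appear in `wanted`, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    wanted_set = set(wanted)
    return [link for link in collector.links if link in wanted_set]


# Hypothetical page source and link list, for illustration only.
sample_html = (
    '<p><a href="http://example.com/a">A</a> '
    '<a href="http://example.com/b">B</a></p>'
)
print(find_matching_links(sample_html, ["http://example.com/b", "http://example.com/z"]))
```

This does exact string comparison only; relative links or differently normalised URLs would need extra handling, for example with `urllib.parse.urljoin`.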

How do I make the result of this a variable?

Right now it's set up to write to a file, but I want it to output the value to a variable. Not sure how. from BeautifulSoup import BeautifulSoup import sys, re, urllib2 import codecs woof1 = urllib2.urlopen('someurl').read() woof_1 = BeautifulSoup(woof1) woof2 = urllib2.urlopen('someurl').read() woof_2 = BeautifulSoup(woof2) GE_DB = o...

How to scrape Google SERP based on copyright year?

Hi all: I know there must be ways to do this sort of thing. I am not a pro in RoR or Python, not even an expert in PHP. So my solution tends to be quite dumb: it uses a Firefox add-on called iMacros to scrape the target URLs from the Google SERP, and uses PHP to store the info in the database. At the very core of my workaround there lies a pro...

How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

I have been given a staff list which is supposed to be up to date, but it doesn't match an intranet People Finder which is written in ASP.NET. As the information is sensitive, I am not able to access the database the People Finder is using, so the only way I can get at the information is by scraping the structure starting at the top brass ...

Scrape Google Code Search

Q: Advice on programming tools/scripts to automate the extraction of all project files from a Google Code Search result? NOTE: The question is specifically about Code Search (http://www.google.com/codesearch) and NOT "Google Code", which already has repository access. Motivation: An open source project official site has long gone without...

How do I get data from the iTunes App Store?

I'm trying to scrape the entire iTunes App Store so that I can store it in a database for a project I'm working on. I'm having a hard time finding the best way to do this. I know there are ways to get specific information about price changes but I can't find anything that describes how to scrape the entire app store. Any additional inf...

Help with PHP simplehtmldom - Modifying a form.

I've gotten some great help here and I am so close to solving my problem that I can taste it. But I seem to be stuck. I need to scrape a simple form from a local webserver and only return the lines that match a user's local email (i.e. onemyndseye@localhost). simplehtmldom makes easy work of extracting the correct form element: forea...

How can I scrape the current webpage with PHP/JavaScript?

I have made the following webpage for generating interactive todo lists: http://robert-kent.com/todo/todo.php Basically, the user pastes a numbered todo list and each task is placed into its own div with a unique id. Users can add notes to the tasks (done with JavaScript) and can click the green check when the task is done to hide it. ...

How do websites like appcomments.com or androlib.com get data, particularly the reviews?

Do they just scrape or are there APIs? ...

Scrape images from HTML using PHP?

Hi, I'm very new to PHP. The first little project I've assigned to myself was to create a barebones mobile version of my WordPress site. My plan of attack was to use the website's RSS feed to obtain the Title, Description and the Images, and then I would be free to format them however I wished. So far I've had no trouble extracting the...

Scrape a website URL to get the path of an image

I'm hacking together a simple PHP script that will build a list of photo albums I have on my Facebook fan page. Facebook kindly offers the Graph API, which gives me back a nice list of albums; however, they no longer provide the path of the default album image. I want to write a PHP script that loads an album URL via curl and somehow grab...

Looking for open-source OCR in Java

I am looking for an OCR library in Java for scraping details from an image (IELTS certificate image): http://pbrusilovskij.net/wp-content/uploads/2008/09/ielts-sprachzertifikatt.jpg I need to extract details like Family Name, First Name, etc. from the image and put them into a database ...

Trying to scrape the entire content of a div.

I have this project I'm working on, and I'd like to add a really small list of nearby places using Facebook's Places in an iframe featured from touch.facebook.com. I can easily just use touch.facebook.com/#/places_friends.php, but then that loads the headers and the other navigation bars for messages, events, etc., and I just want t...

Facebook Login with XMLHttp

How can I log in to Facebook using XMLHttp? ...

Scan XML for links using AJAX, apply links to another AJAX call to scrape page and return data.

So, I've tried to look around before posting, but I can't seem to find an answer. My dilemma: I have an XML file that houses URL links to various pages (all similar, different products). By using jQuery and AJAX, I am able to pull the links from the XML file. I then want to be able to pass those links, in order, to another AJAX call that w...

Scrape the content of an email attachment (CSV) from Postfix

Hi, I am wondering how I could scrape the contents of an email attachment, specifically a CSV file, sent to Postfix. Would I have to log in somehow and scrape the contents? Should I use something like Selenium to achieve this? ...

Is it possible to scrape content and generate an RSS feed from a membership site?

Is it possible to scrape content from a membership site so that I can create an RSS feed for import into my inbox? You see, I'm a member of several sites that provide casting calls for the performing arts industry (some paid, some free), but most of them don't provide RSS feeds of the newest casting call updates, which means that I have t...

AppleScript scraping a webpage

There is this awesome website called www.engrade.com. You can get your grades for your various classes from the website when you log in. Now, is it possible for an AppleScript to open engrade.com, log in as me using my username and password, then parse the page to find my grade? Can someone show an example of this, especially the logging in? ...