I have what seems to be a rather common scenario that I'm trying to work through.
I have a site that accepts input through two different text fields. If the input is malformed or invalid, I receive a JavaScript pop-up notification.
I won't always receive one, but I should whenever the data is malformed (as mentioned above), or when a...
I wish there were a central, fully customizable, open source, universal login system that allowed you to log in and manage all of your online accounts (maybe there is?)...
I just found RPXNow today after starting to build a Sinatra app to log in to Google, Facebook, Twitter, Amazon, OpenID, and EventBrite, and it looks like it might save s...
I want to scrape email IDs from a page, and I have a script that works on most sites. But some sites load the email IDs with JavaScript, so curl can't load the page contents that include them. For example:
http://www.everynation.org/churches/church-directory/africa/zambia
Here they are loading the email IDs with java...
I saw this video, and I am really curious how it was performed. Does anyone have any ideas? My intuition is that he scraped pixels from the screen (one per 'box'), and then fed that into some program to determine the next move.
Is scraping pixel-by-pixel the way to do this, or is there a better way? I am looking to do something similar ...
We are using a web scraper that sleeps for a randomized interval between requests (so that the time between scrapes isn't constant), but we are still getting blocked by Yahoo after 20-30 requests.
Does anyone know if there is a limit (e.g. 20 requests per minute, 200 an hour)? Right now our average between ...
From an iTunes page, like http://itunes.apple.com/us/podcast/this-week-in-tech-mp3-edition/id73329404, is there a way to extract the corresponding feed address? In this case it would be http://leoville.tv/podcasts/twit.xml.
I know that if you open it in iTunes you can extract it manually, but I want to do it programmatically. There's a lin...
How can I prevent my ASP.NET 3.5 website from being screen scraped by my competitor?
Ideally, I want to ensure that no webbots or screen scrapers can extract data from my website.
Is there a way to detect that a webbot or screen scraper is running?
...
Hi, how can I create a scrap or inbox feature like the one in Orkut, in ASP.NET with C#? Can anyone guide me? I'm stuck in my project!
...
I have an HTML file with an XML snippet embedded; the source code is pasted on Pastebin:
http://pastebin.com/Hy0QaWk8
My task is to extract the text enclosed in the first textarea, which is an XML snippet, from the HTML without any change to the original snippet. I'm able to get it by using BeautifulSoup, but it changes all the tag ...
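One way to get the textarea contents back byte-for-byte is to skip the HTML parser and slice the raw string. The following is only a minimal sketch in Python: the function name `first_textarea_raw` and the sample markup are hypothetical, and it assumes the opening `<textarea>` tag has no `>` inside an attribute value. If the page stores the snippet with HTML entities (`&lt;`, `&gt;`), the result would still need `html.unescape`.

```python
import re

# Non-greedy match of the first <textarea ...>...</textarea> pair.
# Assumption: no '>' character appears inside the opening tag's attributes.
TEXTAREA_RE = re.compile(
    r"<textarea\b[^>]*>(.*?)</textarea\s*>",
    re.IGNORECASE | re.DOTALL,
)


def first_textarea_raw(html):
    """Return the untouched text of the first <textarea>, or None if absent."""
    match = TEXTAREA_RE.search(html)
    return match.group(1) if match else None


# Hypothetical sample; the real page is the one linked on Pastebin.
sample = (
    '<html><body><textarea id="a"><Root><Item Name="X"/></Root></textarea>'
    "<textarea>second</textarea></body></html>"
)
print(first_textarea_raw(sample))
```

Because nothing re-serializes the markup, the case of tag and attribute names inside the snippet is preserved exactly as it appears in the file.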
I want to set up an app that gets information from a particular web page and then displays the values from that page to the iPhone user.
Detail: The web page on the server contains a bus timetable. If the user enters an origin and a terminus, the app shows the time information (listed on the web page) in a label. That's all...
So I'm scraping a site that I have access to via HTTPS. I can log in and start the process, but each time I hit a new page (URL) the cookie session ID changes. How do I keep the logged-in cookie session ID?
#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::Debug qw(+);
use HTTP::Request;
use LWP:...
What if I only need to download the page if it has changed since the last download?
What is the best way? Can I get the size of the page first, then compare to decide whether it has changed, and if so request the download, otherwise skip it?
I plan to use (Python) mechanize.
...
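A common alternative to comparing page sizes is an HTTP conditional GET: send back the `ETag` or `Last-Modified` value from the previous download and let the server answer `304 Not Modified` when nothing changed. The sketch below uses Python's standard `urllib` rather than mechanize, and the function names are hypothetical. It assumes the server actually emits those validators; when it does not, hashing the body is a fallback, although that still costs a full download.

```python
import hashlib
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Attach the validators saved from the previous download, if any."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            return (
                body,
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:  # urllib reports 304 as an HTTPError
            return None, etag, last_modified
        raise


def content_changed(body, previous_digest):
    """Fallback for servers without validators: compare SHA-256 digests."""
    digest = hashlib.sha256(body).hexdigest()
    return digest != previous_digest, digest
```

The caller would store the returned `etag` and `last_modified` (or the digest) next to the saved page and pass them in on the next run. Page size alone is a weak signal, since an edit can leave the length unchanged.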
Hi,
I am building a web scraping application. It should scrape a complex web site with concurrent HttpWebRequests from a single host to a single target web server.
The application should run on Windows Server 2008.
One single HttpWebRequest for data could take from 1 minute to 4 minutes to complete (because of long running db operatio...
I wrote a scraper in Python a while back, and it worked fine on the command line. I have now made a GUI for the application, but I am having trouble with one issue. When I attempt to update text inside the GUI (e.g. 'fetching URL 12/50'), I am unable to, seeing as the function within the scraper is grabbing 100+ links. Also when going ...
Is there a good guide or tutorial for people who need to programmatically interact with dynamic websites? There's been a rash of Perl questions about that lately, and I haven't found a good resource to point people toward. I'm asking not because I need one but because I don't want to waste my time writing it if it already exists. Althoug...
This is the body of the selector that is specified in NSThread +detachNewThreadSelector:(SEL)aSelector toTarget:(id)aTarget withObject:(id)anArgument
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
while (doIt)
{
if (doItForSure)
{
NSLog(@"checking");
doItForSure = NO;
...
I'm setting up a cron job for web scraping, using xvfb, firefox, and watir on my Mac (OS X).
In testing the script so far, firefox pops up visibly on the local desktop, the watir script executes, and then firefox exits (I quit firefox in my script).
I'd like to set the xvfb DISPLAY such that firefox will run, but won't be seen on the loca...
This deals with the (diverse) flash viewers that let you zoom in on images on websites. I’m trying to extract the large, zoomed-in image rendered by the viewer. In many cases the images seem to be dynamically called by the viewer, or are created only for the part of the image you are zooming on at that point. Ideally, the approach here...
I need code in C#.
I am trying to post the search.aspx page, which contains an ASP.NET grid. When the grid is rendered, it loads the very first page on the screen, and there are a number of page links in the grid header.
I scrape the first page, and now I want to move on to the next page. All this is being done using the following code:
HttpWebRequest my...
Hi, we want to add a photo competition to our Facebook fan page. The idea is that people can upload photos and others can like them. The person with the most likes on their photo wins a prize.
Now I was wondering if anyone knows a good way to get a snapshot of all the photos at a given moment, so that when we want to s...