I have a web page that has the following content (I've changed the URL in the src tag for privacy purposes, otherwise viewing the page source is identical):
<HTML>
<BODY>
<script type="text/javascript" src="http://localhost/servlet?publicKey=abcdefg12345678&amp"></script>
</BODY>
</HTML>
The resulting page displays an i...
Hi,
I'm trying to build a functionality for the users of my site to "screen scrap" websites of their choice and I'm looking for a toolkit that allows me to do that OR a third party site that allows other sites (like mine) to use their interface.
Let me try explaining it better. I have customers on my site. These customers are intereste...
Hi,
I would like to parse a webpage to can get the url of the video download. I use python and firebug but I cant get the url link.
Example:
The url where I have to get the video link is:
hxxp://www.rtve.es/mediateca/videos/20100125/saber-comer---salsa-verde-judiones-25-01-10/676590.shtml"
The video is
hxxp://www.rtve.es/resources/TE...
I need to scrape a site with python. I obtain the source html code with the urlib module, but I need to scrape also some html code that is generated by a javascript function (which is included in the html source). What this functions does "in" the site is that when you press a button it outputs some html code. How can I "press" this butt...
I know this isn't a specific programming question, but I really need to know how this can be done. How does a website like this:
http://www.dogpile.com/
display search results from google and other search engines on it's own page. The only way I can think about doing something like this is by using iframes but of course then the conten...
Looking for something similar to Mechanize for .NET C#.
If you don't know what Mechanize is.. http://search.cpan.org/dist/WWW-Mechanize/
I will maintain a list of suggestions here. Anything for browsing/posting/screen scraping (Other than WebRequest and WebBrowser Control).
Parsing
HTMLAgilityPack - http://www.codeplex.com/htmlagil...
In Python's mechanize.Browser module, when you submit a form the browser instance goes to that page. For this one request, I don't want that; I want it just to stay on the page it's currently on and give me the response in another object (for looping purposes). Anyone know a quick to do this?
EDIT:
Hmm, so I have this kind of working wi...
Suppose I want to embed the latest comic strip of one of my favorite webcomics into my site as a kind of promotion for it. The webcomic has the strip inside of a div with an id, so I figured I can just embed the div in my site, except that I couldn't find any code examples for how to do it (they all show how to embed flash or a whole web...
One of the arguments I make to my (Microbiology and Genetics) students is that "data" is/are messy, and Python can help with that (of course other languages can too). So here is a practical kind of web-based data-gathering exercise.
I notice that there a few people who answer Python-related questions among the users with the highest re...
Okay, here's what I'm trying to do. First I'll explain the end result I'm trying to achieve in case there are other ideas on how to do this.
I'm making a screen capture utility that takes a screen shot of only one window... my window (which I have total programmatic control over). However, this window may be much larger than the desktop...
Does Python have screen scraping libraries that offer JavaScript support?
I've been using pycurl for simple HTML requests, and Java's HtmlUnit for more complicated requests requiring JavaScript support.
Ideally I would like to be able to do everything from Python, but I haven't come across any libraries that would allow me to do it. Do...
I would need to make a simple program that logs with given credentials to certain website and then navigate to some element (link).
It is even possible (I mean this Authlogin thing)?
EDIT: SORRY - I am on my company machine and I cannot click on "Vote" or "Add comment" - the page says "Done, but with errors on page" (IE..). I do appreci...
I'm trying to click the Settings button on the home page, but when I do I get this page back:
#<WWW::Mechanize::Page
{url
#<URI::HTTP:0x1023c5fc0 URL:http://www.facebook.com/editaccount.php?ref=mb&drop>}
{meta}
{title nil}
{iframes}
{frames}
{links}
{forms}>
which is.. kinda empty! Is there some problems with these iframes ...
Hi, I want to scrape data from www.marktplaats.nl . I want to analyze the scraped description, price, date and views in Excel/Access.
I tried to scrape data with Ruby (nokogiri, scrapi) but nothing worked. (on other sites it worked well) The main problem is that for example selectorgadget and the add-on firebug (Firefox) don’t find any ...
I wanted to know how to scrape web pages that use AJAX to fetch content on the web page being rendered. Typically a HTTP GET for such pages will just fetch the HTML page with the JavaScript code embedded in it. But I want to know if it is possible to programmatically (preferably Java) query for such pages and simulate a web browser kind ...
I'd like to scrape a website to programmatically collect any external links within any flash elements on the page. I'd also like to collect any other text, if possible, but the links are the important part. Is this possible? A freeware library/service to accomplish this task would be preferable, but if none is, how can I accomplish the t...
Im trying to scrape data from IMDB, but naturally there are a lot of pages, and doing it in a serial fashion takes way too long. Even with I do multi-threaded CURL.
Is there a faster way of doing it?
Yes I know IMDb offers text files, but they dont offer everything, in any sane fashion.
...
I'd like to use the snoopy class, but I don't have the proper server permissions with my shared hosting to install it. Any easy to use alternatives?
I need to submit this POST data:
hash = $_POST['hash']
Submit = Submit
to this site:
http://milw0rm.com/cracker/info.php
And extract the output in the -::PASS column from
http://milw0...
Example: http://www.whois.net/whois/hotmail.com
When open in browser, output is shown.
When using curl call, it show nothing.
What's wrong? I want to return whole page result, then use regular expression to retrieve data at Expiration Date: 29-Mar-2015 00:00:00 line.
$postfields= null;
$postfields["noneed"] = "";
$queryurl= "http://...
I would like to monitor a process every second until it displays an expected "error" message.
how can i monitor something.exe and get notification via "screen scraping" the error message from something.exe all from my vb6 program ? is it possible to terminate or click the "okay" button from vb6 ?
is this sort of thing better suited for...