Hello Experts,
This is a two-part question.
Q1: Can a cURL-based request imitate a browser-based request 100%?
Q2: If yes, which options should be set? If not, what extra does the browser do that cannot be imitated by cURL?
I have a website and I see thousands of requests being made from a single IP in a very short time. These reque...
I've got a Python web crawler and I want to distribute the download requests among many different proxy servers, probably running Squid (though I'm open to alternatives). For example, it could work in a round-robin fashion, where request1 goes to proxy1, request2 to proxy2, and so on, eventually looping back around. Any idea how to set this up?...
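A minimal sketch of the round-robin idea described above, using only the Python standard library. The proxy addresses, the port 3128, and the helper names `assign_proxies` and `fetch_via_proxy` are assumptions made for illustration; they do not come from the question.

```python
from itertools import cycle
import urllib.request

# Hypothetical proxy addresses; replace them with your own proxy servers.
PROXIES = [
    "http://proxy1.example.com:3128",
    "http://proxy2.example.com:3128",
    "http://proxy3.example.com:3128",
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    rotation = cycle(proxies)  # loops back to the first proxy when exhausted
    return [(url, next(rotation)) for url in urls]


def fetch_via_proxy(url, proxy, timeout=10):
    """Download one URL through the given HTTP proxy and return the body."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A crawler could then loop over `assign_proxies(urls, PROXIES)` and call `fetch_via_proxy(url, proxy)` for each pair. Doing the rotation in the client keeps the proxies themselves unconfigured; whether that is preferable to a proxy-side setup depends on the deployment.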
Hi Guys!
I need to scrape a simple webpage which has the following text:
Value=29
Time=128769
The values change frequently.
I want to extract the Value (29 in this case) and store it in a database. I want to scrape this page every 6 hours. I am not interested in displaying the value anywhere; I am only interested in the cron job. Hope I ...
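One possible sketch of the extract-and-store step, assuming the page body is plain text in exactly the `Value=...` / `Time=...` layout shown above. The function names, the `readings` table, and the choice of SQLite are assumptions for illustration; the question does not name a database. Fetching the page is left out, and a cron schedule such as `0 */6 * * *` would run the script every 6 hours.

```python
import re
import sqlite3
import time


def parse_value(page_text):
    """Return the integer after 'Value=' or None if the line is missing."""
    match = re.search(r"^Value=(-?\d+)", page_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def store_value(conn, value):
    """Append one reading with the current Unix timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (fetched_at INTEGER, value INTEGER)"
    )
    conn.execute(
        "INSERT INTO readings (fetched_at, value) VALUES (?, ?)",
        (int(time.time()), value),
    )
    conn.commit()
```

In use, the script would download the page text, call `parse_value` on it, and pass the result to `store_value` with a connection opened via `sqlite3.connect(...)`.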
Hello,
I want to be able to manipulate the HTML of a given URL, something like HTML scraping. I know this can be done using curl or some scraping library. But I would like to know whether it is possible to use jQuery to make a GET request to the URL using Ajax, retrieve the HTML of the URL, and run jQuery code on the HTML returned?
Thank...
Hi,
I'm trying to scrape a set of links and content from a domain.
The query in Google would be
"site:www.newswebsite.com search_term"
I've seen some things that come close to getting this working, but I can't quite get a search working across a whole website and then filter by the search term.
Is this possible without a custom d...
I came across this .NET library:
http://www.webzinc.com/online/faq.aspx
However, I was wondering if there is a free alternative out there?
...
Hello all,
I think I know the answer to this question already, but curious as I am, I'll ask it anyway.
I'm running a webshop whose products come with a CSV file. I can import all the objects without any trouble; the only thing is that image and thumbnail locations are not exported with the database dump. (it's never per...
Hi,
From my Windows application, I want to detect selected text in "Internet Explorer", Firefox and any other browser.
Do you know what piece of code I should use in order to achieve this?
Thanks,
The idea is not to search for a text in IE, but instead to "capture the selected text" in IE. By the way, not only IE, but any Windows applica...
Hi,
I'm working on a "personal-can-it-work" sort of thing, and I have everything working great except for trying to parse some information from a .asp source file into my program.
This is the parsing code I have so far:
// parse out the results
try
{
int snr_start = result.IndexOf("SNR");
...
I am wondering how to use Ruby to scrape a website, with the goal of launching a new browser with the destination page loaded. This is needed, because the destination page is not stateless, and requires a number of session parameters.
For an example flow, see how Kayak.com does this.
1. Go to Kayak.com, and search for a hotel in Chica...
Hello,
When sending a message on Facebook, if you include a URL it generally grabs a picture from the webpage and adds it at the bottom as a thumbnail. You then have the ability to select through a number of pictures featured on the site.
I can see how this could be built, but to save me the hassle I wonder if somebody has already don...
I'm developing an iPhone application where I wish to authenticate (login form) on a site and retrieve some information by doing some screen scraping. Is there an API available to do this or documentation how I could do this?
thanks
...
Hi,
I have webpage1.html, which has a hyperlink whose href="some/javascript/function/outputLink()"
Now, using curl (or any other method in PHP), how do I deduce the hyperlink (in http:// format) from the JavaScript function() so that I can go to the next page?
Thanks
...
I'm considering writing a simple web scraping application to extract information from a website that does not seem to specifically prohibit this.
I've checked for other alternatives (eg RSS, web service) to get this information, but there are none available at this stage.
Despite this I've also developed/maintained a few websites mys...
I'm scraping a website that has a structure similar to the following. I'd like to be able to grab the info out of the CData block.
I'm using BeautifulSoup to pull other info off the page, so if the solution can work with that, it would help keep my learning curve down, as I'm a Python novice.
Specifically, I want to get at the ...
I am trying to load and parse HTML in Adobe AIR. The main purpose is to extract the title, meta tags and links. I have been trying the HTMLLoader but I get all sorts of errors, mainly uncaught JavaScript exceptions.
I also tried to load the html content directly (using URLLoader) and push the text into HTMLLoader (using loadString(...)) b...
I'm writing a spider that needs a load_url function that performs the following for me:
Retry the URL if there is a temporary error, without leaking exceptions.
Not leak memory or file handles
Use HTTP-KeepAlive for speed (optional)
URLGrabber looks great on the surface, but it has trouble. First, I hit a problem with too many fil...
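A rough sketch of a `load_url` with the first two properties listed above, using only the standard library. The parameter names, the backoff policy, and the decision to treat 4xx responses as permanent are assumptions for illustration. `urllib.request` opens a new connection per request, so the optional keep-alive requirement is not covered here; that would need `http.client` connection reuse or a third-party HTTP library.

```python
import time
import urllib.error
import urllib.request


def load_url(url, retries=3, delay=1.0, timeout=10, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if every attempt fails.

    `opener` is injectable so the retry logic can be exercised without a network.
    """
    for attempt in range(retries):
        try:
            # The with-block closes the socket even if read() raises,
            # so no file handles are left open.
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            # Assumption: a 4xx status is a permanent error, so do not retry it.
            if isinstance(exc, urllib.error.HTTPError) and 400 <= exc.code < 500:
                return None
            if attempt < retries - 1:
                time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    return None
```

Returning `None` instead of raising keeps exceptions from leaking into the spider, at the cost of hiding the failure reason; logging inside the `except` branch would be a natural addition.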
Hi
I am familiar with the Java programming language. I would like to extract data from a website and store it in a database running on my machine. Is that possible in Java? If so, which API should I use? For example, there are a number of schools listed on a website. How can I extract that data and store it in my database using Java?
...
As I (semi) understand it, all on-screen text in any Windows application is drawn by the same drawtext functionality. It is possible to hook onto this method and view (or even change) every bit of text being drawn to the display.
How does OS X put text on the screen? Is there a similar way to hook into this API and view all text being...
I am spidering a video site that expires content frequently. I am considering using Scrapy to do my spidering, but am not sure how to delete expired items.
Strategies to detect if an item is expired are:
Spider the site's "delete.rss".
Every few days, try reloading the contents page and making sure it still works.
Spider every ...