Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages that the site has.
The information is in a div:
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to...
I have this class in a file called SiteAsyncDownload.cs.
Here's the code:
public class SiteAsyncDownloader
{
WebClient Client = new WebClient();
string SiteSource = null;
/// <summary>
/// Download asynchronously the source code of any site in string format.
/// </summary>
/// <param name="URL">Site URL to download.</para...
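The snippet is cut off above, so purely as a hedged sketch (the original class may do this differently): with WebClient, a genuinely asynchronous download usually goes through DownloadStringAsync and the DownloadStringCompleted event.

using System;
using System.Net;

public class SiteAsyncDownloader
{
    readonly WebClient Client = new WebClient();
    string SiteSource;

    public SiteAsyncDownloader()
    {
        // The downloaded source arrives here on completion instead of being
        // returned by the method that started the download.
        Client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error == null)
                SiteSource = e.Result;
        };
    }

    /// <summary>
    /// Starts downloading the source code of a site asynchronously.
    /// </summary>
    /// <param name="URL">Site URL to download.</param>
    public void BeginGetSite(string URL)
    {
        Client.DownloadStringAsync(new Uri(URL));
    }
}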
Is it possible to use RAKE to scrape an ASP.NET application (very simple, just 2 login forms), and if so, how? Basically a spider bot/web crawler.
I only ask because I've heard this mentioned before and wonder what method I would use to go about it.
Help greatly appreciated.
...
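Rake itself is a build tool rather than a scraper, so as a hedged alternative sketch: the Ruby mechanize gem handles cookies, redirects and form submission, which is most of what a small crawler for two login forms needs. The URL and field names below are hypothetical.

require 'mechanize'

agent = Mechanize.new
page  = agent.get('http://example.com/login')   # hypothetical login page

form = page.forms.first                          # pick the first login form
form['username'] = 'me'                          # hypothetical field names
form['password'] = 'secret'
logged_in = agent.submit(form)

# From here you can walk the links like a simple spider.
logged_in.links.each { |link| puts link.href }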
I'm teaching myself Perl and I learn best by example. As such, I'm studying a simple Perl script that scrapes a specific blog and have found myself confused about a couple of the regex statements. The script looks for the following chunks of html:
<dt><a name="2004-10-25"><strong>October 25th</strong></a></dt>
<dd>
<p>
[Cont...
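As a rough illustration of how such a chunk is typically picked apart (not necessarily what the script you're studying does), a small Perl sketch that captures the ISO date from the anchor's name attribute and the readable date from the <strong> element:

#!/usr/bin/perl
use strict;
use warnings;

my $html = '<dt><a name="2004-10-25"><strong>October 25th</strong></a></dt>';

if ($html =~ m{<a name="(\d{4}-\d{2}-\d{2})"><strong>([^<]+)</strong>}) {
    my ($iso_date, $label) = ($1, $2);
    print "$iso_date => $label\n";    # prints: 2004-10-25 => October 25th
}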
I'm trying to download the source HTML of a website using the WebClient.DownloadData() method.
My method is supposed to give me the source:
public string GetSite(string URL)
{
Uri Site = new Uri(URL);
byte[] lol = Client.DownloadData(Site);
SiteSource = Encoding.ASCII.GetString(lol);
return SiteSourc...
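One thing worth noting about the method above: Encoding.ASCII turns every non-ASCII byte into '?'. A hedged variant (assuming the same Client and SiteSource fields) that decodes as UTF-8 instead, which is what most sites serve; ideally the charset from the response headers would be used.

public string GetSite(string URL)
{
    Uri site = new Uri(URL);
    byte[] data = Client.DownloadData(site);

    // Decode as UTF-8 rather than ASCII so accented and other
    // non-ASCII characters survive the conversion.
    SiteSource = Encoding.UTF8.GetString(data);
    return SiteSource;
}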
Hey guys. I'm wondering if there are any existing libraries in or accessible from Objective-C that would allow me to scrape pages formatted like this one. Specifically, all of the dates and all of the text next to each date. If not, what would be the best way to go about doing this? Regular expressions? I heard that NSString might alread...
I want to go through the children of an element and keep only the ones that are text nodes or span elements, something like:
element.children.select {|child|
child.class == String || child.element_type == 'span'
}
but I can't find a way to test which type a certain element is. How do I test that? I'd like to know that regardless if there's a bet...
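Assuming Nokogiri (the snippet doesn't say which library): text nodes answer text?, element nodes answer element?, and name gives the tag, so the filter could look like this sketch.

require 'nokogiri'

doc     = Nokogiri::HTML('<p>hello <span>world</span> and <b>bold</b></p>')
element = doc.at('p')

# Keep text nodes and <span> elements, drop everything else.
wanted = element.children.select do |child|
  child.text? || (child.element? && child.name == 'span')
end

wanted.each { |child| puts child.to_s }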
I need to scrape Form 10-K reports (i.e. annual reports of US companies) from SEC website for a project.
The trouble is, companies do not use the exact same format for filing this data. So, for example, real estate data for two different companies could be displayed as below:
1st company
Property name State City Ownership Year Occu...
Is there such software?
...
I'm looking for a PHP library that lets me scrape webpages and takes care of all the cookies and of prefilling forms with their default values; that's what annoys me the most.
I'm tired of having to match every single input element with XPath, and I would love it if something better existed. I've come across phpQuery, but the manual is...
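Until something nicer turns up, both annoyances can at least be scripted directly; a hedged PHP sketch that keeps cookies in a jar via cURL and pulls every input's default value with a single XPath query (URL and paths are hypothetical):

<?php
// Fetch a page while persisting cookies between requests.
$ch = curl_init('http://example.com/form');                // hypothetical URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR,  '/tmp/cookies.txt');  // write cookies here
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');  // send them back next time
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Collect every input's default value in one go instead of matching
// each element with its own XPath expression.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath  = new DOMXPath($dom);
$fields = array();
foreach ($xpath->query('//form//input[@name]') as $input) {
    $fields[$input->getAttribute('name')] = $input->getAttribute('value');
}
print_r($fields);    // name => default value, ready to tweak and re-post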
I am working on a web application, and we would like to capture the screen (either the application's current screen or the whole screen) and attach it to an e-mail that is automatically generated for error messages. I've seen a few posts about how to do this in a WinForms app, but nothing really on how to do it in a web app. Is it the s...
I want to match links like <a href="mailto:[email protected]">foo</a>, but this only works in Nokogiri:
doc/'a[href ^="mailto:"]'
What's the right way of doing that? How do I do that with Hpricot?
...
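One hedged workaround that sidesteps CSS attribute-prefix selectors entirely (and so works whether or not Hpricot supports ^=) is to filter on the href in plain Ruby; the e-mail address below is just an example.

require 'hpricot'

doc = Hpricot('<a href="mailto:[email protected]">foo</a><a href="/about">about</a>')

# Take every anchor and keep the ones whose href starts with "mailto:".
mail_links = (doc/'a').select { |a| a['href'].to_s =~ /\Amailto:/ }
mail_links.each { |a| puts a['href'] }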
Instead of using some third-party app, I'd like to write an app in Ruby that, when invoked, will capture the full screen and save it as c:\screenshot\snap000001.png.
The graphics package is readily available, but how can you capture a region of the full screen so as to save it?
This program is to be invoked by some hot-key, such as settin...
Env.: Windows Media Encoder 9 SDK, C#.
I have a task to capture video of a window. I have successfully captured the desktop, as explained in the C# code samples. Now I am trying to capture a specific window or area. The C++ sample uses a PropertyBag to specify the area. How do I specify a region/window in the C# capture code?
I use Windows Media Encoder because it simplifies follo...
I need some help with screen scraping a site (http://website.com).
Let's say I'm trying to get an image inside
But when I pull it down, its path is relative, i.e. "image_large/imageName.jpg". (I'm going to pull this image down daily, as it changes daily. It always begins with "images_large/".)
How can I go in and prepend the URL website.com...
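The question above doesn't name a language, but the general trick is to resolve the scraped relative path against the site's base URL rather than concatenating strings by hand; a small Python sketch of the idea:

# Python 2 sketch; on Python 3 the function lives in urllib.parse instead.
from urlparse import urljoin

base     = 'http://website.com/'            # the site being scraped
relative = 'image_large/imageName.jpg'      # the path pulled from the page

print urljoin(base, relative)               # http://website.com/image_large/imageName.jpg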
BeautifulSoup newbie... need help.
Here is the code sample...
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
mec = Browser()
#url1 = "http://www.wines.com/catalog/index.php?cPath=21"
url2 = "http://www.wines.com/catalog/product_info.php?products_id=4866"
page = mec.open(url2)
html = page.read()
soup = BeautifulSou...
I have some HTML like this:
<h4 class="box_header clearfix">
<span>
<a rel="dialog" href="http://www.google.com/?q=word">Search</a>
</span>
<small>
<span>
<a rel="dialog" href="http://www.google.com/?q=word">Search</a>
</span>
</h4>
I am trying to get the href here in Java using Selenium. I have tried the following:
...
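For the Selenium question above, a hedged sketch with Selenium WebDriver (the question may be on Selenium RC, where the call would differ): a CSS selector picks the anchor directly under the span, and getAttribute reads the href. The URL is hypothetical.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class BoxHeaderHref {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://example.com/page-under-test");   // hypothetical URL

        // Matches the <a rel="dialog"> that sits in the direct child <span>
        // of h4.box_header, not the duplicate inside <small>.
        WebElement link = driver.findElement(
                By.cssSelector("h4.box_header > span > a[rel='dialog']"));
        System.out.println(link.getAttribute("href"));

        driver.quit();
    }
}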
If I have a directory on a remote web server that allows directory browsing, how would I go about fetching all the files listed there from my other web server? I know I can use urllib2.urlopen to fetch individual files, but how would I get a list of all the files in that remote directory?
...
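A hedged Python 2 sketch for the directory-listing question, assuming a plain Apache-style auto-index page: pull every href out of the listing (a regex is enough for that markup) and resolve each one against the directory URL so it can be passed back to urlopen.

import re
import urllib2
import urlparse

base = 'http://example.com/files/'     # hypothetical directory URL
html = urllib2.urlopen(base).read()

# Grab each href, skipping the parent-directory and column-sorting links,
# and turn it into an absolute URL.
files = [urlparse.urljoin(base, href)
         for href in re.findall(r'href="([^"?/][^"]*)"', html)]

for url in files:
    print url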
I wrote a Perl script a while ago that logged into my online banking and emailed me my balance and a mini-statement every day. I found it very useful for keeping track of my finances. The only problem is that I wrote it using just Perl and curl, and it was quite complicated and hard to maintain. After a few instances of my bank changi...
Hi, I have read a great deal of tutorials to help out, but the problem I am finding with Hpricot is that it is not scraping all the HTML, so to speak. I'll elaborate:
The website I am attempting to scrape HTML from is http://yellowpages.com.mt/Malta-Search/Radio-In-Malta-Gozo.aspx .
I need to obtain the links that are listed as resu...
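A hedged first step for the Hpricot question: dump every anchor the parser actually receives; if the result links are not in that list, they are most likely added client-side (JavaScript/ASP.NET postbacks) and never appear in the raw HTML that Hpricot is given.

require 'open-uri'
require 'hpricot'

url = 'http://yellowpages.com.mt/Malta-Search/Radio-In-Malta-Gozo.aspx'
doc = Hpricot(open(url))

# List every href the parser sees; compare with what the browser shows to
# tell whether the missing links ever reach Hpricot at all.
(doc/'a').each do |a|
  puts a['href'] if a['href']
end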