Hi!
I'm building an HTML screen scraper that parses URLs and then compares them with a set of other URLs.
The comparison is done with Uri.AbsoluteUri or Uri.Host.
My problem is that when I create a new Uri (new Uri(url)), a UriFormatException is thrown when the URL is too long or contains too many slashes.
Since my predefined s...
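A rough Java analogue of the "parse, then compare by host" step, using `java.net.URI` in place of .NET's `Uri` (a swapped-in API, not the asker's code). The idea sketched here is to treat an unparseable URL as "no match" instead of letting the exception abort the scrape. `HostCompare`, `tryParse`, `host` and `sameHost` are hypothetical names, and this sketch does not establish whether `java.net.URI` rejects the same over-long or slash-heavy inputs that .NET's `Uri` does.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

public class HostCompare {
    // Parse a URL without letting malformed input abort the whole scrape:
    // returns Optional.empty() instead of propagating URISyntaxException.
    static Optional<URI> tryParse(String url) {
        if (url == null) return Optional.empty();
        try {
            return Optional.of(new URI(url.trim()));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    // Lower-cased host, or empty if the URL is malformed or has no host component.
    static Optional<String> host(String url) {
        return tryParse(url).map(URI::getHost).map(h -> h.toLowerCase(Locale.ROOT));
    }

    // Two URLs match when both have a host and the hosts are equal ignoring case.
    static boolean sameHost(String a, String b) {
        Optional<String> ha = host(a);
        return ha.isPresent() && ha.equals(host(b));
    }
}
```

Here `sameHost("http://Example.com/a", "https://example.com/b")` is true because host names compare case-insensitively, while a URL containing a raw space fails to parse and simply never matches.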
I'm writing a multi-threaded Java web crawler. From what I understand of the web, when a user loads a web page the browser requests the first document (e.g., index.html), and as it receives the HTML it finds other resources that need to be included (images, CSS, JS) and requests those resources concurrently.
My crawler is only requestin...
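A minimal sketch of the browser-like behaviour described above: pull resource references out of the first document, resolve them against the page URL, then fetch them concurrently. Assumptions: the regex over double-quoted `src`/`href` attributes is a simplification rather than an HTML parser (it also picks up `<a href>` links, which a real crawler would queue as pages instead), and `fetcher` is a hypothetical stand-in for whatever HTTP GET the crawler already performs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubresourceFetcher {
    // Simplification: double-quoted src/href attributes only; not a real HTML parser.
    private static final Pattern RESOURCE_ATTR =
            Pattern.compile("(?i)\\b(?:src|href)\\s*=\\s*\"([^\"]+)\"");

    // Collect referenced resources and resolve relative references against the page URL.
    static List<URI> extractResources(URI pageUrl, String html) {
        List<URI> out = new ArrayList<>();
        Matcher m = RESOURCE_ATTR.matcher(html);
        while (m.find()) {
            try {
                out.add(pageUrl.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Skip references that are not valid URIs instead of failing the whole page.
            }
        }
        return out;
    }

    // Fetch every resource concurrently on a bounded pool; duplicate URIs are fetched once.
    // `fetcher` is a placeholder for the crawler's own HTTP GET.
    static Map<URI, String> fetchAll(List<URI> resources, Function<URI, String> fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<URI, CompletableFuture<String>> pending = new LinkedHashMap<>();
            for (URI u : resources) {
                pending.computeIfAbsent(u, k -> CompletableFuture.supplyAsync(() -> fetcher.apply(k), pool));
            }
            Map<URI, String> bodies = new LinkedHashMap<>();
            // join() waits for each download; failures surface as an unchecked CompletionException.
            pending.forEach((u, f) -> bodies.put(u, f.join()));
            return bodies;
        } finally {
            pool.shutdown();
        }
    }
}
```

Browsers typically cap concurrent connections per host, so the fixed-size pool (`threads`) is only a loose stand-in for that limit, not a faithful model of it.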
Howdy,
I have a URL I want to grab. I only want a short piece of content from it. The content in question is in a div that has an ID of sample.
<div id="sample">
Content
</div>
I can grab the file like so:
$url= file_get_contents('http://www.example.com/');
But how do I select just that sample div?
Any ideas?
...
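In PHP the usual route is `DOMDocument::loadHTML` plus `DOMXPath` with the query `//div[@id="sample"]`, and in Java a library such as jsoup offers `getElementById`. The dependency-free Java sketch below (class and method names are hypothetical) finds the opening `<div>` whose `id` matches, then counts nested `<div>`/`</div>` tags so that an inner div does not end the match early. It assumes reasonably tidy markup; comments, scripts, or the text `<div` inside attribute values could confuse it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivById {
    // Matches any opening or closing div tag; group 1 is "/" for closing tags, "" otherwise.
    private static final Pattern DIV_TAG = Pattern.compile("(?i)<(/?)div\\b[^>]*>");

    // Returns the inner HTML of the first <div> whose id attribute equals `id`, or null.
    static String innerHtmlById(String html, String id) {
        Pattern open = Pattern.compile(
                "(?i)<div\\b[^>]*?\\sid\\s*=\\s*[\"']" + Pattern.quote(id) + "[\"'][^>]*>");
        Matcher start = open.matcher(html);
        if (!start.find()) return null;

        int contentStart = start.end();
        int depth = 1; // we are inside the matched div
        Matcher tags = DIV_TAG.matcher(html);
        tags.region(contentStart, html.length());
        while (tags.find()) {
            depth += tags.group(1).isEmpty() ? 1 : -1;
            if (depth == 0) {
                // The closing tag that balances the opening one ends the content.
                return html.substring(contentStart, tags.start());
            }
        }
        return null; // unbalanced markup: no matching </div>
    }
}
```

For markup like the question's, `innerHtmlById(html, "sample")` returns everything between the tags, newlines included, so calling `trim()` on the result leaves just `Content`.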
Hello.
I need a screen-scraper application that will recognize text on the screen (without using the WinAPI to do this, so the source could also be an image file). I found a lot of commercial solutions, but I need something open source or free.
I plan to include it in my C# project, so an SDK should be available.
Thanks.
...
Hello.
I need to write a standalone application that will "browse" an external resource. Is there a library in C# that automatically handles cookies and supports JavaScript (though I believe JS is not required)? The main goal is to keep the session alive and submit forms so I can complete a multistep registration process or "browse" the web site after ...
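The question asks about C#; as a hedged analogue, Java 11's `java.net.http.HttpClient` keeps a session alive when it is built with a `CookieManager`, because every request sent through the same client shares one cookie store. The sketch below (`SessionBrowser`, `formEncode`, `postForm` and `submit` are hypothetical names) also shows posting form fields the way a browser does for a plain `<form method="post">`. It does not execute JavaScript, so pages that build forms or tokens in script would need separate handling.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class SessionBrowser {
    // One client = one cookie jar: cookies set by earlier responses are sent on later requests.
    // The default CookieManager policy accepts cookies only from the server that set them.
    final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager())
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects after a POST, like a browser
            .build();

    // application/x-www-form-urlencoded body built from the form fields, in map iteration order.
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        fields.forEach((k, v) -> body.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return body.toString();
    }

    // Build (but do not send) a form POST to the form's action URL.
    static HttpRequest postForm(URI action, Map<String, String> fields) {
        return HttpRequest.newBuilder(action)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    // Submit one step of a multistep form; session cookies from earlier steps are reused automatically.
    String submit(URI action, Map<String, String> fields) throws Exception {
        HttpResponse<String> resp = client.send(postForm(action, fields), HttpResponse.BodyHandlers.ofString());
        return resp.body();
    }
}
```

A multistep registration would call `submit` once per step on the same `SessionBrowser` instance; any hidden fields a step requires (for example a token) would have to be scraped from the previous response and added to the field map.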
OK. So I have a CMS written in Java that satisfies the needs of several hundred clients. But periodically, a client will need a specialized application: for example, a class registration database application.
So let's say that I don't feel like writing it or I'm too busy. So I outsource it to someone else but I don't want his/her code ...
I am a university student and it's time to buy textbooks again. This quarter there are over 20 books I need for classes. Normally this wouldn't be such a big deal, as I would just copy and paste the ISBNs into Amazon. The ISBNs, however, are converted into an image on my school's book site. All I want to do is get the ISBNs into a string...
I'm attempting to do some screen scraping; however, the HTML being returned is causing an error because there is no header (I think). Below is the code:
public class xpath
{
private Document doc = null;
public xpath()
{
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://blah.com/blah.php?param1...
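The code above is cut off, so the exact failure isn't visible. One common cause, offered here only as a guess, is that an XML `DocumentBuilder` rejects a header-less fragment with several top-level elements, since XML requires a single root. A minimal sketch under that assumption wraps the fragment in a synthetic `<root>` before parsing and then runs XPath over it; it also assumes the response carries no XML declaration or doctype of its own. HTML that is not well-formed XML (an unclosed `<br>`, `&nbsp;` entities) will still fail, and a lenient HTML parser such as jsoup, TagSoup or HtmlCleaner is the usual remedy there. `FragmentXPath`, `parseFragment` and `select` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FragmentXPath {
    // Wrap a header-less fragment in one synthetic root element so the XML parser accepts
    // several top-level elements; the fragment itself must still be well-formed XML.
    static Document parseFragment(String fragment) {
        String wrapped = "<root>" + fragment + "</root>";
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    // Evaluate an XPath expression that is expected to return a node set.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }
}
```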
Hi,
I have a regular, nested HTML unordered list of links, and I'd like to scrape it with PHP and convert it to an array.
The original list looks something like this:
<ul>
<li><a href="http://someurl.com">First item</a>
<ul>
<li><a href="http://someotherurl.com/">Child of First Item</a></li>
<li><a href="http://som...
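The asker wants a PHP array; as a rough Java analogue, the nested `<ul>`/`<li>`/`<a>` structure maps naturally onto a recursive tree of records (Java 16+). This sketch assumes the list is well-formed XML (every `<li>` closed, outer `<ul>` as the root) so the JDK's DOM parser can read it; `NestedListToTree`, `parse`, `items` and `Item` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NestedListToTree {
    // One <li>: its link text, its href, and the items of any nested <ul>.
    record Item(String title, String href, List<Item> children) {}

    // Parse well-formed list markup whose root element is the outer <ul>.
    static List<Item> parse(String listMarkup) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(listMarkup)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("list markup is not well-formed XML", e);
        }
        return items(root);
    }

    // Convert the direct <li> children of a <ul>, recursing into nested <ul> elements.
    private static List<Item> items(Element ul) {
        List<Item> out = new ArrayList<>();
        for (Node n = ul.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !"li".equals(n.getNodeName())) continue;
            String title = "";
            String href = "";
            List<Item> children = new ArrayList<>();
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text nodes
                Element el = (Element) c;
                if ("a".equals(el.getTagName())) {
                    title = el.getTextContent().trim();
                    href = el.getAttribute("href");
                } else if ("ul".equals(el.getTagName())) {
                    children.addAll(items(el));
                }
            }
            out.add(new Item(title, href, children));
        }
        return out;
    }
}
```

Each `Item` holds the link text, the `href`, and the items of its nested `<ul>`, so the recursion depth mirrors the list's own nesting.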
Hello, I want to screen-scrape a site like Yelp to get phone numbers of Italian restaurants. I created a simple program to do just what I wanted, but they blocked my server's IP.
I am using PHP to do it. How can I get past the IP block?
I've heard about programs like screen-scraper, but I haven't used it yet.
What is the best way to...
Currently, I have a script that calls Firefox and runs a macro, but this is very buggy and rarely works the way I want it to.
...