tags:
views: 119
answers: 4

Hi All,

I know that cURL will download a complete file.

What I really want is to take all the links on a page, evaluate each one against my specific criteria (location of the link, etc.), and decide whether I should grab that page and parse it for information.

More specifically, I want to find links that pertain to entertainment events, parse the data, and store it in my MySQL database to populate a website for events in my area.

Would anyone have thoughts on how to accomplish this?

-Jason

+2  A: 

I suggest you base your effort on an existing web crawler/indexer solution rather than implementing it yourself from scratch or with tools such as cURL.

See Lucene, for instance.

Assaf Lavie
How does one deploy this at GoDaddy on a shared server?
Toddly
And second, on a Mac mini with a static IP?
Toddly
A: 

If all you want is an enumeration of the links on a page, you can use the .NET WebBrowser control and the DOM to do that. I am digging up my code for this... I will get back to you.
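In the meantime, a rough sketch of that approach would look something like this; it assumes a WinForms STA thread and a message loop, and the URL is just a placeholder:

    using System;
    using System.Windows.Forms;

    class LinkLister
    {
        [STAThread] // the WebBrowser control requires a single-threaded apartment
        static void Main()
        {
            var browser = new WebBrowser { ScriptErrorsSuppressed = true };

            browser.DocumentCompleted += (sender, e) =>
            {
                // Document.Links is the DOM's collection of anchor elements.
                foreach (HtmlElement link in browser.Document.Links)
                    Console.WriteLine(link.GetAttribute("href"));

                Application.Exit(); // stop the message loop once the links are printed
            };

            browser.Navigate("http://example.com/events"); // placeholder URL
            Application.Run(); // pump messages so DocumentCompleted can fire
        }
    }

Because this rides on the IE engine, the DOM it exposes also includes links that scripts add at load time, which a plain cURL fetch would miss.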

Christopher Morley
A: 

You did not specify a programming language. Apache Droids may be the thing for you if you are willing to customize it in Java; it is designed as a minimal crawler framework that you can extend for your specific needs.

Yuval F
A: 

The solutions in the other answers sound interesting, but I just did something similar, and simpler, with C#/Mono and the HTML Agility Pack.
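A minimal sketch of that kind of approach with the HTML Agility Pack (the URL and the keyword filter are placeholders for whatever criteria you end up with):

    using System;
    using HtmlAgilityPack;

    class EventLinkScraper
    {
        static void Main()
        {
            // HtmlWeb fetches the page and returns a parsed DOM.
            var web = new HtmlWeb();
            HtmlDocument doc = web.Load("http://example.com/whats-on"); // placeholder URL

            // Select every anchor that carries an href attribute.
            var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
            if (anchors == null)
                return; // SelectNodes returns null when nothing matches

            foreach (HtmlNode a in anchors)
            {
                string href = a.GetAttributeValue("href", "");
                string text = a.InnerText.Trim();

                // Placeholder filter: keep only links that look event-related.
                if (href.IndexOf("event", StringComparison.OrdinalIgnoreCase) >= 0)
                    Console.WriteLine("{0} -> {1}", text, href);
            }
        }
    }

Each matching href can then be loaded the same way, parsed for the event details, and written to MySQL with something like Connector/NET's MySqlConnection and MySqlCommand, which covers the storage half of the question.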

kenny