views: 108
answers: 6

I want to extract specific data from a website's pages...

I don't want to get the full contents of a page; I only need a portion of it (for example, the data inside a table or a content_div), and I want to do this repeatedly across all the pages of the website.

How can i do that?

+1  A: 

Use curl to retrieve the content and XPath to select the individual elements.
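The curl + XPath approach can be sketched as follows. This is a rough Python equivalent rather than the answer's curl invocation; the sample HTML, the `content_div` id, and the table contents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# In practice you would fetch the page first, e.g. with curl or:
#   import urllib.request
#   html = urllib.request.urlopen("http://example.com/page1").read().decode()
html = """
<html>
  <body>
    <div id="nav">skip this</div>
    <div id="content_div">
      <table>
        <tr><td>Alice</td><td>30</td></tr>
        <tr><td>Bob</td><td>25</td></tr>
      </table>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html)
# XPath-style query for the one div we care about, then its table cells.
div = root.find(".//div[@id='content_div']")
cells = [td.text for td in div.iter("td")]
print(cells)  # ['Alice', '30', 'Bob', '25']
```

Note that ElementTree only accepts well-formed markup and a small XPath subset; on real-world HTML you would typically reach for lxml or a forgiving HTML parser instead.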

Be aware of copyright though.

Visage
For example, if I want to get images from a website matching a certain category, how can I do that?
kvijayhari
You could use Google Image Search and restrict the search to a site. It may or may not work; Google somehow has to tag the pictures into categories. This is also a hint.
Paul
A: 

You need a PHP crawler. The key is to use string manipulation functions such as strstr, strpos and substr.
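The answer names PHP's strstr/strpos/substr; the same marker-based extraction looks like this in Python using str.find and slicing. The markers and the sample page are invented for illustration — a real page needs its own start/end strings.

```python
def extract_between(haystack, start, end):
    """Return the text between the first `start` and the following `end`."""
    i = haystack.find(start)        # like strpos($haystack, $start)
    if i == -1:
        return ""
    i += len(start)
    j = haystack.find(end, i)       # search only after the start marker
    if j == -1:
        return ""
    return haystack[i:j]            # like substr($haystack, $i, $j - $i)

page = '<div id="content_div"><p>Hello, world</p></div>'
print(extract_between(page, "<p>", "</p>"))  # Hello, world
```

This is fast to write but brittle: any change in the site's markup breaks the markers, which is why the DOM/XPath answers are usually preferred.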

Sarfraz
A: 

There are ways to do this. Just for fun I created a Windows app that went through my account on a well-known social network, looked in the right places and logged the information into an XML file. This information would then be imported elsewhere. However, this sort of application can be used for motives I don't agree with, so I never released it.

I would recommend using RSS feeds to extract content.
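The point of the RSS suggestion is that a feed already gives you structured XML, so no screen scraping is needed. A minimal sketch, with an invented feed:

```python
import xml.etree.ElementTree as ET

# A real feed would be fetched from the site's /rss or /feed URL;
# this sample channel is made up for illustration.
rss = """
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""

root = ET.fromstring(rss)
# Each <item> carries the data already separated into fields.
items = [(i.findtext("title"), i.findtext("link")) for i in root.iter("item")]
print(items)
```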

Zeb
A: 

I think you need to implement something like a spider. You can make an XMLHTTP request, get the content, and then parse it.
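The spider idea can be sketched as a queue of URLs: request a page, parse out its links, enqueue the unvisited ones, repeat. Here `SITE` is a dictionary standing in for real HTTP fetches (which would use urllib or curl); the pages and links are invented.

```python
import re

SITE = {
    "/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> <p>data A</p>',
    "/b": '<p>data B</p>',
}

def crawl(start):
    seen, queue, order = set(), [start], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        html = SITE[url]  # real code: urllib.request.urlopen(url).read()
        # Parse out the links and enqueue the ones we haven't visited.
        queue += [u for u in re.findall(r'href="([^"]+)"', html)
                  if u not in seen]
    return order

print(crawl("/"))  # ['/', '/a', '/b']
```

At each visited page you would also run your extraction step (XPath, string markers, or a DOM parser) on `html` before moving on.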

Kangkan
A: 

"Extracting content from other websites" is called screen scraping or web scraping.

PHP Simple HTML DOM Parser is the easiest way (that I know of) to do it.
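PHP Simple HTML DOM lets you write things like `$html->find('div#content_div')` against messy real-world markup. Python's stdlib html.parser can do a cruder version of the same lookup; the target id and sample markup here are made up for illustration.

```python
from html.parser import HTMLParser

class DivTextExtractor(HTMLParser):
    """Collect the text inside <div id=target>, tolerating loose HTML."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # div-nesting depth inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1      # entered the div we want

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

p = DivTextExtractor("content_div")
p.feed('<div id="nav">skip</div><div id="content_div"><b>keep</b> this</div>')
print(p.chunks)  # ['keep', 'this']
```

Unlike ElementTree, html.parser does not require well-formed XML, which is closer to what Simple HTML DOM gives you in PHP.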

vsr
A: 

How about the iMacros tool for Firefox for repetitive tasks? Can that be used to get data from a site that displays its data in a standard format?

kvijayhari