I'm working on a system that requires parsing HTML documents in PHP.
My question is simply this:
What's the best method of parsing content for relevant information?
When I parse a site, I don't want random content; I want to find relevant content such as blocks of text, images, links, etc., but obviously I don't want header links or footer links.
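To make the goal concrete, here is a rough sketch of the kind of filtering I mean. It assumes PHP's built-in DOM extension (DOMDocument and DOMXPath) is available, and it assumes the boilerplate sits inside <header>, <footer>, or <nav> elements, or inside elements with id "header" or "footer". Real sites will often need different rules. The function name extractRelevantContent() is made up for illustration.

```php
<?php
// Sketch only: assumes the DOM extension (DOMDocument / DOMXPath) is enabled.
// extractRelevantContent() is a hypothetical helper, not a library function.
function extractRelevantContent(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Assumption: anything inside these containers is boilerplate.
    $skip = 'not(ancestor::header or ancestor::footer or ancestor::nav'
          . ' or ancestor::*[@id="header" or @id="footer"])';

    $result = ['text' => [], 'links' => [], 'images' => []];

    // Blocks of text: non-empty paragraphs outside the skipped containers.
    foreach ($xpath->query('//p[' . $skip . ']') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $result['text'][] = $text;
        }
    }

    // Links: href values of anchors outside the skipped containers.
    foreach ($xpath->query('//a[@href][' . $skip . ']') as $a) {
        $result['links'][] = $a->getAttribute('href');
    }

    // Images: src values of images outside the skipped containers.
    foreach ($xpath->query('//img[@src][' . $skip . ']') as $img) {
        $result['images'][] = $img->getAttribute('src');
    }

    return $result;
}
```

Calling extractRelevantContent($html) on a page returns an array with 'text', 'links', and 'images' keys. The obvious weakness is that it only works when the page marks its header and footer in a way the XPath filter recognises.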
So is there anything you can advise me to look at? Tips / tricks are also welcome :)
Regards