I am putting together a Forum Stats website and I need to find the number of Active Members on several forums.

Many have "Total members" listed but that doesn't help me.

I'm considering "Active" as someone who has posted at least 5 times within the last 6 months.

I'm really perplexed as to how this might be done. Any suggestions?

+2  A: 

If you don't have access to the database, scraping the HTML pages seems to be the only way to go: follow the links, determine the post date and post user from the HTML itself (HTTPRequest or cURL for fetching, combined with DOMDocument / DOMXPath for reliable HTML parsing and finding specific nodes), and store the results in your own database. All in all, depending on the exact HTML layout of the forums, it's not exactly complicated, but it is a lot of work, and possibly work you have to repeat again and again with small variations for each different forum.
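As a rough sketch of the parsing and counting steps (not a finished crawler): the markup below is an assumption, with each post in a `div class="post"` holding an `a class="author"` and a `span class="date"`. Real forums will need their own XPath expressions, and the function names `extractPosts` and `countActiveMembers` are just illustrative.

```php
<?php
// Hypothetical layout: <div class="post"> with <a class="author"> and
// <span class="date">YYYY-MM-DD</span>. Adjust the XPath per forum.
function extractPosts(string $html): array
{
    $doc = new DOMDocument();
    // Forum HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $posts = [];
    foreach ($xpath->query('//div[@class="post"]') as $post) {
        $author = $xpath->query('.//a[@class="author"]', $post)->item(0);
        $date   = $xpath->query('.//span[@class="date"]', $post)->item(0);
        if ($author === null || $date === null) {
            continue; // node does not match the assumed layout
        }
        $posts[] = [
            'user' => trim($author->textContent),
            'date' => new DateTimeImmutable(trim($date->textContent)),
        ];
    }
    return $posts;
}

// Count users with at least $minPosts posts dated on or after $since.
function countActiveMembers(array $posts, DateTimeImmutable $since, int $minPosts = 5): int
{
    $perUser = [];
    foreach ($posts as $post) {
        if ($post['date'] >= $since) {
            $perUser[$post['user']] = ($perUser[$post['user']] ?? 0) + 1;
        }
    }
    return count(array_filter($perUser, fn (int $n): bool => $n >= $minPosts));
}
```

For the "5 posts in the last 6 months" definition, `$since` would be something like `(new DateTimeImmutable())->modify('-6 months')`, applied to the posts collected across all crawled pages.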

If the forums have RSS feeds or other means of getting more structured content / data, the amount of work needed could be greatly reduced.
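To illustrate why a feed is less work, here is a minimal sketch that pulls user/date pairs out of an RSS 2.0 feed with SimpleXML. It assumes the author sits in `dc:creator` (falling back to `author`) and that each item has a `pubDate`; not every forum feed is built that way, and `extractPostsFromRss` is a made-up name.

```php
<?php
// Assumes RSS 2.0 items with <pubDate> and an author in <dc:creator>
// or <author>. Items missing either field are skipped.
function extractPostsFromRss(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }

    $posts = [];
    foreach ($feed->channel->item as $item) {
        // Dublin Core namespace, commonly used for <dc:creator>.
        $dc   = $item->children('http://purl.org/dc/elements/1.1/');
        $user = trim((string) $dc->creator);
        if ($user === '') {
            $user = trim((string) $item->author);
        }
        $pubDate = trim((string) $item->pubDate);
        if ($user === '' || $pubDate === '') {
            continue;
        }
        try {
            $date = new DateTimeImmutable($pubDate);
        } catch (Exception $e) {
            continue; // unparsable date, skip this item
        }
        $posts[] = ['user' => $user, 'date' => $date];
    }
    return $posts;
}
```

Keep in mind that feeds usually expose only the most recent items, so you would have to poll them regularly and store the results to cover a six-month window.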

Wrikken
Do you know of any tutorials or additional information on how one would go about starting this? I have a good understanding of PHP, but this would most likely be the hardest thing I've done to date.
Castgame
Essentially, you're looking for a crawler with the capability to parse specific pages with a custom function. Googling for 'php crawling' gives a wealth of tutorials and even a load of ready-made classes; you might want to try some of them. For the actual parsing of a page: using Firebug in Firefox, it's terribly easy to get an XPath for the nodes you require (or write your own paths; [this is a nice XPath tutorial](http://zvon.org/comp/r/tut-XPath_1.html)).
Wrikken
Thank you, you were very helpful. I'm using PHPCrawler and added some custom DOMDocument and DOMXPath code to filter out the needed data. Then it's stored right in a MySQL database for easy access. It's tidy!
Castgame