I have a website about to launch with a forum facility. Should I put each thread into my sitemap.xml file, or will Google just find the links to each thread via the forum itself? Don't worry: it doesn't require registration to read the forum.

If yes, how best to keep it up to date? Doing it by hand is obviously not an option for that amount of data. One way I've considered is writing an Apache mod_rewrite rule that sends requests for sitemap.xml to sitemap.php, which would then generate the entire thing on the fly. The other way I can think of is to set up a cron job that generates the map and dumps it to a file once a day. Are these good options? What else could I do?

+4  A: 

If you have a good link structure, Google will probably find your threads on its own, but it's still a good idea to put all of your pages into an XML sitemap, as Google will, most of the time, crawl your site more frequently.

Regarding the method of generation, I would suggest a PHP script that fetches all threads, builds the sitemap, and then caches the result for X minutes, depending on the server load.


It depends on the number of pages. Too many entries in the sitemap.xml can be counter-productive.

If you have a huge forum, it might be better to make a selection and build your sitemap.xml dynamically. Rewriting requests for sitemap.xml to a PHP script and having that script decide what to list in your sitemap sounds like a good way to handle it.

Threads with high activity, or those that contain desired keywords, should be listed; others (e.g. new threads, empty threads, or threads that do not provide important content) could be left out. For a sitemap.xml, sometimes less is more.
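
A rough sketch of such a selection step, in Python for brevity. The `Thread` fields, the `MIN_REPLIES` threshold and the `KEYWORDS` list are all hypothetical; pick whatever your forum's data and goals suggest.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    url: str
    title: str
    reply_count: int


# Hypothetical thresholds, purely for illustration.
MIN_REPLIES = 5
KEYWORDS = ("sitemap", "seo")


def select_for_sitemap(threads, min_replies=MIN_REPLIES, keywords=KEYWORDS):
    """Keep threads that are active or mention a desired keyword."""
    selected = []
    for thread in threads:
        if thread.reply_count == 0:
            continue  # empty threads are always left out
        is_active = thread.reply_count >= min_replies
        has_keyword = any(k in thread.title.lower() for k in keywords)
        if is_active or has_keyword:
            selected.append(thread)
    return selected
```

Matching keywords against the title only is a simplification; a real forum might also look at the post body, views, or thread age.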