I'm creating a site that is going to have a wiki article database. Right now, there aren't any links into the wiki articles other than through the on-site search engine.

How can I get the articles to be spidered by Google and the other Internet search engines? There are far too many articles in the database to include direct links to them all, unless it's some type of automated sitemap.

On a lot of wikis I've seen a "view a random page" button; I've never seen the point of those myself as a user. Are they there to help the search engine bots?

+1  A: 

Create an XML sitemap.

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling.
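
For example, a minimal script that emits a sitemap in that format might look like the following sketch (the domain and article URLs are placeholders, and PHP is simply the language the other answers mention):

```php
<?php
// Minimal sketch of the sitemap format described by the Sitemaps protocol.
// The domain and article URLs below are placeholders.
header('Content-Type: application/xml; charset=utf-8');

$urls = array(
    'http://example.com/wiki/Some_Article',
    'http://example.com/wiki/Another_Article',
);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($urls as $url) {
    echo "  <url>\n";
    echo '    <loc>' . htmlspecialchars($url) . "</loc>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
```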

Simon Brown
+1  A: 

Submit a sitemap to Google. Use Google Webmaster Tools to add your site and automatically generate a compressed sitemap.xml. This will tell Google about all the URLs on your site so it can crawl them. You can also monitor how often Google crawls your site and whether it encounters any errors doing so.

EDIT: If you're worried about the sitemap being too large, you can generate a sitemap with a single URL pointing to a master index page. That index page can be generated once a day or on demand, and can be segmented however you like; it simply acts as the starting point for a Google crawl. For example, it could present a list of characters A, B, C, D, E, ..., Z that link to pages listing all articles starting with that character. The details don't matter; structure it however best conserves your database resources.
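
As a rough sketch of one of those per-character listing pages (the `articles` table, `title` column, connection details, and URL scheme are all assumptions about the wiki's schema):

```php
<?php
// Sketch of one listing page, e.g. /index.php?letter=A. The master index page
// is then just 26 links (A-Z) to this script. Table and column names are
// assumptions about the wiki's schema.
$letter = isset($_GET['letter']) ? strtoupper(substr($_GET['letter'], 0, 1)) : 'A';

$db   = new PDO('mysql:host=localhost;dbname=wiki', 'user', 'password');
$stmt = $db->prepare('SELECT title FROM articles WHERE title LIKE ? ORDER BY title');
$stmt->execute(array($letter . '%'));

echo "<ul>\n";
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $slug = rawurlencode(str_replace(' ', '_', $row['title']));
    echo '  <li><a href="/wiki/' . $slug . '">'
       . htmlspecialchars($row['title']) . "</a></li>\n";
}
echo "</ul>\n";
```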

The key is to get a sitemap.xml into Google's system so it knows when and how often to crawl you. There are all sorts of intricacies to generating a sitemap. The approach above with one URL is crude, but it can work. Ideally you'd generate a sitemap with every URL in your system sorted by priority, but that isn't required.

Look at the sitemap specification for more information. If you just want to seed Google, use the single-URL approach to get going.

lrm
+1  A: 

You could write a PHP or ASP script that generates a sitemap, and redirect requests for /sitemap.xml to that script.
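
A sketch of such a script, assuming an `articles` table with a `title` column and a /wiki/Title URL scheme (requests for /sitemap.xml would be redirected or rewritten to it, e.g. with an Apache rewrite rule):

```php
<?php
// sitemap.php -- generates the sitemap from the article database on request.
// Table/column names, connection details and the URL scheme are assumptions.
header('Content-Type: application/xml; charset=utf-8');

$db = new PDO('mysql:host=localhost;dbname=wiki', 'user', 'password');

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($db->query('SELECT title FROM articles ORDER BY title') as $row) {
    $slug = rawurlencode(str_replace(' ', '_', $row['title']));
    echo "  <url>\n";
    echo '    <loc>http://example.com/wiki/' . $slug . "</loc>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
```

Note that the Sitemaps protocol caps a single sitemap file at 50,000 URLs, so a site with millions of articles would need to split the output into several sitemap files referenced from a sitemap index file.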

You can then submit the sitemap to Google using their Webmaster Tools.

robinjam
Is that how most sites do it? It seems to me that could really batter my database. I'm probably going to have around 2 million articles in the database, and generating a new sitemap dynamically, or any more often than daily or so, seems pretty painful.
WIlliam Jones
If you want to minimize database access, you could cache the sitemap and only update it every week or so. Basically, when someone requests the sitemap, check whether its last-modified date is over a week old; if it is, regenerate it, otherwise serve the cached version. With modern database systems, 2 million articles can be handled quite easily.
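
A sketch of that caching approach, assuming the database-backed generator from my answer is saved as sitemap.php and the cache path is writable:

```php
<?php
// Serve a cached sitemap and regenerate it at most once a week.
// The cache path and the sitemap.php generator script are assumptions.
$cacheFile = __DIR__ . '/sitemap.cache.xml';
$maxAge    = 7 * 24 * 60 * 60; // one week, in seconds

if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $maxAge) {
    ob_start();
    include __DIR__ . '/sitemap.php'; // the database-backed generator sketched above
    file_put_contents($cacheFile, ob_get_clean());
}

header('Content-Type: application/xml; charset=utf-8');
readfile($cacheFile);
```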
robinjam