views:

167

answers:

4

I have a community site which has around 10,000 listings at the moment. I am adopting a new URL strategy, something like

example.com/products/category/some-product-name

As part of this strategy, I am implementing a sitemap. Google already has a good index of my site, but the URLs will change. I use a PHP framework which accesses the DB for each product listing.

I am concerned about the performance effects of supplying 10,000 new URLs to Google, should I be?

A possible solution I'm looking at is rendering my PHP-generated pages as static HTML pages. I already have this functionality elsewhere on the site. That way, Google would index 10,000 HTML pages. The beauty of this system is that if a user arrives via Google at one of those HTML pages, as soon as they start navigating around the site, they jump straight back into the PHP version.

My problem with this method is that I would have to append .html onto my nice clean URLs...

example.com/products/category/some-product-name.html

Am I going about this the wrong way?

Edit 1: I want to cut down on PHP and MySQL overhead. Creating the HTML pages is just a method of caching in preparation for a load spike as the search engines crawl those pages. Are there better ways?
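
For reference, this is roughly the kind of static-page caching I mean. It's only a sketch; the cache path and the render_product_page() call are placeholders, not my framework's actual code:

    <?php
    // Sketch only -- cache location and render call are placeholders.
    $cacheDir  = dirname(__FILE__) . '/cache';
    $cacheFile = $cacheDir . '/' . md5($_SERVER['REQUEST_URI']) . '.html';

    // Serve the cached copy if it is fresh (here: less than an hour old).
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 3600) {
        readfile($cacheFile);
        exit;
    }

    // Otherwise render the page through PHP/MySQL as usual, capture the
    // output, and write it to the cache for the next visitor (or crawler).
    ob_start();
    render_product_page();          // placeholder for the framework's render call
    $html = ob_get_contents();
    ob_end_flush();                 // still send the page to this visitor

    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0755, true);
    }
    file_put_contents($cacheFile, $html);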

A: 

Not an answer to your main question.

You don't have to append .html. You can leave the URLs as they are. If you can't find a better way to redirect to the HTML file (which does not have to have an .html suffix), you can output it via PHP with readfile.
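
As a rough sketch (the rewrite rule, the static directory and the file layout are only assumptions about your setup), a small front controller could stream the pre-rendered file while the visitor only ever sees the clean URL:

    <?php
    // serve.php -- sketch only. Assumes a rewrite such as:
    //   RewriteRule ^products/(.*)$ serve.php?path=products/$1 [L,QSA]
    $path = isset($_GET['path']) ? $_GET['path'] : '';

    // Map the clean URL onto a pre-rendered file and reject path tricks.
    $staticDir = dirname(__FILE__) . '/static/';
    $file = realpath($staticDir . $path . '.html');

    if ($file === false || strpos($file, $staticDir) !== 0) {
        header('HTTP/1.1 404 Not Found');
        exit;
    }

    header('Content-Type: text/html; charset=utf-8');
    readfile($file);   // the .html suffix never appears in the URL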

OIS
Thanks OIS. The main reason for the HTML approach was to have no PHP processing overhead on the server. Interesting approach though, I will keep it in mind.
ed209
+1  A: 

Unless I'm missing something, I think you don't need to worry about it. I'm assuming that your list of product names doesn't change all that often -- on a scale of a day or so, not every second. The Google site-map should be read in a second or less, and the crawler isn't going to crawl you instantly after you update. I'd try it without any complications and measure the effect before you break your neck optimizing.

Charlie Martin
The product URL format won't change again. A product's URL may change if the product name changes.
ed209
A: 

I am concerned about the performance effects of supplying 10,000 new URLs to Google, should I be?

Performance effects on Google's servers? I wouldn't worry about it.

Performance effects on your own servers? I also wouldn't worry about it. I doubt you'll get much more traffic than you used to, you'll just get it sent to different URLs.

Max Lybbert
Yes, performance on my server (I'm sure Google could manage it!). I'm expecting an initial spike as the new pages are indexed - but you don't think this will happen? Thanks :)
ed209
Your site is already getting crawled by Google, and is handling the load just fine. You shouldn't get any more traffic from Googlebot than you already do.
Max Lybbert
+1  A: 

You shouldn't be worried about 10,000 new links, but you might want to analyze your current Google traffic to see how fast Google would crawl them. Caching is always a good idea (see: Memcache, or even generating static files?).
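
A minimal sketch of the Memcache idea, assuming the standard Memcache extension and using a hypothetical render_product_page() call in place of whatever your framework does:

    <?php
    // Sketch only -- host, key scheme and TTL are just examples.
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);

    $key  = 'page:' . md5($_SERVER['REQUEST_URI']);
    $html = $memcache->get($key);

    if ($html === false) {
        // Cache miss: build the page the normal way, then keep it for 10 minutes.
        ob_start();
        render_product_page();      // placeholder for the framework call
        $html = ob_get_clean();
        $memcache->set($key, $html, 0, 600);
    }

    echo $html;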

For example, I currently get about 5 requests/second from Googlebot, which would mean Google would crawl those 10,000 pages in a good half hour. But consider this:

  1. Redirect all existing links to new locations

    By doing this, you ensure that links already indexed by Google and other search engines are rewritten almost immediately. The current Google rank is migrated to the new link (brand-new links start with a score of 0). A minimal 301-redirect sketch follows this list.

  2. Google Analytics

    We have noticed that Google uses Analytics data to crawl pages that it usually wouldn't find with normal crawling (JavaScript redirects, logged-in user content links). Chances are Google would pick up on your URL change very quickly, but see 1).

  3. Sitemap

    The rule of thumb for the sitemap files in our case is to keep them updated with only the latest content. Keeping 10,000 links, or even all of your links, in there is pretty pointless. How will you update this file? (A generation sketch follows at the end of this answer.)
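
To illustrate point 1, here is a minimal 301-redirect sketch. The old URL format and the find_new_url_for() lookup are assumptions, not something taken from your site:

    <?php
    // redirect-old-urls.php -- sketch only; the mapping lookup is hypothetical.
    $oldPath = $_SERVER['REQUEST_URI'];            // e.g. /listing.php?id=123
    $newPath = find_new_url_for($oldPath);         // look up the new clean URL

    if ($newPath !== null) {
        // 301 tells Google the move is permanent, so the old rank follows.
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://example.com' . $newPath);
        exit;
    }

    header('HTTP/1.1 404 Not Found');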


It's a love & hate relationship between me and the Google crawler these days, since the links users use most are pretty well cached, but the things the Google crawler hits usually are not. This is the reason Google causes 6x the load in 1/6th of the requests.
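
If you do regenerate the sitemap from cron, a minimal sketch of keeping only the latest content in it (the database, table and column names here are invented):

    <?php
    // build-sitemap.php -- run from cron; schema names are only examples.
    $db = new mysqli('localhost', 'user', 'pass', 'community');

    // Only the most recently updated listings go into the sitemap.
    $result = $db->query(
        "SELECT slug, category, updated_at FROM products
         ORDER BY updated_at DESC LIMIT 500"
    );

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    while ($row = $result->fetch_assoc()) {
        $loc  = 'http://example.com/products/' . $row['category'] . '/' . $row['slug'];
        $xml .= "  <url>\n";
        $xml .= '    <loc>' . htmlspecialchars($loc) . "</loc>\n";
        $xml .= '    <lastmod>' . date('Y-m-d', strtotime($row['updated_at'])) . "</lastmod>\n";
        $xml .= "  </url>\n";
    }

    $xml .= '</urlset>' . "\n";
    file_put_contents(dirname(__FILE__) . '/sitemap.xml', $xml);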

How will you update this file? With a cron job; the PHP framework I use (Seagull PHP) has sitemap functionality.
ed209