Hi, I have a web portal on Cricket News India. I have built my own customised CMS to update the news on the website.

My questions are

  1. Since the data is in the database, would it be good practice to generate the HTML pages and save them on our server? The advantage I see is that the next time a user comes, the server serves the generated HTML page instead of fetching from the database. Can you please tell me whether this practice has any loopholes? If yes, please share.
  2. My web portal has more than 10000 articles, but Google's index shows only 2010 pages. Why? For example, if I type "Site:www.cricandcric.com" I get only 2010 pages. Is that because of the CMS implementation?
  3. How can I get the pages indexed by google.com?
A: 

While pre-rendering each page and dumping it to a DB is certainly one way to do output caching, I wouldn't recommend it.

I think you will see that even on a busy site, not all of the pages need such caching. Home pages and major sections are fine for this, but smaller pages may not need it, because their traffic is so much less.

If you do decide to do output caching, I would create that cache file upon request, rather than having a process that renders out everything all at once. There are many ways to do this. Check out this list of PHP accelerators on Wikipedia.
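For illustration, a minimal file-based version of that cache-on-request idea might look something like the sketch below; the render_article() function, cache path, and expiry time are placeholders, not anything from the actual CMS:

```php
<?php
// Illustrative sketch only; render_article() and the paths are assumptions.
$cacheDir  = __DIR__ . '/cache';
$cacheFile = $cacheDir . '/article_' . (int) $_GET['id'] . '.html';
$maxAge    = 600; // seconds to keep a cached copy

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    // Serve the copy generated on an earlier request.
    readfile($cacheFile);
    exit;
}

// Cache miss: build the page from the database as usual...
ob_start();
render_article((int) $_GET['id']);   // hypothetical existing CMS function
$html = ob_get_clean();

// ...store it for the next visitor, then send it.
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0755, true);
}
file_put_contents($cacheFile, $html);
echo $html;
```

This way only the pages people actually visit ever get rendered to disk, and a stale copy simply gets rebuilt after it expires.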

Brad
A: 
  1. It's not a big problem to save the actual content of the web page as HTML on your server, as long as it stays easy to edit with a WYSIWYG editor, for example. But I'm talking about the content only; don't save the whole page in the database, no <head> and <title> et cetera (a rough sketch of this follows this list).
  2. That your data is not indexed is probably because there are no links to it, or because your website is not popular enough (not enough people are linking to your website).
  3. The solution is to wait until people find your website popular and add links to it.
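As a rough sketch of that content-only idea, the stored fragment can be dropped into a shared layout at request time; the fetch_article() helper and the field names are assumptions, not part of the actual CMS:

```php
<?php
// Only the article body is stored as HTML; the surrounding
// markup (<head>, <title>, layout) is generated per request.
$article = fetch_article((int) $_GET['id']);   // assumed helper returning title + body HTML

echo '<!DOCTYPE html><html><head><title>'
   . htmlspecialchars($article['title'])
   . '</title></head><body>';
echo $article['body_html'];   // the WYSIWYG-edited content fragment
echo '</body></html>';
```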
Harmen
Yes, I am not looking to save the complete page, only the content section.
harigm
A: 
  1. You can just cache the content sections of the page using a caching mechanism (a brief sketch follows this list). That way, the whole page doesn't need to be cached and you can still have dynamic content on the page. This is what I do with nearly all of my websites.

  2. Google doesn't index an unlimited number of web pages. Only if your site has good PageRank will the number of indexed pages increase. So the omission is most likely a factor of your site's popularity, not of your pages or scripting.
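For instance, the content section alone could be cached with APCu; this is purely an illustrative choice, render_article_body() is a hypothetical helper, and any of the accelerators mentioned elsewhere in this thread would work as well:

```php
<?php
// Cache just the rendered content block, not the whole page.
$key  = 'article_html_' . (int) $_GET['id'];
$html = apcu_fetch($key, $hit);

if (!$hit) {
    ob_start();
    render_article_body((int) $_GET['id']);   // hypothetical CMS helper
    $html = ob_get_clean();
    apcu_store($key, $html, 300);             // keep for 5 minutes
}

// The rest of the page (header, navigation, ads) stays dynamic.
echo $html;
```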

orvado
Hi, can you share your website, if you don't mind?
harigm
A: 

  1. It would be an acceptable practice if you never need to change anything on those pages; in practice, though, this approach is not preferred because you can end up in a huge mess. It also puts more stress on your database engine, so if you think it can cope with all the load, then go ahead. Exact gains differ from one codebase to another, so just experiment. You might want to use memcached or other tools for caching your pages and data in RAM (a bare-bones sketch follows this list).
  2. Googlebot only crawls the pages that are accessible to it (through internal links on your website, or through external links). The counts you see on the results page are approximated for performance reasons, so in practice it may be that all of your pages have been indexed; if you search for each one individually, you will get a result back.
  3. Googlebot can find a page through an external link, or you can give it a hint here: http://www.google.com/addurl/?continue=/addurl
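Since memcached is mentioned above, here is a bare-bones sketch of keeping an article row in RAM; the server address, key format, and load_article_from_db() helper are assumptions for illustration only:

```php
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key     = 'article:' . (int) $_GET['id'];
$article = $mc->get($key);

if ($article === false) {
    // Not in RAM yet: hit the database once, then keep the row cached.
    $article = load_article_from_db((int) $_GET['id']);   // hypothetical helper
    $mc->set($key, $article, 300);                         // cache for 5 minutes
}
```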

guruslan
I have added the site to Google, and for a few keywords I have made it to the top of the ranking.
harigm