tags:

views:

258

answers:

8

I posted some source code on CodePlex and, to my surprise, found that it appeared on Google within 13 hours. Also, when I made some changes to my account on CodePlex, those changes were reflected on Google within a matter of minutes. How did that happen? Does Google give extra importance to sites like CodePlex, Stack Overflow, etc. so that their results appear in the search results quickly? Are there special steps I can take to make Google crawl my site faster, if not this fast?

+4  A: 

Probably (though you'd have to be an insider to know...), if they find enough changes from crawl to crawl, they narrow the interval between crawls until sites like popular blogs and news sites are being crawled every few minutes.
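
Nobody outside Google knows the real policy, but the idea of narrowing the window between crawls when a page keeps changing can be sketched as a simple multiplicative schedule. This is a toy model, assuming a per-page revisit interval that halves after a crawl finds changed content and doubles after an unchanged crawl; `PageSchedule`, `record_crawl`, and the 5-minute/one-week bounds are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds; the values any real search engine uses are not public.
MIN_INTERVAL = 5 * 60          # never revisit more often than every 5 minutes
MAX_INTERVAL = 7 * 24 * 3600   # always revisit at least once a week


@dataclass
class PageSchedule:
    interval: float = 24 * 3600          # initial guess: revisit once a day (seconds)
    last_digest: Optional[str] = None    # digest of the content seen on the previous crawl


def record_crawl(sched: PageSchedule, content: str) -> float:
    """Update the schedule after a crawl and return the new revisit interval.

    Halves the interval when the content changed since the previous crawl and
    doubles it when it did not, clamped to [MIN_INTERVAL, MAX_INTERVAL].
    The first crawl has nothing to compare against, so it leaves the interval as-is.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if sched.last_digest is not None:
        if digest != sched.last_digest:
            sched.interval = max(MIN_INTERVAL, sched.interval / 2)
        else:
            sched.interval = min(MAX_INTERVAL, sched.interval * 2)
    sched.last_digest = digest
    return sched.interval


# Usage: a hypothetical blog whose content changes on every visit.
blog = PageSchedule()
for i in range(12):
    record_crawl(blog, f"post {i}")
print(blog.interval)  # 300
```

A page that changes on every visit quickly converges to the minimum interval, while a page that never changes drifts out to the weekly maximum, which matches the behaviour described above for busy blogs versus quiet sites.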

Dani
Maybe you are correct. It's amazing to know that some sites get crawled that fast!
Bootcamp
They make money out of finding stuff on the network. They have to pay attention to rapidly updated sites... or another search engine will take over.
Dani
@Bootcamp: Just thinking about Google and its architecture and speed is incredible; you can't possibly visualize all of the work that goes into their software.
Crowe T. Robot
+3  A: 

For popular sites like stackoverflow.com, indexing occurs more often than normal; you can see this by searching for a question that has just been asked.

Alberto Zaccagni
+7  A: 

Huh?

Ewan Todd
+7  A: 

Google prefers some sites over others. There are a lot of magic rules involved; in the case of CodePlex and Stack Overflow, we can even assume that they have been manually put on some whitelist. Google then subscribes to the RSS feeds of these sites and crawls them whenever there is a new RSS post.

Example: posts on my blog are included in the index within minutes, but if I don't post for weeks, Google just passes by every week or so.
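
Google's actual feed handling isn't public, but the feed-driven approach described here can be sketched in a few lines: poll a site's RSS feed and hand only the links you haven't seen before to the crawler. This assumes a plain RSS 2.0 feed with `<item><link>` elements; `new_item_links` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def new_item_links(feed_xml: str, seen: Set[str]) -> List[str]:
    """Return links of RSS 2.0 <item>s not in `seen`, adding them to `seen`.

    A feed-driven crawler would fetch only these URLs after each poll
    instead of recrawling the whole site.
    """
    root = ET.fromstring(feed_xml)
    fresh: List[str] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Usage with a hypothetical two-post feed.
FEED = """<rss version="2.0"><channel><title>My blog</title>
<item><title>First</title><link>https://example.com/first</link></item>
<item><title>Second</title><link>https://example.com/second</link></item>
</channel></rss>"""

seen_links: Set[str] = set()
print(new_item_links(FEED, seen_links))  # ['https://example.com/first', 'https://example.com/second']
print(new_item_links(FEED, seen_links))  # [] (nothing new on the second poll)
```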

Adrian
A: 

Actually... popular sites have certain feeds that they share with Google. The site updates these feeds, and Google updates its index when a feed changes. For other sites that rank well, search engines crawl more often, provided there are changes. True, it's not public knowledge, and even for the popular sites there are no guarantees about when newly published data appears in the index.

No Refunds No Returns
A: 

Real-time search is one of the newest buzzwords and battlegrounds in the search engine wars. Google's announced Twitter integration and Bing's are good examples of this new focus on super-fresh content.

Incorporating fresh content is a real technical challenge and priority for companies like Google, since one has to crawl the documents, incorporate them into the index (which is spread across hundreds or thousands of machines), and then somehow determine whether the new content is relevant for a given query. Remember, since we are indexing brand-new documents and tweets, they aren't going to have many inbound links, which is the typical thing that boosts PageRank.
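
To make "incorporate them into the index" a bit more concrete, here is a toy document-partitioned inverted index in which each document is stored on one shard (picked by hashing its id) and each query is sent to all shards. It is only an illustration under that assumption; `ShardedIndex` and the four-shard default are made up, and real search engine indexes are far more elaborate. A freshly added document becomes searchable immediately, but nothing here says how it should rank, which is the missing-inbound-links problem.

```python
import re
import zlib
from collections import defaultdict
from typing import Dict, List, Set


class ShardedIndex:
    """Toy document-partitioned inverted index (hypothetical, for illustration).

    Each document lives on exactly one shard, chosen by hashing its id;
    a query is sent to every shard and the partial results are merged.
    """

    def __init__(self, num_shards: int = 4) -> None:
        # One term -> {doc ids} mapping per "machine".
        self.shards: List[Dict[str, Set[str]]] = [defaultdict(set) for _ in range(num_shards)]

    @staticmethod
    def _terms(text: str) -> Set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, doc_id: str, text: str) -> None:
        # crc32 gives a stable shard choice across runs (unlike built-in hash()).
        shard = self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]
        for term in self._terms(text):
            shard[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every query term."""
        terms = self._terms(query)
        results: Set[str] = set()
        for shard in self.shards:
            # .get() avoids creating empty entries in the defaultdict.
            postings = [shard.get(t, set()) for t in terms]
            if postings:
                results |= set.intersection(*postings)
        return results


# Usage
idx = ShardedIndex()
idx.add("q1", "How does Google crawl CodePlex so fast?")
idx.add("q2", "Submit a sitemap to Google")
print(sorted(idx.search("google sitemap")))  # ['q2']
print(sorted(idx.search("google")))          # ['q1', 'q2']
```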

The best way to get Google/Yahoo/Bing to crawl your site more often is to have a site with frequently updated content that gets a decent amount of traffic. (All of these companies know how popular sites are and will devote more resources to indexing sites like Stack Overflow, the New York Times, and Amazon.)

The other things you can do are to make sure that your robots.txt isn't preventing spiders from crawling your site as much as you want, and to submit a sitemap to Google/Bing-hoo so that they have a list of your URLs. But be careful what you wish for: http://blog.stackoverflow.com/2009/06/the-perfect-web-spider-storm/
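
Both suggestions can be checked with Python's standard library. The sketch below, assuming the sitemaps.org XML format and hypothetical example.com URLs, builds a minimal sitemap (only `<loc>` and `<lastmod>` per URL) and uses `urllib.robotparser` to test whether a given robots.txt lets a crawler fetch a URL; `build_sitemap` and `allowed` are illustrative helper names, not part of any search engine's API.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, str]]) -> str:
    """Return a sitemaps.org XML document for (url, lastmod) pairs.

    lastmod is expected in W3C date format, e.g. "2009-11-01".
    ElementTree escapes characters such as '&' in URLs automatically.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Usage with a hypothetical site. The Sitemap: line tells crawlers where the sitemap lives.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""
print(allowed(ROBOTS, "Googlebot", "https://example.com/posts/1"))             # True
print(allowed(ROBOTS, "Googlebot", "https://example.com/private/notes.html"))  # False
print(build_sitemap([("https://example.com/posts/1", "2009-11-01")]))
```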

nick
But real-time search will also create more noise and information overload.
Rebol Tutorial
A: 

Well, even my own blog appears in real time (it's PageRank 3, though), so it's not such a big deal, I think :)

For example, I just posted this, and it had already appeared in Google at least 37 minutes ago (maybe it appeared in real time, as I didn't check earlier): http://www.google.com/search?q=rebol+cgi+hosting

Rebol Tutorial
+2  A: 

It is not well known, but Google relies on pigeons to rank its pages. Some pages have particularly tasty corn, which attracts the pigeons' attention much more often than other pages do.

APC