views: 503
answers: 4

Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how web crawlers like Google handle this. Are they aware of JavaScript? Do they have a built-in JavaScript engine? Or do they simply ignore all JavaScript-generated content on the page (which I guess is quite unlikely)? Do people use specific techniques to get content indexed that would otherwise only be available to a normal Internet user through background AJAX requests?

+8  A: 

Most of them don't handle Javascript in any way. (At least, all the major search engines' crawlers don't.)

This is why it's still important to have your site gracefully handle navigation without Javascript.
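
For example, navigation can degrade gracefully: give each link a real href that crawlers and non-JavaScript users follow, and layer the scripted behaviour on top. A minimal sketch (the page and function names here are made up):

    <!-- The plain href works for crawlers and for users without JavaScript;
         the onclick handler takes over only when scripting is available. -->
    <a href="products.html"
       onclick="loadProductsViaAjax(); return false;">Products</a>

Returning false from the handler stops the browser from following the href, so the AJAX path is used only by JavaScript-enabled browsers.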

Ben S
+3  A: 

Most don't handle JavaScript at all; they ignore it completely. It's best to have a sitemap with plain links so the whole site can be navigated without it.

Some (like Google's) are a little smarter. They won't run your JavaScript, but they will look at it and may make some decisions based on what they find; for example, they might make requests to the URLs your AJAX calls use.

If you want to ensure that certain pages get indexed, take a look at your robots.txt file and make a good sitemap; a minimal sketch of both follows the links below.

http://en.wikipedia.org/wiki/Robots.txt

http://en.wikipedia.org/wiki/Sitemap
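
As a rough sketch (the URLs are placeholders), robots.txt can point crawlers at your sitemap, and the sitemap lists the pages you want crawled:

    # robots.txt, served from the site root
    User-agent: *
    Disallow:
    Sitemap: http://www.example.com/sitemap.xml

    <!-- sitemap.xml -->
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/products.html</loc>
      </url>
    </urlset>

An empty Disallow line allows everything; the Sitemap line tells crawlers where to find the list of pages you want indexed.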

McKay
Do you think web crawlers will become smarter and focus more on AJAX in the future?
Shailesh Kumar
@Shailesh - I will say to that a definite maybe. They talk a little bit about the challenges of crawling Javascript or AJAX-enabled sites here: http://searchengineland.com/google-io-new-advances-in-the-searchability-of-javascript-and-flash-but-is-it-enough-19881
Steve Wortham
+1  A: 

Precisely what Ben S said. And anyone accessing your site with Lynx won't execute JavaScript either. If your site is intended for general public use, it should generally be usable without JavaScript.

Also, related: if there are pages that you would want a search engine to find, and which would normally arise only from JavaScript, you might consider generating static versions of them, reachable by a crawlable site map. These static pages would use JavaScript to load the current version when hit by a JavaScript-enabled browser (in case a human with a browser follows your site map). The search engine will see the static form of the page, and can index it.
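
A rough sketch of that idea (file and URL names made up): the static page carries a crawlable snapshot of the content plus a small script that swaps in the live version for real browsers:

    <!-- static-products.html: snapshot regenerated periodically for crawlers -->
    <div id="content">
      <!-- static copy of the product list goes here -->
    </div>
    <script>
      // A JavaScript-enabled browser is sent on to the live, AJAX-driven page;
      // crawlers that ignore script simply index the static content above.
      window.location.replace('products.html');
    </script>

Crawlers index the static markup, while human visitors who follow the site map end up on the current version.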

Joe Mabel
+2  A: 

Crawlers don't parse JavaScript to find out what it does.

They may be built to recognise some classic snippets like onchange="window.location.href=this.options[this.selectedIndex].value;" or onclick="window.location.href='blah.html';", but they don't bother with things like content fetched using AJAX. At least not yet, and content fetched like that will always be secondary anyway.

So, JavaScript should be used only for additional functionality. The main content that you want the crawlers to find should still be plain text in the page, with regular links that the crawlers can easily follow.
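
As a hedged sketch of that split, the drop-down navigation quoted above can be replaced with regular links that crawlers can follow, with the script layered on as an optional extra (fetchPage is a placeholder for your own AJAX loading code):

    <!-- Regular links: crawlers and all users can follow these. -->
    <ul id="nav">
      <li><a href="about.html">About</a></li>
      <li><a href="products.html">Products</a></li>
    </ul>
    <script>
      // Optional enhancement: load the pages with AJAX when scripting works.
      var links = document.getElementById('nav').getElementsByTagName('a');
      for (var i = 0; i < links.length; i++) {
        links[i].onclick = function () {
          fetchPage(this.href); // placeholder for the real AJAX call
          return false;         // the plain href remains the fallback
        };
      }
    </script>

Either way the crawler sees ordinary links and plain text, which is what it can index.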

Guffa