views: 846
answers: 6
Can search engines such as Google index JavaScript-generated web pages? When you right-click and select View Source on a page that is generated by JavaScript (e.g. using GWT), you do not see the dynamically generated HTML. I suppose that if a search engine also cannot see the generated HTML, then there is not much to index, right?

+5  A: 

if a search engine also cannot see the generated HTML then there is not much to index

That about sums it up. Technically, nothing is stopping a search engine from implementing a JavaScript engine for its bot/spider, but it's just not normally done. They could, but they won't.

On the other hand, you can sniff a search engine's user agent and serve it something readable. But search engines don't usually like this and will penalize you pretty severely if they detect differences from what you send to a normal browser.
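For illustration, here is a minimal sketch of that user-agent check as a servlet filter. Everything in it (the class name, the bot strings, the static path) is an assumption made up for the example, and the cloaking caveat above still applies: whatever you serve to bots should match what real users see.

```java
// Hypothetical sketch of the user-agent sniffing idea, using the standard
// Servlet API. Bot strings and paths are examples only.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class BotDetectionFilter implements Filter {

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String userAgent = ((HttpServletRequest) req).getHeader("User-Agent");
        boolean isBot = userAgent != null
                && (userAgent.contains("Googlebot") || userAgent.contains("bingbot"));

        if (isBot) {
            // Crawlers get a plain, pre-rendered HTML version of the page.
            req.getRequestDispatcher("/static/index.html").forward(req, res);
        } else {
            // Normal browsers get the JavaScript-driven page as usual.
            chain.doFilter(req, res);
        }
    }

    public void init(FilterConfig config) {}

    public void destroy() {}
}
```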

Joel Coehoorn
+1  A: 

Google is working on executing simple JavaScript to uncover some content, but they certainly don't execute full scripts. If you are worried about SEO, then you need to consider providing static versions of your pages.

Adam Pope
Any references/links on Google working on implementing it?
trex279
Look for Matt Cutts' webmaster videos on YouTube. There was one on JavaScript.
Adam Pope
+3  A: 

A good rule of thumb: if you can see it in Lynx, it can be indexed by Google.

Lynx is also an excellent test because it gives you an idea of how screen readers for the blind will see your page.

Diodeus
A: 

Or you can use the <noscript> tag to give the search-engine bot alternative content. But you must create content for this type of reader. :(

Rich interfaces are much harder to 'link' than plain HTML. How will the bot track an onclick event on a Panel? Or what if a component renders its content across several panels?

What do you think?
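If the app in question is GWT, one partial answer to the linking problem is to give each panel/state a history token, so it at least has a bookmarkable URL. A hypothetical sketch follows (the class name and the showPanel helper are invented); note that this alone does not make the content visible to bots that don't run JavaScript, so a <noscript> or static fallback is still needed.

```java
// Hypothetical GWT sketch: use Hyperlink and History tokens instead of opaque
// onclick handlers, so each view has a URL fragment (#products, #about, ...).
import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.event.logical.shared.ValueChangeEvent;
import com.google.gwt.event.logical.shared.ValueChangeHandler;
import com.google.gwt.user.client.History;
import com.google.gwt.user.client.ui.Hyperlink;
import com.google.gwt.user.client.ui.RootPanel;

public class LinkableApp implements EntryPoint {

    public void onModuleLoad() {
        // A Hyperlink changes the URL fragment instead of firing an opaque onclick.
        RootPanel.get().add(new Hyperlink("Products", "products"));
        RootPanel.get().add(new Hyperlink("About", "about"));

        // React to token changes so back/forward buttons and deep links work.
        History.addValueChangeHandler(new ValueChangeHandler<String>() {
            public void onValueChange(ValueChangeEvent<String> event) {
                showPanel(event.getValue());
            }
        });
        History.fireCurrentHistoryState();
    }

    private void showPanel(String token) {
        // Swap the visible panel based on the token (details omitted).
    }
}
```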

+3  A: 

Your suspicion is correct: JS-generated content cannot be relied on to be visible to search bots. It also can't be seen by anyone with JS turned off, and the last time I added some tests to a site I was working on (a large, mainstream-audience site with hundreds of thousands of unique visitors per month), approximately 10% of users were not running JavaScript in any form. That includes search bots, PC browsers with JS disabled, many mobiles, blind people using screen readers, etc.

This is why content generated via JS (with no fallback option) is a Really Bad Idea.

Back to basics. First, create your site using bare-bones (X)HTML, on REST-like principles (at least to the extent of requiring POST requests for state changes). Use simple semantic markup, and forget about CSS and JavaScript for now (a minimal sketch of this baseline follows at the end of this answer).

Step one is to get that right, and have your entire site (or as much of it as makes sense) working nicely this way for search bots and Lynx-like user agents.

Then add a visual layer: CSS/graphics/media for visual polish, but don't significantly change your original (X)HTML markup; allow the original text-only site to stay intact and functioning. Keep your markup clean!

Third, add a behavioural layer: JavaScript (Ajax). Offer things that make the experience faster, smoother, and nicer for users/browsers with Ajax-capable JS... but only for those users. Users without JavaScript are still welcome; and so are search bots, the visually impaired, many mobiles, etc.

This is called progressive enhancement in web design circles. Do it this way and your site works, in some reasonable form, for everyone.
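As a concrete illustration of the "step one" baseline, here is a minimal, hypothetical servlet sketch (all names and paths are invented): GET renders plain semantic HTML that works with no CSS or JavaScript, and POST handles the state change and then redirects. The visual and behavioural layers can later be added on top without touching this markup.

```java
// Minimal baseline sketch: content over GET, state changes over POST.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CommentsServlet extends HttpServlet {

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html");
        resp.getWriter().println(
            "<html><body>"
          + "<h1>Comments</h1>"
          + "<ul><li>First comment</li></ul>"
          + "<form method=\"post\" action=\"/comments\">"
          + "<textarea name=\"text\"></textarea>"
          + "<input type=\"submit\" value=\"Add comment\"/>"
          + "</form>"
          + "</body></html>");
    }

    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        saveComment(req.getParameter("text")); // hypothetical persistence call
        resp.sendRedirect("/comments");        // POST-redirect-GET after the state change
    }

    private void saveComment(String text) {
        // Persist the comment (details omitted).
    }
}
```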

mattandrews
That advice only applies if the site is a content-based site. If the site is an interaction-rich one, like Google Maps, you would not build it the way this answer suggests.
Chii
Have to say I don't fully agree there; there's no intrinsic reason why user agents without Ajax (search bots, most mobiles, etc.) should be denied content if it's useful and relevant. The key thing is the mode of interaction: if the interaction has to be multi-dimensional and continuous (like an FPS game), then, sure, text-only makes no sense. But Google Maps *could* usefully be implemented in a standard-HTML version. It's OK not to worry about search bots for logged-in-only sections, or to use only Ajax if you're sure all your users have Ajax (e.g. on an intranet), but those are exceptions.
mattandrews
+1  A: 

There are a few ways to handle this in GWT; this is a great discussion on the subject. The best option seems to be to serve static SEO content when the user agent is a bot, as long as that content is identical to what is served via the GWT route. This can be a lot of work, but if you really want a fully rich GWT app that is also optimized for search engines, it may be worth it.
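One way to keep the bot-facing content identical to what the GWT route serves is to have both read from the same source. The sketch below is hypothetical (every class and method name is invented): the GWT RPC service and a plain HTML servlet for crawlers share one repository, so the static version cannot drift from what the rich client shows.

```java
// Hypothetical sketch: one content source feeding both the GWT RPC service
// and a plain HTML page for crawlers.
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;

// Single source of truth for the content.
class ArticleRepository {
    static List<String> findTitles() {
        return Arrays.asList("First article", "Second article");
    }
}

// GWT RPC interface and implementation used by the rich client.
interface ArticleService extends RemoteService {
    List<String> getTitles();
}

class ArticleServiceImpl extends RemoteServiceServlet implements ArticleService {
    public List<String> getTitles() {
        return ArticleRepository.findTitles();
    }
}

// Plain HTML rendering of the same data, served to bots (for example via the
// user-agent filter sketched under an earlier answer).
public class StaticArticlesServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html");
        resp.getWriter().println("<html><body><h1>Articles</h1><ul>");
        for (String title : ArticleRepository.findTitles()) {
            resp.getWriter().println("<li>" + title + "</li>");
        }
        resp.getWriter().println("</ul></body></html>");
    }
}
```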