views: 92

answers: 6

Hi!

I have a website that requires JavaScript to be turned on in order to work.

There is a <noscript> tag containing a meta refresh that redirects the user to a page alerting them that JavaScript is disabled...

I am wondering, is this a bad thing for search engine crawlers?
I send myself an e-mail whenever someone visits without JS, so I can analyze whether it's necessary to rebuild the website for those people, but visitors are 100% JS-enabled and the only ones without JS are search engine crawlers... I guess Google, Yahoo, etc. don't take the meta refresh seriously when it's inside a <noscript>?

Should I do something to detect bots and skip the meta redirect for them?
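
The markup is roughly like this (the target page name is just an example):

<noscript>
    <meta http-equiv="refresh" content="0; url=/no-javascript.html" />
</noscript>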

Thanks,
Joe

+1  A: 

Here is what I would do:

  1. Make it so that the site somewhat works without JavaScript. If you use AJAX all over the place, then make sure that the links have their href set to the URL you will AJAX in. This might get your site to "somewhat" work without JavaScript (see the sketch below the list).
  2. Add some .htaccess redirects for the bots. Redirect them to some sane place where they can follow links and index content.
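
A rough sketch of point 1 (the URL, the #content container, and the loadSection helper are all made up for illustration):

<!-- the href points at the real URL, so crawlers and "open in new tab" still work;
     JavaScript intercepts the click and loads the content via AJAX instead -->
<a href="/products.html" onclick="return loadSection(this.href);">Products</a>

<script>
// hypothetical helper: fetch the linked page and inject it into a #content container
function loadSection(url) {
  fetch(url)
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.getElementById('content').innerHTML = html;
    });
  return false; // cancel the normal navigation for JS-enabled visitors
}
</script>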

Your site as it stands is probably very bad in terms of crawl-ability and SEO.

Edit: OK, I see your problem. The crawlers get redirected away after reading what's inside the noscript tag.

How about this solution then:

If you have just one page that has the noscript, then you can add some rewrite rules to your Apache config that show a different version of the page to the bots, and this version will not have the noscript tag. For example:

# show bots (matched by User-Agent) a copy of the page without the noscript meta refresh
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
# don't rewrite the bot page to itself
RewriteCond %{REQUEST_URI} !nometa\.html
RewriteRule ^.*$ nometa.html [L]

Also, what technologies are you using? Are you using any server-side languages? Are you even using Apache? I assumed you have Apache + HTML but no server-side language. If you do have something running server side, then this is easier.

mkoryak
You didn't understand my point. My website does have lots of AJAX, but the links are fine to click, because I don't want people to lose the ability to right-click links or open them in a new tab, etc. The problem is that crawlers are entering the <noscript> tag.
Jonathan
Added to the answer based on this.
mkoryak
A: 

Maybe you could make use of a headless browser and serve an HTML snapshot of the page to those who don't have JavaScript enabled, including the crawlers.

http://code.google.com/web/ajaxcrawling/docs/getting-started.html
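
If you go the snapshot route with Apache, one rough way to wire it up (the /snapshots/ directory of pre-rendered pages is made up here, and the _escaped_fragment_ value itself is ignored for simplicity) might be:

# serve a pre-rendered snapshot when a crawler requests the _escaped_fragment_
# form of an AJAX (#!) URL, per Google's AJAX crawling scheme
RewriteCond %{QUERY_STRING} _escaped_fragment_=
RewriteCond %{REQUEST_URI} !^/snapshots/
RewriteRule ^/?(.*)$ /snapshots/$1 [L]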

methode
+3  A: 

Instead of forcefully redirecting the user/bot, why not just display text at the top of the page telling visitors to enable JavaScript in order to use the site?

This will allow the bots to still read the page and follow the non-JavaScript links. It would end the redirect problems, and there would be no need to serve bots a different page, which would force you to update multiple pages.

You may also want to take a look at Google Webmaster Tools to see what Google is currently reading and improve based on that.

Example: disabling JavaScript on SO produces a red banner at the top that simply states "Stack Overflow works best with JavaScript enabled". You could link that to a page with more info if you feel it's not enough.
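
A minimal sketch of such a banner (the class name and the /why-javascript.html page are made up):

<noscript>
  <div class="js-warning">
    This site works best with JavaScript enabled.
    <a href="/why-javascript.html">More info</a>
  </div>
</noscript>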

corymathews
Because I want to use JavaScript without any problem for my users. Everyone who came to my page without JavaScript was a developer checking how the website looks without it; from the logs I know they visited the site with JS and then without. So a redirect is better suited for those people and for whoever else comes there without JS enabled, at least that's what I am thinking right now.
Jonathan
+1  A: 

Since <meta> isn't allowed in the <body> of a page, and <noscript> isn't legal in the <head> section, perhaps the bots are just giving up on a page where they hit bad HTML.

I suggest you simply use a <noscript> tag to encapsulate a warning message and a link that the user can click on if they do not have JavaScript switched on.

Search engines can be prevented from following this link using the /robots.txt file, or by placing a

<meta name="ROBOTS" content="NOINDEX,NOFOLLOW" /> 

tag on the page which is linked to.
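
A rough sketch of that setup (the /no-javascript.html page name is just a placeholder):

<!-- on the main page -->
<noscript>
  <p>This site requires JavaScript. <a href="/no-javascript.html">What can I do?</a></p>
</noscript>

<!-- in the <head> of /no-javascript.html, so crawlers don't index or follow it -->
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW" />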

MZB
I don't think most modern robots will just 'give up' when they hit bad HTML. There is a huge percentage of sites online with terrible HTML, yet they are still crawled.
webdestroya
+1  A: 

You could have a page that says "You need JavaScript" on it, and then add this on that page:

<script>
// visitors with JavaScript enabled are immediately sent on to the real page
window.location.href = '/thejspage.html';
</script>

That way, people with JavaScript support will easily be sent to the valid page, and the spiders will just stay on that page instead of indexing a page where there is no JavaScript.

This should also help your SEO (as the search engines will find a page that regular users can see).

webdestroya
A: 

Have you tried <!--googleoff: all--> <noscript><meta redirect... /></noscript> <!--googleon: all-->? It's not a complete solution, but it's worth a shot...

David Murdoch