We have a Flash "gateway" page displayed to users (only once) before they enter our corporate website. This Flash page is default.aspx, served as the default page by the server.

The issue is that Yahoo and other search engines pick up the text from the body of the page, which happens to be the JavaScript/Flash requirements warning in the noscript tag.

Is there a way to use robots.txt so that all search engines see home.aspx as the default page?

I am not stuck on using robots.txt to do this, so if there's another approach please recommend that instead.

I am aware of the issues with "gateway" pages -- this was a request coming from "management" despite my recommendation against the practice. Please offer solutions other than removing the gateway page.

Thank you!

+2  A: 

What if "default.aspx" looked at the useragent and sent a redirect to "home.aspx" if the useragent is a robot?
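As a rough illustration of the idea above, here is a minimal Python sketch of the decision logic (the real page would be ASP.NET). The token list, the function names, and the choice of a 302 status are all assumptions made for the example, not something taken from the site in question.

```python
# Hypothetical sketch: decide whether a request for the gateway page
# should be redirected to home.aspx, based only on the User-Agent header.

# Assumed, non-exhaustive list of crawler tokens (lowercase).
BOT_TOKENS = ("googlebot", "slurp", "msnbot", "bingbot")


def is_search_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the assumed crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)


def gateway_response(user_agent: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to the gateway page."""
    if is_search_bot(user_agent):
        # Assumption: a temporary redirect; the status code is a design choice.
        return 302, {"Location": "/home.aspx"}
    # Everyone else gets the gateway page itself.
    return 200, {}
```

Substring matching like this only catches the bots you list, and User-Agent headers are trivially spoofed, so treat it as best effort.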

David
Is there a reliable way to distinguish search engine spiders by the user agent field?
aaandre
This site collects the user agents of different robots: http://www.user-agents.org/
David
No solution based on User-Agent can be considered reliable, but in your case it is acceptable, especially if you're targeting specific search engines.
Michał Górny
The big problem with this solution is that it violates the webmaster guidelines for all the major search engines (Google, Yahoo, Microsoft). So going this route could lead to a penalty or an outright ban - which is not going to help achieve your goals.
Nathan Buggia
+3  A: 

How about displaying the ‘gateway’ as an overlay on main page using JavaScript?

You can use document.cookie to make it appear only once or (even better) some server-side magic (e.g. add appropriate <script/> once per session / cookie).

With that solution, you can even make it appear on the first visit to the site regardless of which page is accessed (if ‘management’ wants that).
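The server-side variant could look roughly like the sketch below, written in Python only for illustration. The cookie name `seen_gateway`, the one-year lifetime, and both function names are hypothetical; the idea is just to emit the overlay script when the cookie is absent and set the cookie in the same response.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "seen_gateway"  # hypothetical cookie name


def should_show_gateway(cookie_header: str) -> bool:
    """Return True when the request's Cookie header lacks the marker cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    return COOKIE_NAME not in cookies


def mark_gateway_seen() -> str:
    """Build a Set-Cookie header value that suppresses the overlay next time."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = "1"
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # assumed: one year
    return cookie[COOKIE_NAME].OutputString()
```

Crawlers generally do not send cookies, so they would get the overlay script too, but the underlying page content stays the same for everyone.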

Michał Górny
We want the gateway page to appear only when cookie-less users type the domain name directly. The overlay idea may work, but given the complexity of the implementation (a pretty heavy CMS) this would be a costly solution.
aaandre
A: 

You should force a permanent redirect (HTTP status code 301), so the search engines won't index the main page. Apache, nginx and lighttpd can do that for you; I don't know about IIS. Here is an example Apache configuration (in a VirtualHost section or .htaccess, for example):

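# Flag requests whose User-Agent contains "Googlebot" (mod_setenvif)
BrowserMatch Googlebot searchengine=1
RewriteEngine on
RewriteCond %{ENV:searchengine} =1
# ^/?$ matches the site root in both VirtualHost and .htaccess context
RewriteRule ^/?$ /myrealhomepage/ [R=301,L]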

Of course, you need to add every other user agent corresponding to bots, like msnbot (apparently still used by Bing), and any others that you find relevant.

I think it's better to avoid burying that kind of fine-tuning inside your site's web pages; letting the web server handle it will also consume fewer resources.

vincent
Thank you, this would be closest to what I'd like to do. I am dealing with IIS. Is there a place where I can see all bots' user agents listed?
aaandre
David posted a link to user-agents.org, which seems to be a comprehensive list. Since there is no reliable mechanism to detect all present and future robots, you should focus on just a few: I'd say Google, Yahoo, and MSN/Bing, which together represent 99% of the market.
vincent
+1  A: 

The Robots Exclusion Protocol does a lot of things, but it doesn't have a provision for specifying your site's home page. (more information: http://janeandrobot.com/library/managing-robots-access-to-your-website).
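To make that concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the helper name `allowed_paths` are made up for the example. It shows that the rules only answer "may this path be fetched?", and that disallowing /default.aspx says nothing about the root URL that serves the same page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to hide the gateway page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /default.aspx
"""


def allowed_paths(robots_txt: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether a generic crawler ("*") may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch("*", path) for path in paths}


result = allowed_paths(ROBOTS_TXT, ["/", "/default.aspx", "/home.aspx"])
```

With this parser, "/" remains fetchable while "/default.aspx" is blocked, so a crawler requesting the bare domain still receives the gateway content. Nothing in the file can tell it to treat home.aspx as the home page.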

There are two potential solutions to this that will work for both search engines and customers of your website. The best option would be to simply add some text within the <noscript> tag describing the message of the Flash animation and include a link to your home page. This way search engines will be able to understand what the page is about, and will have a link to your home page. This is also a good solution for your real customers who might be visiting from an iPhone and not have the option to install Flash. You'll want to provide these folks with a mechanism to get to your home page and some context for the page they landed on.

The second option would be to implement Michał Górny's suggestion above, turning the gateway into a JavaScript overlay on your true home page.

You'll also want to make sure that you've created a good title tag and meta description tag for your page. I see that many Flash pages forget this crucial step.

What you don't want to do is detect the search engine bots and provide a different experience for them than you provide for your customers. This would violate the webmaster guidelines for Google, Microsoft and Yahoo, would likely trip automated quality checks by the search engines, and could result in some sort of penalty.

Nathan Buggia (Technical Evangelist for Microsoft Bing)

nathan buggia