I have a website that is available in multiple languages. The way this is set up now is that it looks at the HTTP Accept-Language header and redirects the user to the matching language, or uses a default language when no match is found.

The problem that I am facing is that web crawlers can't index the root page, because it gives a 302 redirect: http://www.mydomain.com gets redirected to http://www.mydomain.com/nl/

The only way the website can be indexed is if I supply a sitemap for the whole website, including the languages. I have done that, but I have not seen any indexed pages for weeks now.

So my question is: would it be better to just have the website work in a default language?

To get the website in their own language, users would then have to select the language on the root page itself.

+3  A: 

The problem that I am facing is that web crawlers can't index the root page

I haven't seen this problem before. Web crawlers certainly follow 302 redirects. Is there any chance that you're unknowingly blocking visitors that don't send an Accept-Language header, as web crawlers often don't?
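
A minimal sketch of what tolerating a missing header could look like, assuming the header is read from `$_SERVER` in PHP. The function name `accept_language_or_default` and the `'en'` fallback are hypothetical and not taken from the asker's code:

```php
<?php
// Hypothetical helper, not the asker's code: returns the Accept-Language
// header value, or $default when the header is absent or empty, which is
// common for crawlers.
function accept_language_or_default(array $server, string $default = 'en'): string
{
    $header = isset($server['HTTP_ACCEPT_LANGUAGE'])
        ? trim((string) $server['HTTP_ACCEPT_LANGUAGE'])
        : '';

    return $header !== '' ? $header : $default;
}

// A browser usually sends the header; a crawler often does not.
echo accept_language_or_default(['HTTP_ACCEPT_LANGUAGE' => 'nl,en;q=0.8']), "\n"; // nl,en;q=0.8
echo accept_language_or_default([]), "\n";                                        // en
```

In a real request the array passed in would be `$_SERVER`. The point is only that a request without the header gets a normal response in the default language instead of an error or an empty page.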

So my question is: would it be better to just have the website work in a default language? To get the website in their own language, users would then have to select the language on the root page itself.

I'd rather use the Accept-Language header and display the language that most closely matches the language(s) specified in the header, as per the HTTP 1.1 specification. If none is specified, I'd display English as the default language, or at least the language with the biggest coverage among the (expected) website audience.


I see in your question history that you're a PHP developer, so here's a useful snippet to determine the closest match based on the Accept-Language header as per the HTTP 1.1 specification:

function get_language($available_languages, $preferred_language = 'auto') {
    // In 'auto' mode use the request header; treat a missing header
    // (common for crawlers) as an empty string so the default is returned.
    if ($preferred_language == 'auto') {
        $preferred_language = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    }

    // Per match: [1] primary tag, [3] optional subtag, [5] optional q-value.
    preg_match_all('/([[:alpha:]]{1,8})(-([[:alpha:]-]{1,8}))?(\s*;\s*q\s*=\s*(1\.0{0,3}|0\.\d{0,3}))?\s*(,|$)/i',
        $preferred_language, $languages, PREG_SET_ORDER);

    $preferred_language = $available_languages[0]; // Set default for the case no match is found.
    $best_qvalue = 0;

    foreach ($languages as $language_items) {
        $language_prefix = strtolower($language_items[1]);
        $language = $language_prefix . (!empty($language_items[3]) ? '-' . strtolower($language_items[3]) : '');
        $qvalue = !empty($language_items[5]) ? floatval($language_items[5]) : 1.0;

        if (in_array($language, $available_languages) && ($qvalue > $best_qvalue)) {
            // Exact match on the full tag, e.g. "nl-be".
            $preferred_language = $language;
            $best_qvalue = $qvalue;
        } else if (in_array($language_prefix, $available_languages) && (($qvalue * 0.9) > $best_qvalue)) {
            // Match on the primary tag only, e.g. "nl"; weighted slightly lower.
            $preferred_language = $language_prefix;
            $best_qvalue = $qvalue * 0.9;
        }
    }

    return $preferred_language;
}

(the above is actually a rewrite/finetune of an example found somewhere at php.net)

It can be used as follows:

$available_languages = array(
    'en' => 'English',
    'de' => 'Deutsch',
    'nl' => 'Nederlands'
);

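// Placeholder: read the language code from the URL; fall back to 'auto' when the URL has none.
$requested_language = get_it_somehow_from_URL() ?: 'auto';
$current_language = get_language(array_keys($available_languages), $requested_language);

if ($requested_language != $current_language) {
    // Missing or unknown language in the URL: redirect to the best match.
    // $requested_page is a placeholder for the rest of the requested path.
    header('Location: /' . $current_language . '/' . $requested_page);
    exit;
}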
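
One thing worth checking with a redirect like this is that a URL which already carries a supported language is never redirected again, otherwise clients could end up in a loop. Below is a rough sketch of that check as a pure function. The name `language_redirect_target` and its parameters are hypothetical, and it assumes the path is passed without a query string:

```php
<?php
// Hypothetical helper: returns the path to redirect to, or null when the
// request path already starts with a supported language code and should be
// served as-is.
function language_redirect_target(string $path, array $available_codes, string $language): ?string
{
    $trimmed  = trim($path, '/');
    $segments = explode('/', $trimmed);
    $first    = strtolower($segments[0]);

    if (in_array($first, $available_codes, true)) {
        return null; // Already on a language URL: no redirect.
    }

    // Simplification: the chosen language is prefixed to the original path.
    return '/' . $language . '/' . $trimmed;
}

$codes = ['en', 'de', 'nl'];
var_dump(language_redirect_target('/', $codes, 'nl'));      // string(4) "/nl/"
var_dump(language_redirect_target('/nl/', $codes, 'nl'));   // NULL
var_dump(language_redirect_target('/about', $codes, 'nl')); // string(9) "/nl/about"
```

Only when the function returns a path would the `Location` header be sent. A request for `/nl/` returns `null`, so the page is rendered directly and the crawler gets a 200 response after a single redirect.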
BalusC
Hello, you are correct, I am a PHP developer. I also have a system built into the website that determines the language based on the Accept-Language header; when no Accept-Language header is found, English is chosen. This system works fine. Only, in your example above you do not redirect: your URL stays http://url.com even though the language changes. I redirect from http://url.com to http://url.com/nl. I think the problem sits there. http://bit.ly/hDWWQ is a crawler test; it returns the 302 code, but it does not follow it. Or am I mistaken?
Saif Bechan
That code was just a kickoff example. I expanded it anyway. This code is in use at [this site](http://www.google.com/?q=site%3Aofficeparkscharloo.com).
BalusC
Ah yes, I see it is indexed, and it does give the same 302 redirect error on the crawler test. I have another question: do you use Google Webmaster Tools? I want to know what you get when you do a 'view as googlebot' test. Do you get the content of the website, or is it blank?
Saif Bechan
Web crawlers follow redirects. Another thought: aren't you redirecting crawlers that send no language to the same index page in an infinite loop? Check the webserver access logs. By the way, the link in my previous comment is broken (typo); [this is right](http://www.google.com/search?q=site:officeparkscharloo.com).
BalusC