We're implementing a blog for a site that supports six languages, five of which have non-Latin characters in their alphabets. We are not sure whether we should percent-encode those characters in the URL slugs (which is what we're doing at the moment):

Létání s potravinami: Co je dovoleno? becomes l%c3%a9t%c3%a1n%c3%ad-s-potravinami-co-je-dovoleno and the browser displays it as létání-s-potravinami-co-je-dovoleno.

or whether we should replace them with their Latin "counterparts" (similar-looking letters):

Létání s potravinami: Co je dovoleno? becomes letani-s-potravinami-co-je-dovoleno.
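
For comparison, the two options could be generated roughly as follows. This is a minimal Python sketch, not the code the site actually runs; the helper names `hyphenate`, `encoded_slug` and `ascii_slug` are made up for illustration.

```python
import re
import unicodedata
from urllib.parse import quote, unquote


def hyphenate(text: str) -> str:
    """Lowercase the text and turn each run of non-alphanumeric characters into one hyphen."""
    return re.sub(r"[\W_]+", "-", text.lower()).strip("-")


def encoded_slug(title: str) -> str:
    """Option 1: keep the native letters and percent-encode their UTF-8 bytes."""
    # NFC keeps accented letters as single code points so the regex treats them as letters.
    slug = hyphenate(unicodedata.normalize("NFC", title))
    # quote() emits uppercase hex digits; lowercased here only to match the example above.
    return quote(slug).lower()


def ascii_slug(title: str) -> str:
    """Option 2: strip diacritics and keep the ASCII base letters."""
    # NFKD splits "é" into "e" plus a combining accent; the accent is then dropped.
    decomposed = unicodedata.normalize("NFKD", title)
    return hyphenate(decomposed.encode("ascii", "ignore").decode("ascii"))


title = "Létání s potravinami: Co je dovoleno?"
print(encoded_slug(title))           # l%c3%a9t%c3%a1n%c3%ad-s-potravinami-co-je-dovoleno
print(unquote(encoded_slug(title)))  # létání-s-potravinami-co-je-dovoleno
print(ascii_slug(title))             # letani-s-potravinami-co-je-dovoleno
```

One caveat with the second option as sketched: letters that have no Unicode decomposition, such as the Polish "ł", are simply dropped, so a real implementation would likely need a per-language mapping table or a transliteration library.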

I can't find a definitive answer as to which is better from an SEO perspective. Search engine optimization is very important for us. Which approach would you suggest?

A: 

Well, I suggest replacing them with their Latin counterparts, because that is user friendly and your website will be accessible from every computer (keyboards differ from one computer to another, but all of them have Latin letters). From an SEO perspective, I don't think it's going to be a problem.

aleo
+3  A: 

Most of the time, search engines handle the Latin counterparts well, although the results for, e.g., "létání" and "letani" sometimes differ slightly.

So, in terms of SEO, almost no harm is done: as long as your site has good content, good markup and all that other stuff, it won't suffer from having Latin URLs.

You don't always know what combination of system, browser and plugins your users have, so make the URLs as easy as possible. Most websites stick to standard Latin in URLs, because non-Latin symbols can choke anything from the server through the browser to any plugin, and that can break the user's experience.

And I can't stress this enough: users before SEO!

Adam Kiss
I saw a few sites that encode characters in URLs (the Sony PlayStation forums, for example), but yesterday we found out something interesting. I copied a URL with Polish characters in it from the address bar and pasted it into an IM to a person who runs Windows with a US locale. When I paste the URL into my browser, it shows up with Polish letters, but when he pastes it into his US browser, he still gets garbage (the encoded characters). We finally decided that for SEO purposes we should have good blog post titles in the actual language, but for slugs we should use Latin only.
Pawel Krakowiak
A: 

"what's better from SEO perspective"

Who's your audience? Americans who think all those extra letters are a mistake?

Or folks who read (and search for) "non-ASCII" letters because those non-ASCII letters are part of their language?

SEO is a bad thing to chase. Complete, correct, consistent and usable is what you want to build first.

S.Lott
The site is for the European market and features the native languages of the users. English is there to cover people who don't know any of those other languages (foreigners for example).
Pawel Krakowiak
A: 

Pawel, first of all, you should decide whether you're going to optimize for the global Google (google.com) or the Polish one.

purpler
It's not only Polish, we will have users coming from a few European countries and they all have their own Google site.
Pawel Krakowiak
A: 

According to the URI specification, RFC 3986, a URI may contain only a limited set of US-ASCII characters. Reserved characters (the ones the specification uses as delimiters) must be percent-encoded when they appear as data, and so must any character outside the allowed set. If you want to use other characters directly, then you should be using IRIs, RFC 3987. Keep in mind, however, that HTTP itself works with URIs, so an IRI has to be mapped to a URI (UTF-8 bytes, percent-encoded) before it is sent in a request.
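
As a rough illustration of how a non-ASCII address ends up as an ASCII-only URI, the Python sketch below percent-encodes the UTF-8 bytes of the path. It assumes the host name is already ASCII and leaves the query and fragment untouched; `iri_to_uri` and the example.com address are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters in the path of an IRI."""
    parts = urlsplit(iri)
    # "/" stays a separator and "%" is kept so existing escapes are not encoded twice.
    ascii_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, ascii_path, parts.query, parts.fragment))


uri = iri_to_uri("https://example.com/blog/létání-s-potravinami")
print(uri)           # https://example.com/blog/l%C3%A9t%C3%A1n%C3%AD-s-potravinami
print(unquote(uri))  # https://example.com/blog/létání-s-potravinami
```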

When in doubt, RTFM.

A: 

Another issue is that there are Unicode code points whose glyphs look very much alike in most fonts, which is absolutely ideal for phishers. Stick to ASCII and the glyphs are visibly different when the characters are.
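
To make the look-alike problem concrete, here is a small Python sketch. The "paypal" strings are a made-up example, and guessing the script from the first word of the Unicode character name is only a rough heuristic, not the official Unicode Script property.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Rough script guess: the first word of each letter's Unicode character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A, not Latin "a"

print(genuine == spoofed)             # False
print(sorted(scripts_used(genuine)))  # ['LATIN']
print(sorted(scripts_used(spoofed)))  # ['CYRILLIC', 'LATIN']
print(spoofed.isascii())              # False
```

The two strings render almost identically in many fonts, yet they compare as different and only one of them is pure ASCII.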

David Thornley
Not true, many ASCII glyphs are hard to distinguish: O, 0, I, l, |, .com, .corn, etc.
Dour High Arch