The user entered the word éclair into the search box.

Showing results 1 - 10 of about 140 for �air. 

Why does it show the weird question mark? I'm using Django to display it:

Showing results 1 - 10 of about 140 for {{query|safe}}
+7  A: 

It's an encoding problem. Most likely your form or the output page is not UTF-8 encoded.

This article is very good reading on the issue: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

You need to check the encoding of

  • the HTML page where the user input the word
  • the HTML page you are using to output the word
  • the multi-byte ability of the functions you use to work with the string (though that probably isn't a problem in Python)

If the search is going to run against a database, you will also need to check the encoding of the database connection, as well as the encoding of your tables and columns.
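To see why the encoding of the form page matters, here is a small sketch using only Python's standard library. It assumes the search term travels in a GET query string, which a browser percent-encodes using the encoding of the page that holds the form. The variable names are illustrative and not taken from the question.

```python
from urllib.parse import quote, unquote

term = "éclair"

# What a browser would send from a form page declared as ISO-8859-1 vs. UTF-8.
sent_latin1 = quote(term, encoding="latin-1")   # '%E9clair'
sent_utf8 = quote(term, encoding="utf-8")       # '%C3%A9clair'

# A server that assumes UTF-8 decodes the UTF-8 form correctly...
ok = unquote(sent_utf8, encoding="utf-8", errors="replace")
# ...but turns the lone 0xE9 byte of the ISO-8859-1 form into U+FFFD.
broken = unquote(sent_latin1, encoding="utf-8", errors="replace")

print(ok)
print(broken)
```

If `broken` contains the replacement character while `ok` does not, the mismatch is between the page that submitted the term and the code that decoded it.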

Pekka
A: 

You are serving the page with the wrong character encoding (charset). Check that you are using the same encoding throughout your application (for example UTF-8). This includes:

  • HTTP headers from the web server (Content-Type: text/html; charset=utf-8)
  • Communication with the database (e.g. SET NAMES 'utf8' in MySQL, which spells the charset name without a hyphen)
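As a rough illustration of the first point, the sketch below is a bare WSGI application (not Django code, and not the asker's setup) in which the charset declared in the Content-Type header matches the encoding actually used for the body. The function name `app` and the page text are made up for the example.

```python
def app(environ, start_response):
    """Minimal WSGI app: declared charset and body encoding are both UTF-8."""
    text = "Showing results 1 - 10 of about 140 for éclair"
    body = text.encode("utf-8")  # encode with the same charset the header declares
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

If the header said one charset and `encode()` used another, the browser would decode the bytes with the wrong table and show replacement characters or mojibake.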
Emil Vikström
A: 

It would also be good to check your browser encoding setting.

Antony Hatchkins
+1  A: 

This is what you get when data that is not encoded in UTF-8 is interpreted as if it were UTF-8.

The first character of éclair, é, is most likely arriving as the single byte 0xE9 (its ISO-8859-1 encoding). In UTF-8, a byte of that form announces a three-byte sequence, so the decoder consumes the next two bytes as well, cannot decode the result (those bytes are not valid continuation bytes), and shows the REPLACEMENT CHARACTER � (U+FFFD) instead. That is why "écl" collapsed into a single � and only "air" survived.

So in your case you just need to make sure your data is actually encoded as UTF-8.
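The byte-level behaviour can be reproduced in a Python 3 shell. This is only a sketch, and it assumes the term arrived as ISO-8859-1 bytes, which the question does not confirm.

```python
raw = "éclair".encode("latin-1")   # b'\xe9clair': é is the single byte 0xE9

# 0xE9 is 1110 1001 in binary; the 1110 prefix marks the start of a
# three-byte UTF-8 sequence.
lead = raw[0]
is_three_byte_lead = (lead >> 4) == 0b1110

# Strict decoding refuses the data outright.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc.reason)

# Lenient decoding substitutes U+FFFD for what it cannot decode.
lenient = raw.decode("utf-8", errors="replace")
print(is_three_byte_lead, lenient)
```

Python's decoder replaces only the offending byte and keeps the following "cl", whereas the decoder that produced the output in the question appears to have discarded the two following bytes too. Either way the cause is the same.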

Gumbo
A: 

I second the responses above. Some other things off the top of my head:

If you're using, for example, a MySQL database, it would be a good idea to create it with:

CREATE DATABASE x CHARACTER SET UTF8

You can also check this: http://docs.djangoproject.com/en/dev/ref/settings/#default-charset
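For reference, a settings fragment along these lines might look as follows. It is a sketch only: `DEFAULT_CHARSET` is the setting the link above describes, while the `DATABASES` layout follows current Django versions, and the database name and the `utf8mb4` charset are assumptions rather than anything stated in the question.

```python
# settings.py (fragment)

# Charset Django uses for the responses it builds; "utf-8" is already the default.
DEFAULT_CHARSET = "utf-8"

# Hypothetical MySQL connection; the name "x" and the charset are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "x",
        # Passed through to the MySQL driver when the connection is opened.
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```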

Tomasz Zielinski