I received the following query from a customer:

I am doing some research into character sets for future versions of our products.

Most of the sites we have built use HTML with a meta tag for iso-8859-1 (the Western European Latin-1 alphabet) rather than UTF-8 Unicode.

I have set up a page to play with this, and find that I am able to paste various scripts into the rich text editor (Chinese, Punjabi, Arabic, Romanian etc.) with no problems, and they display on the webpage OK (in Firefox/IE8).

I was a little surprised that my page was rendering these scripts correctly as they are not included in the Latin alphabet.

Reading further I see that 'It is a common misunderstanding that (the iso-8859-1 metatag) that is needed, it is not'

As 'when your browser makes the request to the server it tells the server what it wants and can handle. By the time the browser reads that code, the mimetype has already set the character set.'

So it seems the available character set is determined by the web server rather than the application/html.

Can you confirm if this is correct: does IIS 6/7 support such character sets as you have it configured, and do you know of any problems with languages widely spoken in the UK being represented on our servers (Asian, East/West European, Arabic etc.)?

The customer's server is Windows 2003 with the Region and Language Options configured as:

Regional Options Tab -

Standards and Formats: United Kingdom
Location: United Kingdom

Languages Tab -

Text Services and Input Languages - English (United Kingdom)

Advanced Tab -

Language for non-unicode programs: English (United Kingdom)
Code page conversion tables: All checked (there's quite a few listed: Japanese, Korean, Arabic etc)

Do I need to do anything to the configuration of the server, or does the customer configure this through settings in their web.config file and ensure that any database fields that might store non-latin characters are configured as unicode?

+3  A: 

ASP.NET serves responses in UTF-8 by default.

The encoding is specified in the response headers, so you shouldn't need to do anything special. However, you may wish to add this tag to the page header:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

You can configure this behavior in your web.config:

<configuration>
  <system.web>
    <globalization
      fileEncoding="utf-8"
      requestEncoding="utf-8"
      responseEncoding="utf-8"
      culture="en-US"
      uiCulture="de-DE"
    />
  </system.web>
</configuration>

Read here: How to: Select an Encoding for ASP.NET Web Page Globalization

Regarding database fields, if we're talking about SQL Server, the fields need to be nvarchar or nchar (the Unicode types), not varchar/char.
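To illustrate why the encoding (and the column type) matters, here is a minimal Python sketch, not ASP.NET code. It shows that the scripts the customer pasted survive a UTF-8 round trip but cannot be represented in iso-8859-1. The sample string and the helper names are made up for illustration; the "?" substitution is only roughly analogous to what happens when non-Latin text ends up in a non-Unicode column.

```python
# Sample text in several scripts (hypothetical test data).
SAMPLE = "中文 ਪੰਜਾਬੀ عربى română"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def lossy_round_trip(text: str, encoding: str) -> str:
    """Encode with replacement of unsupported characters, then decode again."""
    return text.encode(encoding, errors="replace").decode(encoding)


# UTF-8 can represent every character, so the round trip is lossless.
utf8_round_trip = SAMPLE.encode("utf-8").decode("utf-8")

# iso-8859-1 has no code points for these scripts, so data is lost.
latin1_result = lossy_round_trip(SAMPLE, "iso-8859-1")

if __name__ == "__main__":
    print(can_encode(SAMPLE, "utf-8"))
    print(can_encode(SAMPLE, "iso-8859-1"))
    print(latin1_result)
```

If the page really were being served and stored as iso-8859-1, the pasted text could not come back intact, which suggests that UTF-8 is in use somewhere along the way.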

Developer Art
+1  A: 

Agree with the answer from "Developer Art" (and voted up).

In this case, though, it is odd that things work even though the meta tag explicitly says iso-8859-1 (they should not).

The most likely explanation is that the web server is configured to report utf-8 in the Content-Type HTTP response header, which overrides the meta tag (as per the standard).

Alternatively, the browser may detect the encoding itself and ignore the meta tag (IE tends to do that if there is enough text for reliable detection).
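The precedence described above can be sketched in Python. This is a deliberately simplified illustration, not what any browser actually runs: real browsers also consider byte order marks, user overrides and content sniffing. The function names, the regular expressions and the `windows-1252` fallback are all assumptions made for this example.

```python
import re
from typing import Optional


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not header_value:
        return None
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', header_value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(html: str) -> Optional[str]:
    """Extract the charset declared in the first meta tag that has one."""
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(header_value: Optional[str], html: str,
                      default: str = "windows-1252") -> str:
    """HTTP header first, then the meta tag, then an assumed default."""
    return (charset_from_content_type(header_value)
            or charset_from_meta(html)
            or default)


PAGE = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head></html>'

if __name__ == "__main__":
    # Header declares utf-8, so the iso-8859-1 meta tag is ignored.
    print(effective_charset("text/html; charset=utf-8", PAGE))
    # No charset in the header, so the meta tag decides.
    print(effective_charset("text/html", PAGE))
```

To check which case applies on a real server, inspect the Content-Type response header with the browser's developer tools or a tool such as Fiddler.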

Mihai Nita