I received the following query from a customer:
I am doing some research into character sets for future versions of our products.
Most of the sites we have built use HTML with a meta tag declaring ISO-8859-1 (Latin-1, the Western European character set) rather than UTF-8 (Unicode).
I have set up a page to experiment with this. I find that I can paste various scripts (Chinese, Punjabi, Arabic, Romanian, etc.) into the rich text editor with no problems, and they display correctly on the web page in Firefox and IE8.
I was a little surprised that my page rendered these scripts correctly, because they are not included in the Latin-1 character set.
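One likely explanation is that the characters outside ISO-8859-1 never travel as raw bytes at all. This is an assumption and has not been confirmed against the customer's setup. When a form is submitted from a page whose encoding cannot represent a character, browsers substitute an HTML numeric character reference such as `&#20013;`, and many rich text editors can be configured to emit such references themselves. The Python sketch below imitates that substitution with the standard `xmlcharrefreplace` error handler. The sample strings and function names are hypothetical.

```python
import html

# Hypothetical sample strings; any text outside ISO-8859-1 behaves the same way.
SAMPLES = {
    "Chinese": "中文",
    "Arabic": "مرحبا",
    "Romanian": "Bună ziua",
}


def encode_for_latin1_page(text: str) -> bytes:
    """Encode text as ISO-8859-1, replacing each character Latin-1 cannot
    represent with a decimal numeric character reference such as &#20013;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def render(data: bytes) -> str:
    """Decode the bytes as ISO-8859-1 and resolve character references,
    roughly what a browser does when it displays the stored markup."""
    return html.unescape(data.decode("iso-8859-1"))


for name, text in SAMPLES.items():
    wire = encode_for_latin1_page(text)
    print(f"{name}: {wire!r} -> {render(wire)}")
```

The references are plain ASCII, so they survive a Latin-1 page and database unchanged. The browser turns them back into the original characters when it renders the page.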
Reading further, I see that 'it is a common misunderstanding that (the ISO-8859-1 meta tag) is needed; it is not',
because 'when your browser makes the request to the server, it tells the server what it wants and can handle. By the time the browser reads that code, the MIME type has already set the character set.'
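The quoted claim is roughly right about precedence, whatever the server bases its choice on. In the HTML standard's encoding sniffing algorithm, a byte order mark wins first. The `charset` parameter of the HTTP `Content-Type` response header comes next, and a `<meta>` declaration in the page is only consulted after that. The sketch below is a heavily simplified model of that order. It ignores user overrides, the finer details of the meta prescan, and label aliasing; for example, browsers actually treat `iso-8859-1` as `windows-1252`. The function names are hypothetical, and so is the `windows-1252` fallback, which assumes a UK-locale browser.

```python
import re
from email.message import Message
from typing import Optional

# Byte order marks are checked first, as in the HTML encoding sniffing algorithm.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)

# Simplified pattern for <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)

# Assumed fallback for a UK-locale browser when nothing is declared.
FALLBACK = "windows-1252"


def header_charset(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type header, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_encoding(body: bytes, content_type: Optional[str]) -> str:
    """Pick an encoding: BOM, then HTTP header charset, then <meta>, then fallback."""
    for bom, encoding in BOMS:
        if body.startswith(bom):
            return encoding
    if content_type:
        charset = header_charset(content_type)
        if charset:
            return charset
    # Browsers only prescan the start of the document for a <meta> declaration.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return FALLBACK


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_encoding(page, "text/html; charset=UTF-8"))
print(effective_encoding(page, "text/html"))
```

Under this model, the `<meta>` tag only matters when IIS sends no `charset` in the `Content-Type` header. If the header carries one, it overrides the tag.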
So it seems the character set is determined by the web server rather than by the application or the HTML.
Can you confirm whether this is correct? Do IIS 6 and 7, as you have them configured, support these character sets? Do you know of any problems representing languages widely spoken in the UK (Asian, Eastern and Western European, Arabic, etc.) on our servers?
The customer's server runs Windows Server 2003, with Regional and Language Options configured as follows:
Regional Options tab:
  Standards and formats: United Kingdom
  Location: United Kingdom
Languages tab:
  Text services and input languages: English (United Kingdom)
Advanced tab:
  Language for non-Unicode programs: English (United Kingdom)
  Code page conversion tables: all checked (there are quite a few listed: Japanese, Korean, Arabic, etc.)
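For context, here is an assumption about how these settings behave; it has not been checked on this server. "Language for non-Unicode programs" selects the system ANSI code page, which is 1252 for English (United Kingdom). The code page conversion tables only matter to software that converts between those code pages and Unicode. ASP.NET handles strings as UTF-16 internally, so these settings would mainly affect non-Unicode components. The Python sketch below shows why the code page matters for raw bytes. Byte 0xC7 is Ç under code page 1252 but Arabic alef (ا) under code page 1256. The function name is hypothetical.

```python
# One byte value, read under two different Windows ANSI code pages.
RAW = b"\xc7"


def decode_as_non_unicode_program(data: bytes, code_page: str) -> str:
    """Interpret raw bytes the way a program relying on the system code page would."""
    return data.decode(code_page)


uk = decode_as_non_unicode_program(RAW, "cp1252")      # English (United Kingdom) locale
arabic = decode_as_non_unicode_program(RAW, "cp1256")  # an Arabic system locale
print(f"cp1252: U+{ord(uk):04X}  cp1256: U+{ord(arabic):04X}")

# Text kept as Unicode (UTF-8 here) is unambiguous whatever the system locale is.
mixed = uk + " " + arabic
assert mixed.encode("utf-8").decode("utf-8") == mixed
```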
Do I need to change anything in the server configuration? Or does the customer handle this through settings in their web.config file, making sure that any database fields that might store non-Latin characters are configured as Unicode?
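On the database side, a rough simulation shows why Unicode columns matter. It assumes SQL Server, which the customer has not confirmed, and compares a non-Unicode `VARCHAR` column using code page 1252 with an `NVARCHAR` column stored as UTF-16. Characters the code page cannot represent are replaced, typically by `?`. Real conversions may apply best-fit mappings instead, so treat this as an illustration only. The function names are hypothetical.

```python
def store_in_cp1252_column(text: str) -> str:
    """Approximate a non-Unicode (code page 1252) column: characters that
    cannot be represented are replaced with '?' on the way in."""
    return text.encode("cp1252", errors="replace").decode("cp1252")


def store_in_unicode_column(text: str) -> str:
    """Approximate a Unicode column stored as UTF-16 (little-endian)."""
    return text.encode("utf-16-le").decode("utf-16-le")


for word in ["café", "Bună", "مرحبا", "中文"]:
    print(word, "|", store_in_cp1252_column(word), "|", store_in_unicode_column(word))
```

Western European text such as "café" survives either way. Romanian, Arabic and Chinese text is only preserved by the Unicode column.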