I have a form that accepts text and is posted to the server.

If a user were to input a French character such as 'à', it will be read as 'Ã' by Classic ASP code and be stored as 'Ã' in a SQL Server 2005 database.

A similar effect happens with other accented characters. What's happening?

A:

It's a character-encoding problem. Apparently your server and database are configured for the Windows-1252 or ISO-8859-1 character set, while you're receiving UTF-8 data.
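A minimal sketch of the mismatch, assuming the browser encodes the text as UTF-8 and the server decodes the bytes as Windows-1252. UTF-8 stores 'à' as the two bytes 0xC3 0xA0; read as Windows-1252, these become 'Ã' followed by a no-break space (U+00A0). Because a no-break space renders as blank, this is likely why only 'Ã' seems to appear in the database.

```python
original = "à"

# Bytes a UTF-8 browser actually sends for "à": 0xC3 0xA0.
sent_bytes = original.encode("utf-8")

# What a server assuming Windows-1252 (code page 1252) makes of those bytes.
misread = sent_bytes.decode("cp1252")

# Decoding with the encoding the browser really used restores the text.
correct = sent_bytes.decode("utf-8")

print(sent_bytes)                       # b'\xc3\xa0'
print([hex(ord(c)) for c in misread])   # ['0xc3', '0xa0'] -> 'Ã' + no-break space
print(correct == original)              # True
```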

You should check that your server sends a Content-Type header whose value ends with "charset=iso-8859-1" (for example, Content-Type: text/html; charset=iso-8859-1).
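The charset is carried as a parameter of the Content-Type header (Content-Encoding describes transformations such as gzip compression, not the character set). As a small illustration, Python's standard email.message module can pull that parameter out of a header value; the header strings below are made-up examples, not values captured from the poster's server.

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value, since charset names are case-insensitive.
    return msg.get_content_charset()


print(charset_of("text/html; charset=iso-8859-1"))  # iso-8859-1
print(charset_of("text/html"))                      # None: no charset declared
```

A response like the second one leaves the browser to guess the encoding, which is the situation described next.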

My guess is that your server doesn't declare the charset of its pages, so browsers whose default encoding is UTF-8 submit UTF-8 bytes, which are then interpreted and stored as ISO-8859-1 (or Windows-1252) in your database.

FWH
According to Firebug, your suspicions about ISO-8859-1 are correct, but it also seems to say that UTF-8 is accepted. If my server accepts UTF-8 and, as you say, the client sends data in UTF-8, then this shouldn't be a problem, right? Header value captured by Firebug: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
burnt1ce
The header says the browser will *accept* those encodings (others may be rejected); you still need to see what you actually get and handle it appropriately. There is no automatic conversion between character sets. SQL Server 2005 will store the bits it gets, but the default code page in Windows (1252) is not going to work entirely correctly with ISO-8859-1, so Windows clients that aren't told the string is not CP1252 are going to have difficulty.
DaveE
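Accept-Charset is a request header: the browser uses it to list the charsets it prefers for the *response*, ranked by optional q-values (default 1.0). It does not state which encoding the browser used for submitted form data. A simplified parser (a sketch that skips validation and malformed entries) makes the ranking in the captured header explicit:

```python
def parse_accept_charset(header: str):
    """Split an Accept-Charset value into (charset, q) pairs, best first.

    Entries without an explicit q parameter default to q=1.0.
    """
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        charset, q = parts[0], 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((charset, q))
    # sorted() is stable even with reverse=True, so ties keep header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
# [('ISO-8859-1', 1.0), ('utf-8', 0.7), ('*', 0.7)]
```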
For the characters in the question, ISO-8859-1 and Windows-1252 use identical code points.
AnthonyWJones
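A quick check of that claim, using a hand-picked sample of French accented characters (the sample string is my own choice): each one encodes to the same single byte in both ISO-8859-1 and Windows-1252, e.g. 'à' is 0xE0 in both.

```python
def same_code_point(ch: str) -> bool:
    """True if ch encodes to the same single byte in ISO-8859-1 and Windows-1252."""
    latin1 = ch.encode("iso-8859-1")
    cp1252 = ch.encode("cp1252")
    return latin1 == cp1252 and len(latin1) == 1


samples = "àâçéèêëîïôùûü"  # illustrative French accented letters
print(all(same_code_point(ch) for ch in samples))  # True
print("à".encode("cp1252"))                        # b'\xe0'
```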
If the text is encoded as ISO-8859-1 and you read it as Windows-1252, there's no problem; in fact, web browsers tend to do that on purpose. Windows-1252 merely replaces the control characters in the 128..159 range (which are never used and wouldn't print anyway) with some more printing characters.
Alan Moore
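The range claim can be verified byte by byte: decoding all 256 byte values with both codecs shows that they differ only in 0x80–0x9F. One caveat about this sketch: Python's cp1252 codec leaves five of those bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, so errors="replace" is used and they show up as U+FFFD; other implementations may map them differently.

```python
def differing_bytes():
    """Return byte values that decode differently in ISO-8859-1 and Windows-1252."""
    diffs = []
    for b in range(256):
        latin1 = bytes([b]).decode("iso-8859-1")
        # Unassigned cp1252 bytes become U+FFFD instead of raising an error.
        cp1252 = bytes([b]).decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
print(hex(min(diffs)), hex(max(diffs)), len(diffs))  # 0x80 0x9f 32
```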
A: 

See my answer here for details on what is likely happening.

Ultimately, you need to ensure that the encoding used in the form post matches the Response.CodePage of the receiving page. You can configure the character set a form actually submits by placing the accept-charset attribute on the form element; by default, accept-charset is the document's charset.
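To illustrate why the two must match, here is a sketch that simulates an application/x-www-form-urlencoded post with Python's urllib.parse rather than Classic ASP. The field name "comment" is hypothetical. In ASP terms, the decoding encoding stands in for the receiving page's Response.CodePage (65001 is UTF-8, 1252 is Windows-1252).

```python
from urllib.parse import parse_qs, urlencode

# Request bodies for the same input, from a form page served as UTF-8 vs. Windows-1252.
body_utf8 = urlencode({"comment": "à"}, encoding="utf-8")
body_cp1252 = urlencode({"comment": "à"}, encoding="cp1252")

print(body_utf8)    # comment=%C3%A0
print(body_cp1252)  # comment=%E0

# Receiving page decodes a UTF-8 post as Windows-1252: mojibake ('Ã' + no-break space).
mismatched = parse_qs(body_utf8, encoding="cp1252")["comment"][0]

# Receiving page decodes each post with the encoding it was sent in: original text.
matched_utf8 = parse_qs(body_utf8, encoding="utf-8")["comment"][0]
matched_cp1252 = parse_qs(body_cp1252, encoding="cp1252")["comment"][0]
```

Either encoding works, as long as the form page and the receiving page agree.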

What exactly are the code pages of your ASP files set to (both the page containing the form and the page receiving the post)?

What are you setting the Response.CharSet value to in the form page?

AnthonyWJones