views:

1113

answers:

5

I have an HTML form that goes off to do all sorts of strange back-end things. This works fine in Firefox, and in most cases it works fine in IE.

However, the pound sterling sign (£) causes problems and seems to get munged in the submit.

The form is something like this:

 <form action="*MyFormAction*"  accept-charset="UTF-8" method="post">

I think I have seen this problem before but can't remember the solution.

Edit: the euro symbol (€) works fine.

Edit 2: In fact, if I put the € symbol together with a £ symbol, it also works fine. Looking at the problem, if I use characters that are not in the extended part of ISO-8859-1, it works OK. If I use extended characters from ISO-8859-1, they get munged. So how do I make IE use the character set that accept-charset says it should?

A: 

How is the £ submitted? If it's in an input box for a price, don't submit it: only allow numbers to be submitted, and add the £ when you display the price again. Or add the currency symbol in the back-end script.

Gary Willoughby
It is free text for comments.
Jeremy French
ah, sounds like an encoding problem then.
Gary Willoughby
A: 

I am not sure if this will help (read the entire article at http://fyneworks.blogspot.com/2008/06/british-pound-sign-encoding-revisited.html).

Excerpt:

THE PROBLEM: If you look at the UTF-8/Latin-1 (AKA ISO-8859-1) Character Table you will find that the decimal code for the British pound sterling sign is 163 - and the hexadecimal code is A3.

£ = %A3

However, this is not the case in (all) encoding/decoding functions in Javascript...

encodeURI/encodeURIComponent: Encodes a Uniform Resource Identifier (URI) component by replacing each instance of certain characters by one, two, or three escape sequences representing the UTF-8 encoding of the character.

Which means, in order to encode our beloved pound sign, Javascript uses 2 characters. This is where the annoying "Â" comes in...

£ = %C2%A3
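To make the excerpt concrete, here is a minimal JavaScript sketch of where the stray "Â" comes from. It assumes an environment with `TextEncoder` (modern browsers or Node.js); the variable names are only illustrative.

```javascript
// UTF-8 percent-encoding of the pound sign, as produced by encodeURIComponent.
const utf8Escaped = encodeURIComponent("£");

// The raw UTF-8 bytes of the pound sign: 0xC2 followed by 0xA3.
const utf8Bytes = new TextEncoder().encode("£");

// Reading each byte as one ISO-8859-1 character gives the mangled text:
// 0xC2 is "Â" and 0xA3 is "£".
const misread = String.fromCharCode(...utf8Bytes);

console.log(utf8Escaped, Array.from(utf8Bytes), misread);
```

So the two escape sequences stand for the two UTF-8 bytes of a single character, and the "Â" only appears when something later decodes those bytes as ISO-8859-1.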

Hope it helps.

+5  A: 

accept-charset="UTF-8"

Does not do what you think it does (or what the standard says it does) in IE. Instead, IE uses the value (‘UTF-8’) as a list of fallback encodings to use if a field can't be encoded using the usual default encoding (which is the same as the page's own encoding).

So if you add this attribute and your page isn't already in UTF-8, you may get characters submitted in either the page encoding or UTF-8, and your form-submission-reading script has no way to know which!

For this reason you should never use accept-charset; instead you should always ensure that the page containing the form is correctly served as “Content-Type: text/html;charset=utf-8” (by HTTP header and/or <meta>).

In fact if I put the € symbol with a £ symbol it also works fine.

Yes, that's because ‘€’ cannot be encoded in the page's default encoding (presumably ISO-8859-1). So IE resorts to sending the field encoded as UTF-8, which is what you wanted all along.
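The behaviour described above can be modelled roughly as follows. This is only a sketch of the description in this answer, not IE's actual code: it assumes the page is strict ISO-8859-1 (code points up to 0xFF), that the fallback applies to the whole submission, and the function names are hypothetical.

```javascript
// Hypothetical model of the fallback described above, not IE's actual code.
// Assumes the page is served as strict ISO-8859-1 (code points 0x00-0xFF).

// True if every character of the text has an ISO-8859-1 byte.
function fitsLatin1(text) {
  return [...text].every((ch) => ch.codePointAt(0) <= 0xff);
}

// Escape only non-ASCII bytes, to keep the output readable.
function escapeBytes(bytes) {
  return Array.from(bytes, (b) =>
    b < 0x80 ? String.fromCharCode(b) : "%" + b.toString(16).toUpperCase()
  ).join("");
}

// Encode all fields in the page encoding if possible, else fall back to UTF-8.
function submitLikeIE(fields) {
  const useLatin1 = Object.values(fields).every(fitsLatin1);
  const encoded = {};
  for (const [name, value] of Object.entries(fields)) {
    const bytes = useLatin1
      ? Uint8Array.from([...value], (ch) => ch.codePointAt(0))
      : new TextEncoder().encode(value);
    encoded[name] = escapeBytes(bytes);
  }
  return { charset: useLatin1 ? "ISO-8859-1" : "UTF-8", encoded };
}

console.log(submitLikeIE({ comment: "£5" }));
console.log(submitLikeIE({ comment: "£5 and €6" }));
```

In this model a lone £ goes out as the single byte A3, while adding a € switches the submission to UTF-8 and the £ becomes C2 A3, which matches what the question observed.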

bobince
Thanks for the comment clarifying what was going on. It is more or less what I thought, but it's nice to see it explained.
Jeremy French
+1  A: 

I think bobince has the ideal answer, which is “serve the page in UTF-8”. However, as I can't do this, I am posting my workaround for posterity.

Adding a hidden field named unmunge, whose value is a character outside ISO-8859-1 (the encoding our pages are served in), forces the submission into UTF-8.

So:

<input type="hidden" name="unmunge" value="&#x20ac;"  />

fixes the encoding (the entity is the euro symbol).
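If the back end wants to confirm that the fallback really happened, it could inspect the raw value of the hidden field. The following is a hedged sketch only: `detectSubmissionCharset` is a hypothetical helper, and it assumes the script can see the still percent-encoded value of `unmunge`.

```javascript
// Hypothetical server-side helper; the name and return values are illustrative.
// rawValue is the still percent-encoded value of the "unmunge" field.
function detectSubmissionCharset(rawValue) {
  // "€" (U+20AC) is the three bytes E2 82 AC in UTF-8.
  if (rawValue.toUpperCase() === "%E2%82%AC") {
    return "UTF-8";
  }
  // Anything else means the fallback did not happen as expected.
  return "unknown";
}

console.log(detectSubmissionCharset("%E2%82%AC"));
console.log(detectSubmissionCharset("%80"));
```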

Jeremy French
A: 

I just want to thank Jeremy: this is the only way that I could display the euro symbol in my HTML input box! I spent ages on this until I found your solution, so thank you!