Hi folks,

I am having a very strange problem with pound signs displaying incorrectly (or not at all) on a web page.

I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window.

The worst part is that this is inconsistent. I have three different things happening:
1. Pound sign doesn't even appear in source code (assume XML is stripping this out as it appears to use UTF-8 by default).
2. Pound sign appears in source but not on web page.
3. Pound sign appears in source AND FINE on web page (usually, if this happens at all, the first time this is displayed).

Now, this is just one specific part of a bigger problem. I've been looking generally into this and done some research, and it appears that if I have text in ISO 8859-1 (Western Europe) and convert to UTF-8, it has no idea what the symbol is and removes it completely (in this case, though I have seen it replaced by a '?', a box, or an upside down '?' elsewhere).

If you input the pound sign as UTF-8 and convert back to ISO 8859-1, it gains a capital A hat (Â) before the pound sign.
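A minimal sketch of the two cases above, using Python purely for illustration (it is not part of the setup described here, and the variable names are made up). It assumes the "conversion" is really a case of bytes written in one encoding being read back as the other:

```python
# Illustration only: Python stands in for whatever the real pipeline does.
pound = "\u00a3"  # the pound sign, U+00A3

latin1_bytes = pound.encode("iso-8859-1")  # a single byte, 0xA3
utf8_bytes = pound.encode("utf-8")         # two bytes, 0xC2 0xA3

# ISO 8859-1 bytes read as UTF-8: 0xA3 on its own is not valid UTF-8, so the
# decoder drops it or substitutes U+FFFD, depending on its error policy.
dropped = latin1_bytes.decode("utf-8", errors="ignore")
replaced = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as ISO 8859-1: each byte becomes a character of its own,
# so 0xC2 shows up as a capital A with a circumflex in front of the pound sign.
mojibake = utf8_bytes.decode("iso-8859-1")

print(repr(dropped), repr(replaced), repr(mojibake))
```

Whether the symbol vanishes or turns into a placeholder depends on how the reading side handles invalid bytes, which would fit with seeing different results in different places.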

I can understand the latter, at least on a basic level - it is because our system must have pound signs saved (or stored in Oracle) with different character encodings throughout it, and, as we don't specify the character encoding (at least generally) for our web pages, sometimes IE gets confused and doesn't display things correctly.

What I don't understand is the inconsistent outcome outlined above.

I realise that I have been a bit vague in my initial explanation, but I hoped that writing it out might help me get my thoughts straight, and possibly help others understand similar problems in the future.

EDIT: Also, I realise I could exchange all the pound signs for the HTML entity (&pound;), but I feel this is time-consuming and messy (what if it is stored in Oracle and is later passed to PDF, Excel, etc?).

Obviously any pointers and advice would be appreciated though!

Thanks!

A:

"I am keying text in a textbox, which then gets (briefly) stored in XML before being displayed in a new IE(6) window."

The problem is most likely embedded in this sequence. It would help if you could elaborate on the specifics of how this sequence is achieved.

The most common cause of this sort of problem is a mismatch between the encoding the client actually uses for a character and the encoding the server assumes. The simplest solution is to place the accept-charset attribute on the form element, which makes the character encoding of a post explicit.

For example, with accept-charset="utf-8" on a form that contains a field named stuff, the text posted in the stuff field will be encoded in UTF-8.
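As a rough sketch of why making the charset explicit matters, the Python below mimics a browser encoding the stuff field and a server decoding it. Python, the variable names and the use of urllib are assumptions for illustration only; the real server-side code in this thread is unknown.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form data; "stuff" is the field name from the answer above.
form_data = {"stuff": "\u00a3"}  # a single pound sign

# What the request body looks like when the browser posts in each encoding.
body_utf8 = urlencode(form_data, encoding="utf-8")
body_latin1 = urlencode(form_data, encoding="iso-8859-1")

# A server that assumes UTF-8 recovers the pound sign only from the UTF-8 body.
ok = parse_qs(body_utf8, encoding="utf-8")
bad = parse_qs(body_latin1, encoding="utf-8", errors="replace")

print(body_utf8, body_latin1, ok, bad)
```

The same character travels as %C2%A3 in one case and %A3 in the other, so the server has to know which of the two the client used.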

Some reasons for the inconsistencies are:

  1. It is possible for the server to encode the characters incorrectly in the db but then, when sending those same characters to a browser, reverse the corruption, so things look fine again in the browser.
  2. ISO-8859-1 means different things in different places. IE6 is somewhat loose with that character set and will actually treat it as Windows-1252. Other applications place a stricter interpretation on ISO-8859-1.
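Both points can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of the actual system: the "server" here is hypothetical and simply applies the wrong codec consistently in both directions.

```python
pound = "\u00a3"  # the pound sign

# Point 1: a hypothetical server that wrongly treats incoming UTF-8 as ISO-8859-1.
received = pound.encode("utf-8")         # bytes sent by the client
stored = received.decode("iso-8859-1")   # corrupted: now two characters in the db
sent = stored.encode("iso-8859-1")       # same wrong codec on the way out
displayed = sent.decode("utf-8")         # a browser reading UTF-8 sees the pound sign again

# Point 2: ISO-8859-1 and Windows-1252 agree on the pound sign (0xA3)
# but disagree on bytes in the 0x80-0x9F range, such as 0x80.
same_pound = bytes([0xA3]).decode("iso-8859-1") == bytes([0xA3]).decode("cp1252")
strict = bytes([0x80]).decode("iso-8859-1")  # a C1 control character, U+0080
loose = bytes([0x80]).decode("cp1252")       # the euro sign, U+20AC

print(repr(stored), repr(displayed), same_pound, repr(strict), repr(loose))
```

In the first case the data in the db is wrong even though the page looks right, so any other consumer of that data (a report, an export) would see the corrupted form.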
AnthonyWJones
Thanks Anthony, I keep getting pulled onto different issues unrelated to this. In fact, I might have to stop looking into this and just do a search and replace on the pound signs for HTML entities (&pound;), even though I really don't want to... Will update the post when I get further. Thanks for the detailed reply (I can't "Like" your response until I have 15 rep, so as soon as I get that...).
FrostbiteXIII
@FrostbiteXIII, well there's 10 for you anyway ;).
AnthonyWJones
Thanks (can't believe you can't just say thanks without padding it out with this pointless line to make it over 15 characters!)! :)
FrostbiteXIII
Thanks again for the answer - I think a lot of things were going wrong, so I set everything I could to UTF-8 (including what you suggested). Whilst some things are still going wrong, some are fixed, and I think this is the best I'm going to get it.
FrostbiteXIII