Some non-ascii characters get escaped in most html documents. Like this:

text: "därför"
html: "d&auml;rf&ouml;r"

If I view source, I can see that I have the HTML version on my page. I'd like to retrieve that with jQuery, but both text() and html() unescape it and give me back the text version.

I'm buried in umlauts and thirsty for ampersands. Can anybody tell me how to get them?

+2  A: 

The ampersands are how you encode the umlauts for XML and HTML (and SGML) when you are rendering the XML/HTML document in a stupid encoding.

When you render the XML/HTML document in an encoding that understands umlauts, such as one of the Unicode encodings like UTF-8, or one of the nice charsets that has the characters you need at that moment, you should actually use the umlauts, not the ampersands.

So the version with the ampersands is XML-encoded text, while the version with the umlauts is the real text that the DOM, JavaScript, and jQuery see. The ampersands are decoded into the umlauts when the XML/HTML is parsed, before the text ever reaches the DOM, JavaScript, or jQuery.
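
If you really do need an escaped form back (say, to emit into an ASCII-only document), you have to re-encode the decoded text yourself, since the original markup is not kept anywhere. A minimal sketch follows, assuming numeric character references are acceptable in place of named ones like `&auml;`. The helper name `escapeNonAscii` is made up for illustration and is not part of jQuery.

```javascript
// Hypothetical helper (not a jQuery API): re-escape every non-ASCII
// character in already-decoded text as a numeric character reference.
function escapeNonAscii(text) {
  let out = "";
  // for...of walks the string by code point, so characters outside
  // the Basic Multilingual Plane are not split into surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

// With jQuery on the page this could be applied to what text() returns,
// e.g. escapeNonAscii($("#word").text()), where "#word" is an assumed selector.
console.log(escapeNonAscii("därför")); // d&#228;rf&#246;r
```

A browser parses `&#228;` and `&auml;` to the same character, so the numeric form is equivalent. Getting the named form back would need your own lookup table from characters to entity names. Note that this sketch leaves `&`, `<` and `>` alone, so it is not a general HTML escaper.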

Justice
+1. The entity reference markup is long gone by the time jQuery gets a look in. If your code can't handle umlauts, your code needs fixing, not the markup.
bobince