
In answering another question, I became aware that my JavaScript/DOM knowledge had become a bit out of date: I am still using escape/unescape to encode the contents of URL components, whereas it appears I should now be using encodeURIComponent/decodeURIComponent instead.

What I want to know is what is wrong with escape/unescape? There are some vague suggestions that there is some sort of problem with Unicode characters, but I can't find any definitive explanation.

My web experience is fairly biased: almost all of it has been writing big intranet apps tied to Internet Explorer. That has involved a lot of use of escape/unescape, and the apps involved have fully supported Unicode for many years now.

So what are the Unicode problems that escape/unescape are supposed to have? Does anyone have any test cases that demonstrate them?

+10  A: 

I think this article covers it pretty well

Peter Bailey
Excellent, just what I wanted. I see the issue is that Mozilla doesn't cope with Unicode in escape, which explains why I haven't run into any problems with it in an IE-only app.
andynormancx
I am both blessed and cursed by my history of working on IE-only intranet apps: blessed because I never have to cope with IE/FF differences, and cursed for much the same reason.
andynormancx
Mozilla and IE both do the same (curious) thing with Unicode, even if the docs don't mention it.
bobince
+8  A: 

What I want to know is what is wrong with escape/unescape?

They're not “wrong” as such; they're just their own special string format, which looks a bit like URI-parameter encoding but actually isn't. In particular:

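  • ‘+’ means plus, not space: escape() leaves it unencoded, so anything that decodes the value as a form parameter will read it as a space
  • there is a special “%uNNNN” format for encoding UTF-16 code units (characters above U+00FF), instead of encoding UTF-8 bytes; characters from U+0080 to U+00FF become single “%XX” escapes of their ISO-8859-1 value, which are not valid UTF-8 either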

So if you use escape() to create URI parameter values, you will get the wrong results for strings containing a plus or any non-ASCII characters.
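As a test case, here is a minimal sketch that runs both functions on a few sample strings. It assumes an engine that still provides the legacy global escape() (browsers and Node.js do) plus the standard URLSearchParams class; compareEncodings is a hypothetical helper name used only for this illustration.

```javascript
// Compare the legacy escape() format with standard URI-component encoding.
// Assumes the legacy global escape() is available (browsers and Node.js keep it).
// compareEncodings is a hypothetical helper used only for this illustration.
function compareEncodings(s) {
  return { escaped: escape(s), encoded: encodeURIComponent(s) };
}

// 1. Plus sign: escape() leaves "+" as-is, encodeURIComponent() emits %2B.
const plus = compareEncodings("a+b");
console.log(plus.escaped, plus.encoded); // a+b a%2Bb

// Form-style query-string parsing (here URLSearchParams) reads "+" as a space.
console.log(new URLSearchParams("q=" + plus.escaped).get("q")); // a b
console.log(new URLSearchParams("q=" + plus.encoded).get("q")); // a+b

// 2. Non-ASCII: escape() emits %XX for U+0080..U+00FF and %uNNNN above that;
//    encodeURIComponent() emits the percent-encoded UTF-8 bytes.
const latin = compareEncodings("caf\u00E9"); // "café"
console.log(latin.escaped, latin.encoded); // caf%E9 caf%C3%A9
const euro = compareEncodings("\u20AC5"); // "€5"
console.log(euro.escaped, euro.encoded); // %u20AC5 %E2%82%AC5

// 3. A UTF-8 decoder rejects escape() output: %E9 on its own is not valid UTF-8.
try {
  decodeURIComponent(latin.escaped);
} catch (e) {
  console.log(e instanceof URIError); // true
}
```

In every case the encodeURIComponent output is what a standard UTF-8 URI or form decoder expects, while the escape output is only reliably readable by unescape.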

escape() could be used as an internal, JavaScript-only encoding scheme, for example to escape cookie values. However, now that all browsers support encodeURIComponent (which wasn't originally the case), there's no reason to prefer escape over it.
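For the cookie case, a minimal sketch of the same job done with encodeURIComponent/decodeURIComponent might look like this. serializeCookie and parseCookie are hypothetical helpers, and the input string only mimics the "name1=value1; name2=value2" shape that document.cookie returns; it is not a full cookie parser (no attributes, no quoted values).

```javascript
// Sketch: encode/decode a cookie value purely on the client side.
// serializeCookie/parseCookie are hypothetical names for this illustration;
// the cookie string mimics the "name=value; name=value" shape of document.cookie.
function serializeCookie(name, value) {
  // encodeURIComponent escapes ";", ",", "=", spaces and non-ASCII characters,
  // so the value cannot break the "name=value; ..." structure.
  return encodeURIComponent(name) + "=" + encodeURIComponent(value);
}

function parseCookie(cookieString, name) {
  for (const part of cookieString.split("; ")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed pairs
    if (decodeURIComponent(part.slice(0, eq)) === name) {
      return decodeURIComponent(part.slice(eq + 1));
    }
  }
  return null; // cookie not present
}

const value = "a; b=c+\u20AC"; // "a; b=c+€"
const cookie = serializeCookie("prefs", value) + "; theme=dark";
console.log(cookie); // prefs=a%3B%20b%3Dc%2B%E2%82%AC; theme=dark
console.log(parseCookie(cookie, "prefs") === value); // true

// escape/unescape also round-trip here, because the same code writes and reads the value.
console.log(unescape(escape(value)) === value); // true
```

Both pairs round-trip when only your own script reads the value; encodeURIComponent is simply the one that also produces standard UTF-8 percent-encoding if anything else ever has to read it.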

There is only one modern use for escape/unescape that I know of, and that's as a quick way to implement a UTF-8 encoder/decoder by leveraging the UTF-8 processing in encodeURIComponent/decodeURIComponent:

utf8bytes = unescape(encodeURIComponent(unicodecharacters));  // string -> UTF-8 bytes, one char per byte
unicodecharacters = decodeURIComponent(escape(utf8bytes));    // UTF-8 bytes -> string
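To make the trick concrete, here is a sketch that wraps the two lines above in functions. toUtf8Bytes and fromUtf8Bytes are hypothetical names; the "bytes" are an ordinary JavaScript string in which every character code is between 0 and 255, and the code relies on the same legacy escape()/unescape() globals.

```javascript
// Sketch of the escape/encodeURIComponent UTF-8 trick.
// toUtf8Bytes/fromUtf8Bytes are hypothetical names; a "byte string" is a JS
// string whose characters all have codes 0..255, one character per byte.
function toUtf8Bytes(str) {
  // encodeURIComponent emits %XX for each UTF-8 byte of a non-unreserved char;
  // unescape maps each %XX back to one character with that code (0..255).
  return unescape(encodeURIComponent(str));
}

function fromUtf8Bytes(bytes) {
  // escape turns characters 0x80..0xFF into %XX (never %uNNNN, which it only
  // uses above 0xFF); decodeURIComponent then decodes the sequence as UTF-8.
  return decodeURIComponent(escape(bytes));
}

const text = "\u20AC5"; // "€5"
const bytes = toUtf8Bytes(text);
console.log(Array.from(bytes, (c) => c.charCodeAt(0).toString(16))); // [ 'e2', '82', 'ac', '35' ]
console.log(fromUtf8Bytes(bytes) === text); // true

// Invalid UTF-8 input makes decodeURIComponent throw a URIError.
try {
  fromUtf8Bytes("\xE9");
} catch (e) {
  console.log(e instanceof URIError); // true
}
```

Because a lone 0xE9 byte is not valid UTF-8, the decoding direction doubles as a cheap validity check: callers that may receive arbitrary binary strings should be prepared to catch the URIError.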
bobince