I have gotten a value, encoded like so:

%3Cp%3E%0AGlobal%20Business%20Intensive%20Course%20%u2013%

I noticed that one of the characters near the end, %u2013, seems to be encoded differently from the rest. It appears to be some form of Unicode escape, but it is causing me to get "URI malformed" errors. Is there a way to replace sequences like this with standard percent-encoding? In this example, %u2013 seems to be meant as a dash (U+2013 is the en dash).

+2  A: 

That is malformed for sure. Where are you getting it from?

Here's a way to fix all occurrences of that type of malformation.

var str = '%3Cp%3E%0AGlobal%20Business%20Intensive%20Course%20%u2013%';

str = str.replace( /u\d{4}/g, function( sequence )
{
  return encodeURIComponent( eval( '"\\' + sequence + '"' ) );
} );
Peter Bailey
A: 

To be complete and more correct, the regular expression should also accept letters from A to F, since the %u2013 refers to a four-digit hexadecimal number. And you should definitely include the percent sign in the regular expression, otherwise you end up interpreting Blu2000 as a Unicode escape sequence, which it isn't.

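function fixUnicodeUrl(url) {
    // Replace each non-standard %uXXXX escape (four hex digits, any case)
    // with the percent-encoded UTF-8 form of the same character.
    var result = url.replace(/%u[0-9a-f]{4}/gi, function (match) {
        // Skip the leading "%u" and parse the remaining hex digits.
        var codepoint = parseInt(match.substring(2), 16);
        var ch = String.fromCharCode(codepoint);
        return encodeURIComponent(ch);
    });
    return result;
}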

var str = '%3Cp%3E%0AGlobal%20Business%20Intensive%20Course%20%u2013%';
alert(fixUnicodeUrl(str));
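
As a quick check that the rewritten value really decodes, here is a minimal sketch. The helper name normalizeUnicodeEscapes is hypothetical, and the sample is the string from the question with the dangling "%" at the end removed (an assumption: that trailing "%" is not a complete escape, so decodeURIComponent would still throw on it). The sketch also converts a whole run of %uXXXX escapes at once, so that a surrogate pair split across two escapes is encoded as a single character.

```javascript
// Hypothetical helper: rewrite non-standard %uXXXX escapes as UTF-8 percent escapes.
function normalizeUnicodeEscapes(input) {
    // Match a whole run of consecutive %uXXXX escapes so that surrogate
    // pairs (two escapes forming one character) are encoded together.
    return input.replace(/(?:%u[0-9a-f]{4})+/gi, function (run) {
        // Turn each escape in the run into its UTF-16 code unit.
        var chars = run.replace(/%u([0-9a-f]{4})/gi, function (match, hex) {
            return String.fromCharCode(parseInt(hex, 16));
        });
        return encodeURIComponent(chars);
    });
}

// Sample from the question, without the trailing "%" (assumed to be truncation).
var sample = '%3Cp%3E%0AGlobal%20Business%20Intensive%20Course%20%u2013';

var normalized = normalizeUnicodeEscapes(sample);
var decoded = decodeURIComponent(normalized);

console.log(normalized); // %u2013 has become %E2%80%93
console.log(decoded);    // the markup and text, ending in an en dash
```

Text such as Blu2000 is left alone because the pattern requires the leading percent sign. An unpaired surrogate escape would still make encodeURIComponent throw, so input from untrusted sources may need a try/catch around the call.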
Roland Illig