According to JavaScript: The Good Parts:

JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide.

This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF.

Further investigation confirms this:

> String.fromCharCode(0x20001);

The fromCharCode method seems to only use the lowest 16 bits when returning the Unicode character. Trying to get U+20001 (CJK unified ideograph 20001) instead returns U+0001.
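For instance, inspecting the result in a console (output as a typical engine reports it) shows that only the low 16 bits of 0x20001 survive:

> String.fromCharCode(0x20001).charCodeAt(0).toString(16);
"1"
> String.fromCharCode(0x20001).length;
1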

Question: is it at all possible to handle post-BMP characters in JavaScript?

A:

Depends what you mean by ‘support’. You can certainly put non-BMP characters in a JS string by writing them as surrogate pairs, and browsers will display them if they can.
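For example, U+20001 encodes in UTF-16 as the pair D840 DC01, so a literal like this is a perfectly legal JS string (whether the glyph actually renders depends on the browser and its fonts):

var s = '\uD840\uDC01';   // the surrogate pair encoding U+20001
// s is a valid string value; a browser will render the ideograph wherever
// it is written into the page, if a suitable font is available.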

But each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: all the standard String members (length, split, slice, etc.) deal with code units rather than characters, so they will quite happily split surrogate pairs or hold invalid surrogate sequences.
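For example, taking that same U+20001 pair, every standard member sees two separate items:

> '\uD840\uDC01'.length;
2
> '\uD840\uDC01'.charCodeAt(0).toString(16);
"d840"
> '\uD840\uDC01'.slice(0, 1).length;    // a lone high surrogate, not a character
1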

If you want surrogate-aware methods, I'm afraid you're going to have to start writing them yourself! For example:

String.prototype.getFullCharLength = function () {
    // Number of full characters: subtract one for each surrogate pair
    // (splitting on high+low surrogate pairs yields pairCount+1 pieces).
    return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
};

String.fromFullCharCode = function () {
    var chars = [];
    for (var i = 0; i < arguments.length; i++) {
        var n = arguments[i];
        if (n < 0x10000) {
            // BMP code point: a single code unit suffices.
            chars.push(String.fromCharCode(n));
        } else {
            // Astral code point: encode as a surrogate pair. The high
            // surrogate takes the top ten bits of (n-0x10000), the low
            // surrogate the bottom ten.
            chars.push(String.fromCharCode(
                0xD800 + ((n - 0x10000) >> 10),
                0xDC00 + ((n - 0x10000) & 0x3FF)
            ));
        }
    }
    return chars.join('');
};
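With those defined, something like the following should hold (a quick sketch rather than a tested session, but it follows from the pair arithmetic above):

> String.fromFullCharCode(0x20001).length;              // still two code units...
2
> String.fromFullCharCode(0x20001).getFullCharLength(); // ...but one full character
1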
bobince
Thank you very much. That's a great, detailed answer.
Delan Azabani