To be more precise, I need to know whether (and, if possible, how) I can detect whether a given string contains double-byte characters. Basically, I need to open a pop-up to display text that can contain double-byte characters, such as Chinese or Japanese. In that case, the window needs to be sized differently than it would be for English or ASCII text. Does anyone have a clue?
Why not let the window resize itself based on the runtime height/width?
Run something like this in your pop-up:
window.resizeTo(document.body.clientWidth, document.body.clientHeight);
Actually, all of the characters are Unicode, at least from the JavaScript engine's perspective.
Unfortunately, the mere presence of characters in a particular Unicode range won't be enough to determine that you need more space. Many characters with code points well above the ASCII range take up roughly the same amount of space as ASCII characters. Typographic quotes, characters with diacritics, certain punctuation symbols, and various currency symbols all lie outside the low ASCII range and are allocated in quite disparate places on the Unicode Basic Multilingual Plane.
Generally, projects that I've worked on either provide extra space for all languages or use JavaScript to check whether a window with auto-scrollbar CSS attributes actually has content tall enough to trigger a scrollbar.
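The scrollbar check mentioned above can be sketched roughly like this. It assumes the text sits in a container styled with overflow: auto; the function name contentOverflows is made up for illustration, while scrollHeight, clientHeight, scrollWidth and clientWidth are standard DOM element properties.

```javascript
// Hypothetical helper: returns true when the element's content is taller or
// wider than its visible box, i.e. when overflow: auto would show a scrollbar.
function contentOverflows(el) {
  return el.scrollHeight > el.clientHeight || el.scrollWidth > el.clientWidth;
}
```

In the pop-up you might call contentOverflows(document.getElementById("content")) once the page has loaded and enlarge the window if it returns true; the id "content" is an assumption.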
If detecting the presence or count of CJK characters is adequate to determine that you need a bit of extra space, you could construct a regex using the following ranges: [\u3300-\u9fff\uf900-\ufaff], and use it to count the characters that match. (This is rather coarse: it misses all the non-BMP cases, probably excludes some other relevant ranges, and most likely includes some irrelevant characters, but it's a starting point.)
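A minimal sketch of that count, using exactly the ranges quoted above. The function name countCjkCharacters is an assumption for illustration. Note that these ranges skip kana and Hangul as well as non-BMP ideographs, so Japanese text written mostly in hiragana would be undercounted.

```javascript
// Counts characters that fall in the coarse CJK ranges from the answer above.
// Kana, Hangul and non-BMP ideographs are NOT covered by these ranges.
function countCjkCharacters(str) {
  var matches = str.match(/[\u3300-\u9fff\uf900-\ufaff]/g);
  return matches ? matches.length : 0; // match() returns null when nothing matches
}
```

You could then widen the pop-up in proportion to the count, though the right multiplier depends on the font and is something you would have to tune.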
Again, you're only going to be able to manage a rough heuristic without something along the lines of a full text rendering engine, because what you really want is something like GDI's MeasureString (or any other text rendering engine's equivalent). It's been a while since I've done so, but I think the closest HTML/DOM equivalent is setting a width on a div and requesting the height (cut and paste reuse, so apologies if this contains errors):
var o = document.getElementById("test");
var height = document.defaultView.getComputedStyle(o, "").getPropertyValue("height");
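For what it's worth, the width-then-height idea above could be wrapped up along these lines. This is only a sketch: measureTextHeight is a hypothetical helper, the document is passed in as a parameter, and it reads offsetHeight from a hidden probe div instead of the computed style. It also assumes the probe inherits the same font as the pop-up content, which may not hold on your page.

```javascript
// Hypothetical helper: renders the text in a hidden, fixed-width div and
// returns the height in pixels that the browser lays it out at.
function measureTextHeight(doc, text, widthPx) {
  var probe = doc.createElement("div");
  probe.style.position = "absolute";   // keep it out of the normal flow
  probe.style.visibility = "hidden";   // laid out, but not shown
  probe.style.width = widthPx + "px";
  probe.appendChild(doc.createTextNode(text));
  doc.body.appendChild(probe);
  var height = probe.offsetHeight;     // measured after layout
  doc.body.removeChild(probe);         // clean up the probe
  return height;
}
```

Calling measureTextHeight(document, message, 300) before opening the pop-up would give a height you can pass on to window.open or resizeTo, plus whatever padding the window chrome needs.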
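JavaScript holds text internally as UTF-16 code units (often loosely described as UCS-2). Characters in the Basic Multilingual Plane take one code unit each; everything else is stored as a surrogate pair.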
But that's not really germane to your question. One solution might be to loop through the string and examine the character codes at each position:
function isDoubleByte(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        // charCodeAt(i) also works in older browsers that lack str[i]
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}
This might not be as fast as you would like.
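Here is a quick look at how that kind of check behaves on a few inputs. The function is restated under the made-up name hasCharAbove255 so the snippet stands alone. It shows that anything above code 255 trips the test, including symbols that are no wider than ASCII.

```javascript
// Same "any code unit above 255" test as the loop above; the name is illustrative.
function hasCharAbove255(str) {
  for (var i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 255) { return true; }
  }
  return false;
}

console.log(hasCharAbove255("hello"));         // false
console.log(hasCharAbove255("caf\u00e9"));     // false: e-acute is U+00E9
console.log(hasCharAbove255("\u20ac5"));       // true: euro sign U+20AC, not a wide character
console.log(hasCharAbove255("\u65e5\u672c"));  // true: two CJK ideographs
```

So a true result means "not plain Latin-1" and does not by itself mean the text needs a bigger window.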
You can use a regular expression to figure out whether a string contains non-Latin code points (anything above U+00FF):
function containsNonLatinCodepoints(s) {
    return /[^\\u0000-\\u00ff]/.test(s);
}
I used mikesamuel's answer for this. However, I noticed (perhaps because of how it was posted in this form) that there should be only one backslash before the u, e.g. \u and not \\u, for this to work correctly.
function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}
Works for me :)
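To see why the number of backslashes matters, here is a small comparison of the two patterns; the variable names are just for illustration. With a doubled backslash inside a regex literal, the backslash itself is escaped, so the class ends up listing literal characters (a backslash, "u", "f", and the range from "0" to the backslash) instead of the range U+0000 to U+00FF.

```javascript
var doubled = /[^\\u0000-\\u00ff]/; // backslash is escaped, so \u is not a Unicode escape here
var single = /[^\u0000-\u00ff]/;    // matches any code unit above U+00FF

console.log(doubled.test("hello"));       // true: a false positive on plain ASCII
console.log(single.test("hello"));        // false
console.log(single.test("\u65e5\u672c")); // true
```

If you build the pattern from a string with new RegExp(...), the doubled form is the one you want, since the string literal consumes one level of escaping first.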