
I need to look up the name of a Unicode character when the user enters its code point. For example, entering 0041 should return "Latin Capital Letter A".

Thanks

A: 

As far as I know, there isn't a standard way to do this. You could probably parse the UnicodeData.txt file to get this information.
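A minimal sketch of that approach in JavaScript (the language used elsewhere in this thread), assuming the file's contents are already loaded into a string. The function names `parseUnicodeData` and `lookupName` are hypothetical. It relies on UnicodeData.txt being semicolon-separated, with the code point in hex in field 0 and the name in field 1, as documented in UAX #44. Range entries such as `<CJK Ideograph, First>` / `<..., Last>` are not expanded, so characters inside those ranges won't be found, and names come back in the file's uppercase form ("LATIN CAPITAL LETTER A").

```javascript
// Hypothetical helper: build a Map from code point (number) to name from
// text in the UnicodeData.txt format (fields separated by ";").
function parseUnicodeData(text) {
  const names = new Map();
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    const fields = line.split(";");
    const codePoint = parseInt(fields[0], 16);
    let name = fields[1];
    // Control characters are named "<control>"; field 10 holds the older
    // Unicode 1.0 name, which is more useful to show to a user.
    if (name === "<control>" && fields[10]) name = fields[10];
    names.set(codePoint, name);
  }
  return names;
}

// Hypothetical helper: look up a user-entered hex string such as "0041".
function lookupName(names, hex) {
  return names.get(parseInt(hex, 16));
}

// Two sample lines in the UnicodeData.txt format.
const sample = [
  "000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
].join("\n");
const names = parseUnicodeData(sample);
console.log(lookupName(names, "0041")); // prints "LATIN CAPITAL LETTER A"
```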

McDowell
http://unicode.org/Public/UNIDATA/Index.txt might be easier to parse
David Titarenco
@David Titarenco - the purpose of Index.txt is to generate name-to-codepoint mappings (like this one: http://www.unicode.org/charts/charindex.html ). You'll notice that there are three entries for U+0041 and none for U+0042. The various files and their purposes are documented here: http://unicode.org/reports/tr44/ Depending on what environment the code runs in, you might use the XML format, but the OP doesn't say much about where this code is going to run.
McDowell
A: 

This should be what you're looking for. The first string is simply the contents of http://unicode.org/Public/UNIDATA/Index.txt with the newlines replaced by |:

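// this mess.. (each Index.txt line is "name<TAB>code point")
var unc = "A WITH ACUTE, LATIN CAPITAL LETTER\t00C1| /*... really big array ...*/ |zwsp\t200B";
var uncs = unc.split("|");
var final_a = {}; // lookup object built directly: code point -> name
var final_s = "";
uncs.forEach(function (item) {
    var _T = item.trim().split("\t"); // [name, code point]
    if (_T.length < 2) return;        // skip entries that have no tab
    final_a[_T[1]] = _T[0];
    final_s += '"' + _T[1] + '" : "' + _T[0] + '",';
});

console.log(final_s);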

// yields the body of an object literal, which goes here:

var unicode_lookup = { /*really big array*/ };

// which we can use like so ...

alert(unicode_lookup["1D01"]);
// AE, LATIN LETTER SMALL CAPITAL
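The keys in `unicode_lookup` are the code points exactly as written in Index.txt, which I'm assuming is uppercase hex padded to at least four digits (e.g. "00C1", "1D01"). User input such as "41" or "u+0041" would therefore miss. A minimal sketch of a hypothetical `normalizeCodePoint` helper that converts input to that assumed key format before the lookup:

```javascript
// Hypothetical helper: turn input like "41", "0041", "u+0041" or "1f600"
// into the key format assumed for unicode_lookup (uppercase hex, padded to
// at least four digits). Returns null if the input is not a valid code point.
function normalizeCodePoint(input) {
  const hex = String(input).trim().replace(/^u\+/i, "");
  if (!/^[0-9a-f]{1,6}$/i.test(hex)) return null; // not hex, or too long
  const value = parseInt(hex, 16);
  if (value > 0x10ffff) return null; // beyond the Unicode code space
  return value.toString(16).toUpperCase().padStart(4, "0");
}

console.log(normalizeCodePoint("u+1d01")); // prints "1D01"
```

With that in place, a lookup becomes `unicode_lookup[normalizeCodePoint(userInput)]`.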

Stack Overflow doesn't preserve tabs, so the first part may not work if you simply copy and paste it. Note that some code points appear more than once (in an object literal, a later duplicate key overwrites the earlier one), so you may want to do some cleanup.
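One way to do that cleanup is to keep every name for a code point instead of letting later entries overwrite earlier ones. A minimal sketch, assuming the same tab-separated "name, code point" lines as Index.txt; the function name is hypothetical and the sample lines are illustrative rather than quoted from the file:

```javascript
// Hypothetical helper: group Index.txt-style lines ("name<TAB>code point")
// so each code point maps to an array of all its distinct names.
function groupNamesByCodePoint(lines) {
  const lookup = {};
  for (const line of lines) {
    const [name, code] = line.trim().split("\t");
    if (!name || !code) continue; // skip blank or malformed lines
    if (!lookup[code]) lookup[code] = [];
    // Drop exact duplicates but keep alternative names.
    if (!lookup[code].includes(name)) lookup[code].push(name);
  }
  return lookup;
}

const indexLines = [
  "A WITH ACUTE, LATIN CAPITAL LETTER\t00C1",
  "zwsp\t200B",
  "ZERO WIDTH SPACE\t200B",
  "zwsp\t200B",
];
const grouped = groupNamesByCodePoint(indexLines);
console.log(grouped["200B"].join(" / ")); // prints "zwsp / ZERO WIDTH SPACE"
```

This doesn't help with code points that have no entry at all; for those, UnicodeData.txt is the more complete source.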

David Titarenco