var bytes:ByteArray = new ByteArray;
bytes.writeInt(0);
trace(bytes.length); // prints 4
trace(bytes.toString().length); // prints 4
When I run the above code the output suggests that every character in the string returned by toString contains one byte from the ByteArray. This is of course great if you want to display the content of t...
I found this in the wikipedia article on utf-8:
Sorting of UTF-8 strings as arrays of unsigned bytes will produce the same results as sorting them based on Unicode code points.
That would lead me to believe that, for comparison purposes (sorting, binary search, etc.), comparing two byte arrays (i.e. byte-by-byte like memcmp) of u...
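The quoted property can be checked empirically. Below is a minimal Python 3 sketch, assuming a small hand-picked sample list (not taken from the question), that sorts the same strings by code point and by their UTF-8 bytes. UTF-16 is included only as a contrast, since surrogate pairs break the ordering there.

```python
# Sample strings chosen for illustration; they include a character outside
# the Basic Multilingual Plane and U+FFFF, which sits just below it.
samples = ["z", "é", "a", "€", "😀", "\uffff", "Z"]

# Python 3 compares str values code point by code point.
by_code_point = sorted(samples)

# bytes objects compare as sequences of unsigned bytes, much like memcmp.
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))

same_order = by_code_point == by_utf8_bytes
print("UTF-8 order matches code point order:", same_order)
print("UTF-16 order matches code point order:", by_utf16 == by_code_point)
```

A handful of samples is not a proof, but it illustrates the claim and shows why it does not automatically carry over to other encodings.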
I have some Chinese characters that I'm trying to display on a Kentico-powered website. This text is copy/pasted into Kentico's FCK editor, and is then saved and appears on the site. In Firefox, Chrome, and Safari, the characters appear exactly as expected. In IE 8 Standards mode, I see only boxes.
The text is UTF-8 encoded, and as ...
I have a data file (an Apple plist, to be exact), that has Unicode codepoints like \U00e8 and \U2019. I need to turn these into valid hexadecimal HTML entities using PHP.
What I'm doing right now is a long string of:
$fileContents = str_replace("\U00e8", "&#x00e8;", $fileContents);
$fileContents = str_replace("\U2019", "&#x2019;", $fi...
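One way to avoid a long chain of replacements is a single regular-expression substitution. The sketch below is in Python 3 rather than PHP, and it assumes every escape is a literal backslash, an uppercase U, and exactly four hex digits, as in the two examples above. The function name escapes_to_entities is made up for illustration. The same pattern should carry over to PHP's preg_replace, though that is untested here.

```python
import re

# Assumed escape format: backslash, 'U', four hex digits (e.g. \U00e8).
ESCAPE = re.compile(r"\\U([0-9a-fA-F]{4})")

def escapes_to_entities(text):
    """Rewrite each \\Uxxxx escape as a hexadecimal HTML entity &#xxxxx;."""
    return ESCAPE.sub(lambda m: "&#x" + m.group(1).lower() + ";", text)

# Raw string so that Python itself does not interpret the backslashes.
print(escapes_to_entities(r"caf\U00e8 \U2019quoted\U2019"))
```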
In the game I'm making, I'd like to be able to display and have the user input Unicode characters. However, I'm having problems with using SpriteFonts to handle this task. Including all of the Unicode characters uses up WAY too many resources (it even causes VS2010 to crash!), so that's out of the question. But I'm not sure what other op...
Here's an excerpt from java.text.CharacterIterator documentation:
This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. [...] The methods previous() and next() are used for iteration. They return DONE if [...], signaling that the iterator has reached t...
I'm trying to create hebrew strings but get syntax errors. It works in the IDLE shell but not in Pydev.
Here's what I've tried so far:
s = 'מחרוזת בעברית' #works in the shell only
s = u'מחרוזת בעברית' #doesn't work at all
s = unicode("מחרוזת בעברית", "UTF-8") #also doesn't work at all
I get a syntax error: Non-UTF-8 code starting with...
Hi,
I would like to write a Ruby script which writes Japanese characters to the console. For example:
puts "こんにちは・今日は"
However, I get an exception when running it:
jap.rb:1: Invalid char `\377' in expression
jap.rb:1: Invalid char `\376' in expression
Is it possible to do? I'm using Ruby 1.8.6.
...
So I have this page:
http://hub.iis.sinica.edu.tw/cytoHubba/
Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in postgres I get:
DatabaseError: invalid byte sequence for encoding "UTF8": 0xedbdbf
The database clams up after that and refuses to do anything without a rollback, which will be...
I have just started with Android development and bought a handset (HTC Hero) for test and usage purposes. The sad part is that it doesn't display one of the scripts (Devanagari, to be precise). Hence, I would like to contribute to the Android project to help render it. However, since I have just started, I have no idea where to look for...
I am working on a simplistic website using PHP, a language I haven't touched in a long time, especially when it's running on a Windows system from WebMatrix. Lately I've been porting a couple of websites from ASP.NET (Umbraco) to PHP to a basic templating class written in PHP, and so far things have been going fairly well, until some uni...
Hey guys,
I have an iPhone app with thousands of users. Stuff they type goes into my database. I noticed an infrequent crash recently, and tracked it down to a piece of code failing when it had to deal with this character "…" (that's one character, not three dots).
Obviously I need to fix my code to deal with it, but in the meantime. D...
Is there a way of printing out every character that satisfies a given regular expression?
For example, can I print all characters that match a regular expression such as this one, in JavaScript:
[A-Za-z_-]|[\u00C0-\u00D6]|[\u00D8-\u00F6]|[\u00F8-\u02FF]|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\...
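A brute-force approach is to test every code point against the expression. The sketch below uses Python 3 rather than JavaScript and a shortened pattern made of the first three alternatives only, since the full expression is truncated above. The function name matching_chars and the limit parameter are assumptions for illustration.

```python
import re
import sys

# Shortened pattern: only the first three alternatives from the question.
PATTERN = re.compile(r"[A-Za-z_-]|[\u00C0-\u00D6]|[\u00D8-\u00F6]")

def matching_chars(pattern, limit=sys.maxunicode):
    """Yield every single character up to `limit` that the pattern fully matches."""
    for code_point in range(limit + 1):
        ch = chr(code_point)
        if pattern.fullmatch(ch):
            yield ch

# Restrict the scan to U+0000..U+00FF to keep the output short.
matches = list(matching_chars(PATTERN, limit=0xFF))
print("".join(matches))
```

Scanning the full range up to sys.maxunicode covers about 1.1 million code points, which is slower but still feasible for a one-off listing.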
I need to add the TM (trademark) superscript symbol next to a title in a C# string. Is there any way to do this?
Thanks!
...
I've written a Windows program in Delphi that places and wraps text very precisely to both the screen and printer using GetCharWidth and Em-Square. This has worked well with ANSI text, where you only need to retrieve and calculate the widths of 255 characters, but when you move to Unicode with 65535 characters it's too slow. The problem is ma...
I'm just wondering how to convert a Unicode string like u'é' to its Unicode character code u'\xe9'. Thank you for your help.
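For reference, here is a small sketch of a few ways to get at the code of a character. It assumes Python 3, where the u'' prefix is optional, whereas the question uses Python 2 syntax. The variable names are illustrative only.

```python
ch = "é"

# ord() returns the code point as an integer.
code_point = ord(ch)

# Conventional U+XXXX notation, built with ordinary string formatting.
hex_form = "U+%04X" % code_point

# The unicode_escape codec yields the backslash escape as ASCII text.
escaped = ch.encode("unicode_escape").decode("ascii")

print(code_point, hex_form, escaped)
```

Note that 'é' and '\xe9' denote the same one-character string. Only the escaped form above is a different, four-character string.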
...
Is there a list of the Unicode encoding hex of every Emoji character on iPhone? Thanks!
...
Hi!
I'm doing a filter wherein I check if a unicode (utf-8 encoding) string contains no uppercase characters (in all languages). It's fine with me if the string doesn't contain any cased character at all.
For example: 'Hello!' will not pass the filter, but "!" should pass the filter, since "!" is not a cased character.
I planned to u...
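A minimal sketch of such a filter, assuming Python 3 and text that has already been decoded from UTF-8 into a str. The function name has_no_uppercase is made up for illustration. It relies on str.isupper, which follows the Unicode character database, so titlecase letters such as U+01C5 are not counted as uppercase here.

```python
def has_no_uppercase(text):
    """Return True when no character in `text` is an uppercase letter.

    Uncased characters (digits, punctuation, CJK, ...) never fail the check.
    """
    return not any(ch.isupper() for ch in text)

for sample in ["Hello!", "!", "hello", "Привет", "привет"]:
    print(repr(sample), has_no_uppercase(sample))
```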
I am pulling data from the Facebook graph which has characters encoded like so: \u2014 and \u2014
Is there a function to convert those characters into HTML? i.e \u2014 -> —
If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I...
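Sequences like \u2014 are JSON string escapes, so a JSON parser already turns them into real characters. Converting those to HTML is then a separate step. Below is a sketch in Python 3, assuming a made-up payload with a "message" field (not the actual Graph API response) and numeric character references as the desired output.

```python
import html
import json

# Hypothetical payload; the real Graph API response will look different.
payload = '{"message": "before \\u2014 after"}'

# json.loads decodes the \u2014 escape into the actual em dash character.
message = json.loads(payload)["message"]

# Escape HTML metacharacters first, then replace every non-ASCII character
# with a numeric character reference such as &#8212;.
markup = html.escape(message).encode("ascii", "xmlcharrefreplace").decode("ascii")
print(markup)
```

If the page is served as UTF-8, the second step is optional, since the decoded character can be sent as-is.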
I'm linkifying @mentions in status messages returned by Twitter's API.
One of the tweets has a unicode character in it. Parsing the JSON (with either the json gem's JSON.parse or ActiveSupport::JSON.decode) returns a string that displays correctly, but the indices for the start and end of the @mention specified by the entity don't ma...