unicode

AS3: Can ByteArray return its content as a string with two bytes per unicode character?

var bytes:ByteArray = new ByteArray; bytes.writeInt(0); trace(bytes.length); // prints 4 trace(bytes.toString().length); // prints 4 When I run the above code the output suggests that every character in the string returned by toString contains one byte from the ByteArray. This is of course great if you want to display the content of t...

Is comparing two byte[] of utf-8 encoded strings the same as comparing two unicode strings?

I found this in the wikipedia article on utf-8: Sorting of UTF-8 strings as arrays of unsigned bytes will produce the same results as sorting them based on Unicode code points. That would lead me to believe that for comparison purposes (sorting, binary search, etc) that comparing two byte arrays (i.e. byte-by-byte like memcmp) of u...
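
One way to check the quoted claim is to sort the same strings both ways and compare the results. The sketch below uses Python 3, and the sample strings are illustrative assumptions, not taken from the question. The claim is about code point order; UTF-16 code unit order can differ for characters outside the BMP.

```python
# Sample strings chosen for illustration (an assumption, not from the question).
samples = ["z", "\u00e9", "a", "\u4e2d", "\U0001F600", "\uffff", "A"]

# Python 3 compares str values code point by code point.
by_code_point = sorted(samples)

# bytes objects compare as sequences of unsigned bytes, like memcmp.
by_utf8_bytes = sorted(samples, key=lambda s: s.encode("utf-8"))

same_order = by_code_point == by_utf8_bytes
print(same_order)
```

This only shows agreement on a handful of samples; it says nothing about locale-aware collation, which is a different ordering altogether.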

Why is IE failing to show UTF-8 encoded text?

I have some Chinese characters that I'm trying to display on a Kentico-powered website. This text is copy/pasted into Kentico's FCK editor, and is then saved and appears on the site. In Firefox, Chrome, and Safari, the characters appear exactly as expected. In IE 8 Standards mode, I see only boxes. The text is UTF-8 encoded, and as ...

How do I convert unicode codepoints to hexadecimal HTML entities?

I have a data file (an Apple plist, to be exact), that has Unicode codepoints like \U00e8 and \U2019. I need to turn these into valid hexadecimal HTML entities using PHP. What I'm doing right now is a long string of: $fileContents = str_replace("\U00e8", "&#xe8;", $fileContents); $fileContents = str_replace("\U2019", "&#x2019;", $fi...
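
The repeated replacements can in principle be collapsed into a single regular-expression substitution. The question asks for PHP; the sketch below shows the idea in Python instead, and it assumes every escape is `\U` followed by exactly four hex digits, as in the two examples given. The function name `escapes_to_entities` is hypothetical.

```python
import re

# Assumption: escapes are always \U plus exactly four hex digits.
ESCAPE = re.compile(r"\\U([0-9a-fA-F]{4})")

def escapes_to_entities(text):
    """Replace each \\Uxxxx escape with a hexadecimal HTML entity."""
    return ESCAPE.sub(lambda m: "&#x{:x};".format(int(m.group(1), 16)), text)

converted = escapes_to_entities(r"caf\U00e8 and it\U2019s")
print(converted)
```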

Unicode string display in XNA

In the game I'm making, I'd like to be able to display and have the user input Unicode characters. However, I'm having problems with using SpriteFonts to handle this task. Including all of the Unicode characters uses up WAY too many resources (it even causes VS2010 to crash!), so that's out of the question. But I'm not sure what other op...

Can a valid Unicode string contain FFFF? Is Java/CharacterIterator broken?

Here's an excerpt from java.text.CharacterIterator documentation: This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. [...] The methods previous() and next() are used for iteration. They return DONE if [...], signaling that the iterator has reached t...

How to generate hebrew strings in python 3?

I'm trying to create hebrew strings but get syntax errors. It works in the IDLE shell but not in Pydev. Here's what I've tried so far: s = 'מחרוזת בעברית' #works in the shell only s = u'מחרוזת בעברית' #doesn't work at all s = unicode("מחרוזת בעברית", "UTF-8") #also doesn't work at all I get a syntax error: Non-UTF-8 code starting with...

Unicode characters in a Ruby script?

Hi, I would like to write a Ruby script which writes Japanese characters to the console. For example: puts "こんにちは・今日は" However, I get an exception when running it: jap.rb:1: Invalid char `\377' in expression jap.rb:1: Invalid char `\376' in expression Is it possible to do? I'm using Ruby 1.8.6. ...

How can I check a Python unicode string to see that it *actually* is proper Unicode?

So I have this page: http://hub.iis.sinica.edu.tw/cytoHubba/ Apparently it's all kinds of messed up, as it gets decoded properly but when I try to save it in postgres I get: DatabaseError: invalid byte sequence for encoding "UTF8": 0xedbdbf The database clams up after that and refuses to do anything without a rollback, which will be...
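
The bytes 0xED 0xBD 0xBF are what a UTF-8 encoder would produce for U+DF7F, a lone low surrogate, which strict UTF-8 forbids. A minimal sketch of a pre-save check, assuming Python 3 (the question may well be using Python 2, where behavior differs); the function name is hypothetical.

```python
def is_strict_utf8_encodable(text):
    """Return True if text encodes as strict UTF-8 (no lone surrogates)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True

bad = "abc\udf7f"    # contains a lone low surrogate
good = "abc\u4e2d"   # ordinary BMP character

# "surrogatepass" reproduces the byte sequence from the error message.
leaked_bytes = "\udf7f".encode("utf-8", "surrogatepass")

print(is_strict_utf8_encodable(bad), is_strict_utf8_encodable(good))
```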

How do I contribute to the Android project on supporting/rendering non-English languages?

I have just started with Android development and bought a handset (HTC Hero) for test and usage purposes. The sad part is that it doesn't display one of the scripts (Devanagari to be precise). Hence, I would like to contribute to the Android project to help render it. However, since I have just started I have no idea where to look for...

Unicode characters at top of page causing gap in WebMatrix

I am working on a simplistic website using PHP, a language I haven't touched in a long time, especially when it's running on a Windows system from WebMatrix. Lately I've been porting a couple of websites from ASP.NET (Umbraco) to PHP to a basic templating class written in PHP, and so far things have been going fairly well, until some uni...

Strange character from iPhone keyboard

Hey guys, I have an iPhone app with thousands of users. Stuff they type goes into my database. I noticed an infrequent crash recently, and tracked it down to a piece of code failing when it had to deal with this character "…" (that's one character, not three dots). Obviously I need to fix my code to deal with it, but in the meantime. D...

How can you print all the characters that satisfy a regular expression?

Is there a way of printing out every character that satisfies a given regular expression? For example, can I print all characters that match a regular expression, let's say, in Javascript: [A-Za-z_-]|[\u00C0-\u00D6]|[\u00D8-\u00F6]|[\u00F8-\u02FF]|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\...
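
A brute-force approach is to test every candidate code point against the pattern. The question asks about Javascript; the sketch below is in Python, covers only the BMP, and uses just the first two alternatives of the pattern because the rest is truncated in the excerpt.

```python
import re

# Only the first two alternatives from the question's pattern.
pattern = re.compile(r"[A-Za-z_-]|[\u00C0-\u00D6]")

# Test every BMP code point, skipping the surrogate range U+D800..U+DFFF.
matches = [
    chr(cp)
    for cp in range(0x10000)
    if not 0xD800 <= cp <= 0xDFFF and pattern.fullmatch(chr(cp))
]

print(len(matches))
```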

Adding a TM superscript to a string.

I need to add the TM (trademark) superscript symbol next to a title in a C# string. Is there any way to do this? Thanks! ...

WYSIWYG with Unicode

I've written a Windows program in Delphi that places and wraps text very precisely to both the screen and printer using GetCharWidth and Em-Square. This has worked well with ANSI text where you only need to retrieve and calculate the widths of 255 characters, but when you go to Unicode with 65535 characters it's too slow. The problem is ma...

How to convert an accented character in a Unicode string to its Unicode character code using Python?

I'm wondering how to convert a Unicode string like u'é' to its Unicode character code u'\xe9'. Thank you for your help. ...
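
For context, u'é' and u'\xe9' denote the same one-character string; what differs is only how it is written or displayed. A minimal sketch, assuming Python 3, of getting the numeric code point and an escaped text form:

```python
ch = "\u00e9"  # the character é

code_point = ord(ch)                      # integer code point
hex_form = "U+{:04X}".format(code_point)  # conventional U+ notation
escaped = ch.encode("unicode_escape").decode("ascii")  # backslash-escaped text

print(code_point, hex_form, escaped)
```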

Is there a list of Unicode encoding range for the Emoji characters?

Is there a list of the Unicode encoding hex of every Emoji character on iPhone? Thanks! ...

Python: How to check if a unicode string contains a cased character?

Hi! I'm doing a filter wherein I check if a unicode (utf-8 encoding) string contains no uppercase characters (in all languages). It's fine with me if the string doesn't contain any cased character at all. For example: 'Hello!' will not pass the filter, but "!" should pass the filter, since "!" is not a cased character. I planned to u...
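
A minimal sketch of such a filter, assuming Python 3 and that the input has already been decoded from UTF-8 into a `str`. The function name `passes_filter` is hypothetical, and the approach the asker planned is truncated above, so this is not necessarily it. Note that `str.isupper()` does not flag titlecase letters such as U+01C5.

```python
def passes_filter(text):
    """Return True if text contains no uppercase character."""
    return not any(ch.isupper() for ch in text)

# "\u00c9t\u00e9" starts with an uppercase E-acute; "\u00e9t\u00e9" is all lowercase.
samples = ["Hello!", "!", "hello", "\u00c9t\u00e9", "\u00e9t\u00e9"]
results = [passes_filter(s) for s in samples]
print(results)
```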

How to parse unicode format (e.g. \u201c, \u2014) using PHP

I am pulling data from the Facebook graph which has characters encoded like so: \u2014 and \u2014 Is there a function to convert those characters into HTML? i.e. \u2014 -> — If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I...
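
Sequences like \u2014 are JSON string escapes, so a JSON parser turns them into real characters without any custom code. The question asks for PHP; the sketch below shows the same idea in Python, with a made-up payload (an assumption, not actual Facebook output). The `xmlcharrefreplace` error handler then yields numeric HTML character references for anything outside ASCII.

```python
import json

# Hypothetical payload; the real Graph API response is not shown in the question.
raw = '{"message": "before \\u2014 after \\u201c"}'

# The JSON parser decodes the \uXXXX escapes into actual characters.
decoded = json.loads(raw)["message"]

# Turn non-ASCII characters into decimal HTML character references.
as_entities = decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entities)
```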

Character indices of a string containing unicode characters

I'm linkifying @mentions in status messages returned by Twitter's API. One of the tweets has a unicode character in it. Parsing the JSON (with either the json gem's JSON.parse or ActiveSupport::JSON.decode) returns a string that displays correctly, but the indices for the start and end of the @mention specified by the entity don't ma...