questions about unicode | ansaurus

unicode

What is the deal with the unicode character 首(U+9996) and how java/mysql deal with it and its friends?

Man, this character encoding hole just keeps on getting deeper. Sigh. Ok. Check this out: I have a java String that contains the unicode character U+9996 (that's what I get if I do codePointAt()). If I look at it in the debugger expressions panel (in eclipse) then all is well and it looks like "首". However if I print it out to the conso...

UTF8 MySQL problems on Rails - encoding issues with utf8_general_ci

I have a staging Rails site up that's running on MySQL 5.0.32-Debian. On this particular site, all of my tables are using utf8 / utf8_general_ci encoding. Inside that database, I have some data that looks like so: mysql> select * from currency_types limit 1,10; +------+-----------------+---------+ | code | name | symbol | ...

json_encode and mysql unicode problem

I have the following JavaScript code: http://www.nomorepasting.com/getpaste.php?pasteid=22561 which works fine (the makewindows function has been changed to show it is a PHP variable). However, the HTML contains Unicode characters, and will only be assigned characters leading up to the first Unicode character. If I make a small test file...

Displaying International Text

I am looking to create an ASP.net page that will have a control like GridView or Repeater, and the data to be displayed in this page can be either Unicode or UTF-8. I am really struggling to display languages like Hebrew and some Asian languages. How do I show any type of language on the ASP.net page? I have tried the meta tag option ...

character-encoding

Multiple Font/Language support in Flash

Greetings, I'm in the middle of writing a Flash application which has multilingual support. My initial choice of font for this was Tahoma, for its Unicode support. The client prefers a non-standard font such as Lucida Handwriting. Lucida Handwriting doesn't have the same, say, Cyrillic support as Tahoma, which poses a problem that th...

Rails + MySQL Unicode

I'm trying to get Unicode working properly in rails using MySQL. Now, Rails displays the text correctly, but it shows up as ??? in MySQL. Additionally, I am not able to filter the text. My MySQL database has been configured with the utf8 character set. My client character is also UTF8. Likewise, rails is set to use UTF8. If I ent...

converting unicode for mysql and JSON

Hello, I have some HTML that was inserted into a MySQL database from a CSV file, which in turn was exported from an Access MDB file. The MDB file was exported as Unicode, and indeed is Unicode. I am however unsure as to what encoding the MySQL database has. When I try to echo out HTML stored in a field however, there is no Unicode. This i...

I18n and Passwords that aren't US-ASCII, Latin1, or Win1252

How do you handle passwords for services when the user enters something that is best represented in Unicode or some other non-Latin character encoding? Specifically, can you use a Cyrillic password as a password to Oracle? What do you do to verify a user's password against a Windows authentication mechanism if the password is provided a...

internationalization

how to debug vb6 richtextbox not showing unicode (chinese) properly.

Hi there, I have a simple VB6 editor-type application which has a RichTextBox as the editor page. It allows users to key in stuff and then store it into a file which will keep all the text in RTF stored as CDATA in XML. When you load back the file, it will read it off the XML and load back the RTF. We allow for Unicode editing, but my pr...

Unicode programming problems, php outputting raw code

I have a MySQL database set as UTF-8, and CSV data set as UTF-8, delimited by semicolons and enclosed by double quotes. The data is seemingly imported fine, when doing a direct dump from the database. However, when attempting to display one of the fields containing HTML by echoing out in PHP, part of the HTML code is displayed instead o...

Can someone explain oracle 10G encoding, nls_lang unicode

I'll try and make it a fair reflection of my actual query. It's more to settle my confusion. Let's start at the beginning. A web front end hosted somewhere and numerous clients inserting data into web forms which is sent to an Oracle 10G database via stored procs. I have no idea of client settings nor the web server settings. So I h...

C# Regular Expressions with \Uxxxxxxxx characters in the pattern.

Regex.IsMatch( "foo", "[\U00010000-\U0010FFFF]" ) Throws: System.ArgumentException: parsing "[-]" - [x-y] range in reverse order. Looking at the hex values for \U00010000 and \U0010FFFF I get: 0xd800 0xdc00 for the first character and 0xdbff 0xdfff for the second. So I guess I really have one problem. Why are the Unicode charact...
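
The hex values quoted in this excerpt are UTF-16 surrogate pairs. The question itself is about C#; the following is only a minimal Python sketch of the standard UTF-16 arithmetic that produces those pairs. The function name `utf16_surrogates` is hypothetical and introduced for illustration.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


first = utf16_surrogates(0x10000)
last = utf16_surrogates(0x10FFFF)
print([hex(unit) for unit in first])   # ['0xd800', '0xdc00']
print([hex(unit) for unit in last])    # ['0xdbff', '0xdfff']
```

Each supplementary character therefore occupies two 16-bit code units, which matches the two values per character reported in the question.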

Python UnicodeDecodeError - Am I misunderstanding encode?

Any thoughts on why this isn't working? I really thought 'ignore' would do the right thing. >>> 'add \x93Monitoring\x93 to list '.encode('latin-1','ignore') Traceback (most recent call last): File "<interactive input>", line 1, in ? UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 4: ordinal not in range(128) ...
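
In Python 2, calling `.encode()` on a byte string first decodes it implicitly as ASCII, which is where the `UnicodeDecodeError` comes from. A minimal Python 3 sketch of the distinction follows. It assumes the `\x93` bytes are Windows-1252 curly quotes, which the excerpt does not confirm.

```python
# Raw bytes as they might arrive from a Windows-1252 source (assumption).
raw = b"add \x93Monitoring\x93 to list "

# Bytes are *decoded* into text; the codec must match how they were written.
text = raw.decode("cp1252")

# Text is *encoded* back into bytes; 'ignore' drops characters Latin-1 lacks.
latin1_bytes = text.encode("latin-1", "ignore")

print(repr(text))
print(latin1_bytes)
```

The `'ignore'` handler only applies to the step it is passed to, so it cannot suppress an error raised by a different, implicit conversion.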

Best way to decode hex sequence of unicode characters to string

Hello, What is the most code free way to decode a string: \xD0\xAD\xD0\xBB\xD0\xB5\xD0\xBA\xD1\x82\xD1\x80\xD0\xBE\xD0\xBD\xD0\xBD\xD0\xB0\xD1\x8F to human string in C#? This hex string contains some unicode symbols. I know about System.Convert.ToByte(string, fromBase); But I was wondering if there are some built-in helpers tha...
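
The question asks about C#; the steps are the same in any language, and this is only a Python sketch of them: collect each `\xHH` escape as one byte, then decode the bytes. It assumes the bytes are UTF-8, which the sample is consistent with. The helper name `decode_hex_escapes` is hypothetical.

```python
import re


def decode_hex_escapes(text: str, encoding: str = "utf-8") -> str:
    """Turn literal \\xHH escapes into bytes, then decode them as text."""
    hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", text)
    raw = bytes(int(pair, 16) for pair in hex_pairs)
    return raw.decode(encoding)


# Raw string literal: the backslashes are kept as characters, as in the question.
escaped = r"\xD0\xAD\xD0\xBB\xD0\xB5\xD0\xBA\xD1\x82\xD1\x80\xD0\xBE\xD0\xBD\xD0\xBD\xD0\xB0\xD1\x8F"
decoded = decode_hex_escapes(escaped)
print(decoded)  # Электронная
```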

Is WideString identical to String in Delphi 2009

I'm getting some weird behaviour recompiling some applications in 2009 that used widestrings at various points. In a Delphi 2009 App is Widestring identical to String? ...

Boost.format and wide characters

Is there a way to get boost.format to use and return wide (Unicode) character strings? I'd like to be able to do things like: wcout << boost::format(L"...") % ... and wstring s = boost::str(boost::format(L"...") % ...) Is this possible? ...

Font choices in International scenarios: multilingual vs unicode

I have a website that will eventually display multiple languages. I notice the common fonts used in web CSS (ex: Arial, Verdana, Times New Roman, Tahoma) and even the newer Vista/Office 2007/VS2008 fonts (Calibri,Cambria, Candara, Corbel, etc) are significantly larger (~350K) than your average (US only?) TTF font (~50k) so these fonts c...

internationalization

Howto identify UTF-8 encoded strings

What's the best way to identify if a string (is or) might be UTF-8 encoded? The Win32 API IsTextUnicode isn't of much help here. Also, the string will not have a UTF-8 BOM, so that cannot be checked for. And, yes, I know that only characters above the ASCII range are encoded with more than 1 byte. ...
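
One common heuristic is to attempt a strict UTF-8 decode and treat failure as "not UTF-8". The Python sketch below shows that idea; the function name `looks_like_utf8` is hypothetical. Success only means the bytes are valid UTF-8, not that UTF-8 was the intended encoding, and pure ASCII input always passes.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("首".encode("utf-8")))      # True
print(looks_like_utf8(b"\x93Monitoring\x93"))     # False
print(looks_like_utf8(b"plain ASCII"))            # True
```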

Why is my WM_UNICHAR handler never called?

I have an ATL control that I want to be Unicode-aware. I added a message handler for WM_UNICHAR: MESSAGE_HANDLER( WM_UNICHAR, OnUniChar ) But, for some reason, the OnUniChar handler is never called. According to the documentation, the handler should first be called with "UNICODE_NOCHAR", on which the handler should return TRUE if you...

Convert GB2312 to UTF-8

I have a text file that contains localized language strings that is currently encoded in GB2312 (simplified Chinese), but all of my other language files are in UTF-8. I am finding it very difficult to work with this file, as none of my text editors will work properly with it and keep corrupting it. Are there any tools to convert this to ...
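
Besides dedicated tools, a few lines of script can do the conversion. The following is a minimal Python sketch, assuming the file really is GB2312 throughout; the function names and any file paths passed to them are hypothetical.

```python
from pathlib import Path


def gb2312_to_utf8(data: bytes) -> bytes:
    """Decode GB2312 bytes to text and re-encode that text as UTF-8."""
    return data.decode("gb2312").encode("utf-8")


def convert_file(source: Path, target: Path) -> None:
    """Write a UTF-8 copy of a GB2312 file, leaving the original untouched."""
    target.write_bytes(gb2312_to_utf8(source.read_bytes()))


sample = "简体中文".encode("gb2312")
converted = gb2312_to_utf8(sample)
print(converted.decode("utf-8"))  # 简体中文
```

Writing to a separate target file keeps the original available if the decode step raises an error partway through.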
