unicode

Fixing older program: database text encoding, and incorrect field types.

I'm once again working on a program from when I was, umm... less capable. It has a number of problems: The database collation is latin1_swedish_ci. I would like to convert it to utf8. How would I do this? The database has some fields that are boolean values stored as 0 or 1. However, the fields are varchars instead of bools. How c...

Can someone help me figure this out? About Unicode.

hibyte: 250, lobyte: 65, makeunicode: 57345. I got this table; hibyte and lobyte are the high byte and low byte of some Chinese character, which may use Big5 or GBK encoding. I think the makeunicode value might be some Unicode encoding that corresponds to the Big5/GBK character with that hibyte and lobyte....

Fixing Unicode Oops

It seems that we have managed to insert into our database two Unicode characters for each of the Unicode characters we want. For example, for the Unicode char 0x3CBC, we've inserted the Unicode equivalents for each of its components (0xC383 and 0xC2BC). Can anyone think of a simple solution for fixing this? I've come up with something li...
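One common repair for this kind of double encoding, sketched here in Python 3 and not taken from the question itself: re-encode the garbled text as Latin-1 to recover the original bytes, then decode those bytes as UTF-8. The sketch assumes the original bytes were UTF-8 that was misread as Latin-1 before being stored, and that the intended character in the example was the byte pair 0xC3 0xBC. The function name `undo_double_utf8` is hypothetical.

```python
def undo_double_utf8(text: str) -> str:
    """Reverse one round of UTF-8 -> (misread as Latin-1) -> UTF-8.

    Assumption: every character in `text` is below U+0100, i.e. it is
    really a byte of the original UTF-8 sequence in disguise.
    """
    original_bytes = text.encode("latin-1")   # recover the raw bytes
    return original_bytes.decode("utf-8")     # decode them correctly


# The stored bytes from the example: 0xC3 0x83 and 0xC2 0xBC.
garbled = b"\xc3\x83\xc2\xbc".decode("utf-8")   # two characters: U+00C3, U+00BC
fixed = undo_double_utf8(garbled)               # one character: U+00FC
print(repr(garbled), "->", repr(fixed))
```

Text that was never double-encoded may raise `UnicodeEncodeError` or `UnicodeDecodeError` here, so a real cleanup would likely need to catch those and leave such rows unchanged.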

Please help me trace how charsets are handled every step of the way

We all know how easy character sets are on the web, yet every time you think you got it right, a foreign charset bites you in the butt. So I'd like to trace the steps of what happens in a fictional scenario I will describe below. I'm going to try and put down my understanding as well as possible but my question is for you folks to correc...

Does Python's print function handle unicode differently now than when Dive Into Python was written?

I'm trying to work my way through some frustrating encoding issues by going back to basics. In Dive Into Python example 9.14 (here) we have this: >>> s = u'La Pe\xf1a' >>> print s Traceback (innermost last): File "<interactive input>", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) >>> print s.encode('latin-...

How does UTF-8 "variable-width encoding" work?

The Unicode standard has enough code points in it that you need 4 bytes to store them all. That's what the UTF-32 encoding does. Yet the UTF-8 encoding somehow squeezes these into much smaller spaces by using something called "variable-width encoding". In fact, it manages to represent the first 128 characters of US-ASCII in just one...
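As a rough illustration of how the variable width works, here is a minimal Python 3 sketch of the UTF-8 bit layout: the leading byte announces how many bytes follow, and each continuation byte carries 6 payload bits behind a `10` prefix. `utf8_encode` is a hypothetical helper written only for illustration (the built-in `str.encode("utf-8")` does this in practice), and it does not reject surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 bit patterns."""
    if cp < 0x80:          # 0xxxxxxx: 7 bits, identical to ASCII
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx: 11 bits
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx: 16 bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare the hand-rolled result with the built-in encoder.
for cp in (0x41, 0xF1, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    print(f"U+{cp:04X}: {len(encoded)} byte(s), "
          f"matches built-in: {encoded == chr(cp).encode('utf-8')}")
```

Because the top bits of every byte say whether it starts a character or continues one, a decoder can resynchronize from any position in the stream.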

I need a string that won't properly convert to ANSI using several code pages.

My .NET library has to marshal strings to a C library that expects text encoded using the system's default ANSI code page. Since .NET supports Unicode, this makes it possible for users to pass a string to the library that doesn't properly convert to ANSI. For example, on an English machine, "デスクトップ" will turn into "?????" when passed ...

UTF-8 In Python logging, how?

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example: import logging def logging_test(): handler = logging.FileHandler("/home/ted/logfile.txt", "w", encoding = "UTF-8") formatter = logging.Formatter("%(message)s") handler.setFormatter(formatte...
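For comparison, a minimal Python 3 sketch of the same idea that writes one non-ASCII message through a `FileHandler` opened with `encoding="utf-8"` and reads it back. This is not the asker's code: the logger name `utf8_demo`, the helper `log_utf8_line`, and the temporary file are all hypothetical, and the original question concerns an older Python where behavior may differ.

```python
import logging
import os
import tempfile


def log_utf8_line(path: str, message: str) -> None:
    """Write a single message to `path` through a UTF-8 FileHandler."""
    logger = logging.getLogger("utf8_demo")   # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False                  # keep the demo off the root logger
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        logger.removeHandler(handler)
        handler.close()                       # flush and release the file


# Use a temporary file so the sketch does not depend on any real path.
fd, demo_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
log_utf8_line(demo_path, "La Pe\u00f1a")
with open(demo_path, encoding="utf-8") as fh:
    logged = fh.read()
os.remove(demo_path)
print(repr(logged))
```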

Java Can't Open a File with Surrogate Unicode Values in the Filename?

I'm dealing with code that does various IO operations with files, and I want to make it able to deal with international filenames. I'm working on a Mac with Java 1.5, and if a filename contains Unicode characters that require surrogates, the JVM can't seem to locate the file. For example, my test file is: "草鷗外.gif" which gets broken int...

Extract first valid line of string from byte array

Hi all, I am writing a utility in Java that reads a stream which may contain both text and binary data. I want to avoid having I/O wait. To do that I create a thread to keep reading the data (and wait for it) putting it into a buffer, so the clients can check availability and terminate the waiting whenever they want (by closing the inpu...

Rendering unicode characters correctly on textbox

I am working on a translation application in which users are allowed to give English input and I need to convert it to a target language and display it in a text box. I am facing problems in displaying Unicode characters. Complex characters are not rendering correctly. I know Windows uses Uniscribe for rendering complex characters. So do I ne...

Convert from hex string to unicode

How can I convert the 'dead' string to a unicode string u'\xde\xad'? Doing this: from binascii import unhexlify out = ''.join(x for x in [unhexlify('de'), unhexlify('ad')]) creates a <type 'str'> string '\xde\xad' Trying to use the Unicode.join() like this: from binascii import unhexlify out = ''.join(x for x in [u'', unhexlify('d...
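The question is written against Python 2. As a hedged sketch of one way to get the equivalent result in Python 3 (not the asker's code): `bytes.fromhex` turns the hex digits into raw bytes, and decoding as Latin-1 maps each byte to the code point with the same value. The helper name `hex_to_text` is hypothetical.

```python
def hex_to_text(hex_digits: str) -> str:
    """Map each hex-encoded byte to the code point of the same value.

    Latin-1 is used because it is the one codec where byte N always
    decodes to U+00NN, which is what u'\\xde\\xad' expresses.
    """
    raw = bytes.fromhex(hex_digits)    # 'dead' -> b'\xde\xad'
    return raw.decode("latin-1")       # -> '\xde\xad'


out = hex_to_text("dead")
print(repr(out), [hex(ord(ch)) for ch in out])
```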

How to overcome font problem in BlackBerry?

I am reading data from a .csv file and displaying it. When I encounter the micro character (µ) some special symbols are displayed instead. How can I display the micro character? ...

Printing unicode in Java

Hi all, I need to print the Unicode values of A-Z in Java. How do I print the Unicode value of a character in Java? Abdul khaliq ...

SQLServer data in PHP loses multibyte characters

I have PHP talking to SQLServer through ODBC using FreeTDS and unixODBC. I followed tutorials to get this set up. It's working fine now, although special characters in the database are not showing up correctly. Specifically, the ™ symbol. It's showing up in the browser as �. I've tried setting client charset = UTF-8 in the [global] se...

Java unicode question

Hello all, This may be a silly question but... Say I have a String like 4e59 which represents a special unicode character. How can I add the \u to the beginning of that character so that it displays correctly? I've tried the simplest solution of: String str = "4e59"; System.out.println("\\u"+str); And several other variants, what am I...

IME - How to handle key press

In my game code, I process key input by handling the WM_KEYDOWN message. wParam gives me the keycode I need. The problem is with IME, especially the Korean IME. I get WM_IME_COMPOSITION and then WM_KEYUP, but never WM_KEYDOWN. So, the bottom line is: I need to get the keycode when I receive WM_IME_COMPOSITION. Is there a way to do so? Any hel...

Python's libxml2 can't parse unicode strings

OK, the docs for Python's libxml2 bindings are really ****. My problem: An XML document is stored in a string variable in Python. The string is an instance of Unicode, and there are non-ASCII characters in it. I want to parse it with libxml2, looking something like this: # -*- coding: utf-8 -*- import libxml2 DOC = u"""<?xml version="1...

Regexp with Russian language

Hello! I can't solve my problem with a regexp. When I type: $string = preg_replace("#\[name=([a-zA-Z0-9 .-]+)*]#","$name_start $1 $name_end",$string); everything is OK, except with Russian text. So I tried rewriting the regexp: $string = preg_replace("#\[name=([a-zA-Z0-9**а-яА-Я** .-]+)*]#","$name_start $1 $name_e...

How can I work with Chinese characters from a database?

I am having a problem capturing Chinese characters in a dataset. In Delphi 2010 I have tried two kinds of components: the Delphi default components and the Developer Express components. As a result, components that are not linked to the datasource work fine, but components that are linked to the datasource have a problem. The Chinese characters ha...