I'm once again working on a program I wrote when I was, umm... less capable. It has a number of problems:
The database collation is latin1_swedish_ci. I would like to convert it to utf8. How would I do this?
The database has some fields that are boolean values stored as 0 or 1. However, the fields are varchars instead of bools. How c...
hibyte lobyte makeunicode
250 65 57345
I have this table, and hibyte and lobyte together form a Chinese character that may use Big5 or GBK encoding: hibyte is the high byte and lobyte is the low byte.
And I think makeunicode might be the Unicode value that corresponds to the Big5/GBK character formed by the hibyte and lobyte....
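A minimal Python sketch, assuming the two bytes form one double-byte character in a known encoding: join them high byte first, decode with the matching codec, and take `ord()` of the result. The function name `makeunicode` is borrowed from the table header for illustration only. Note that the sample row is 0xFA41 → 57345 = 0xE001, a Private Use Area code point; 0xFA41 lies in Big5's user-defined range, so that row may come from a vendor-specific mapping that the stock `big5`/`gbk` codecs do not reproduce.

```python
def makeunicode(hibyte: int, lobyte: int, encoding: str = "big5") -> int:
    """Decode one double-byte character and return its Unicode code point.

    Hypothetical helper: assumes hibyte/lobyte form a single character in
    `encoding` (e.g. "big5" or "gbk"); raises UnicodeDecodeError otherwise.
    """
    char = bytes([hibyte, lobyte]).decode(encoding)
    if len(char) != 1:
        raise ValueError(f"bytes did not decode to one character: {char!r}")
    return ord(char)


# The character 中 (U+4E2D) is 0xA4 0xA4 in Big5 and 0xD6 0xD0 in GBK.
print(makeunicode(0xA4, 0xA4, "big5"))  # 20013
print(makeunicode(0xD6, 0xD0, "gbk"))   # 20013
```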
It seems that we have managed to insert into our database 2 unicode characters for each of the unicode characters we want,
For example, for the Unicode char 0x3CBC, we've inserted the Unicode equivalents of each of its components (0xC383 AND 0xC2BC)
Can anyone think of a simple solution for fixing this?
I've come up with something li...
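The byte pairs 0xC383 and 0xC2BC are the UTF-8 encodings of U+00C3 (Ã) and U+00BC (¼). That is what you get when the UTF-8 bytes C3 BC (ü, U+00FC) are read as Latin-1 and then encoded to UTF-8 a second time, so the "0x3CBC" above appears to mean the byte pair 0xC3BC. A minimal Python sketch of undoing one round of this, assuming the wrong decoding step was Latin-1 (pass `"cp1252"` if Windows-1252 was involved); `fix_double_encoded` is a hypothetical name:

```python
def fix_double_encoded(text: str, wrong_codec: str = "latin-1") -> str:
    """Undo one round of UTF-8 bytes being mis-decoded as `wrong_codec`.

    Re-encoding with the wrong codec recovers the original UTF-8 bytes,
    which are then decoded correctly.
    """
    return text.encode(wrong_codec).decode("utf-8")


garbled = "\u00c3\u00bc"                # "Ã¼", stored as C3 83 C2 BC
print(garbled.encode("utf-8").hex())    # c383c2bc
print(fix_double_encoded(garbled))      # ü (U+00FC)
```

On a value that is not double-encoded (for example a correct "ü"), the final `decode` raises `UnicodeDecodeError` instead of silently corrupting it, which makes it easier to spot rows that should be left alone. Back up the data before running any bulk fix.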
We all know how easy character sets are on the web, yet every time you think you got it right, a foreign charset bites you in the butt. So I'd like to trace the steps of what happens in a fictional scenario I will describe below. I'm going to try and put down my understanding as well as possible but my question is for you folks to correc...
I'm trying to work my way through some frustrating encoding issues by going back to basics. In Dive Into Python example 9.14 (here) we have this:
>>> s = u'La Pe\xf1a'
>>> print s
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> print s.encode('latin-...
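For comparison, a Python 3 sketch of the same experiment (the snippet above is Python 2, where the interactive console fell back to ASCII when printing a Unicode string). It shows that encoding turns text into bytes and that the bytes depend on the codec chosen:

```python
s = "La Pe\xf1a"  # in Python 3, str is already Unicode text

latin1_bytes = s.encode("latin-1")  # ñ becomes one byte, 0xF1
utf8_bytes = s.encode("utf-8")      # ñ becomes two bytes, 0xC3 0xB1

# ASCII has no ñ, which is the error the Python 2 example ran into.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ordinal not in range(128)
```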
The Unicode standard defines so many code points (up to U+10FFFF) that it takes 21 bits to cover them all, and the UTF-32 encoding simply spends a fixed 4 bytes on each one. Yet the UTF-8 encoding somehow squeezes these into much smaller spaces by using something called "variable-width encoding".
In fact, it manages to represent all 128 characters of US-ASCII in just one...
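The variable-width scheme can be sketched by hand in Python. The leading byte's high bits say how many bytes follow (0xxxxxxx for one byte, 110xxxxx for two, 1110xxxx for three, 11110xxx for four), and every continuation byte starts with 10. This `utf8_encode` is a hypothetical illustration, checked against Python's built-in encoder; unlike a real encoder, it does not reject the surrogate range U+D800 to U+DFFF.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point as UTF-8 by hand (illustration only)."""
    if code_point < 0x80:            # 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:           # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:       # 21 bits: 11110xxx + three 10xxxxxx bytes
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of Unicode range")


for ch in ("A", "\u00f1", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex()}")
# U+0041 -> 41
# U+00F1 -> c3b1
# U+20AC -> e282ac
# U+1F600 -> f09f9880
```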
My .NET library has to marshal strings to a C library that expects text encoded using the system's default ANSI code page. Since .NET supports Unicode, this makes it possible for users to pass a string to the library that doesn't properly convert to ANSI. For example, on an English machine, "デスクトップ" will turn into "?????" when passed ...
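The underlying check is "does strict encoding to the target code page succeed?". A Python sketch of that idea, assuming Windows-1252 as the English-machine ANSI code page; `fits_code_page` is a hypothetical helper. In .NET, a comparable approach is to request the code page with `Encoding.GetEncoding(1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback)` and catch the exception. Windows' own conversion may also apply "best fit" substitutions, so its output can differ from Python's `?` replacement.

```python
def fits_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character of `text` can be encoded in `code_page`."""
    try:
        text.encode(code_page)  # strict mode raises on any unmappable character
    except UnicodeEncodeError:
        return False
    return True


print(fits_code_page("Desktop"))        # True
print(fits_code_page("デスクトップ"))    # False
print("デスクトップ".encode("cp1252", errors="replace"))  # b'??????'
```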
I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging
def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatte...
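A self-contained Python 3 sketch of logging non-ASCII text through a UTF-8 `FileHandler`. It writes to a temporary file instead of the original path, and the logger name `utf8_demo` is hypothetical. The assumption it illustrates: pass the message as text (`str`) and let the handler encode it, rather than passing bytes that were already encoded as UTF-8, a common source of errors in this setup.

```python
import logging
import os
import tempfile


def logging_test(path: str) -> None:
    """Write a non-ASCII message to `path` through a UTF-8 FileHandler."""
    logger = logging.getLogger("utf8_demo")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    try:
        # Log a str; the handler encodes it to UTF-8 when writing.
        logger.info("La Pe\u00f1a")
    finally:
        logger.removeHandler(handler)
        handler.close()


log_path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
logging_test(log_path)
with open(log_path, encoding="utf-8") as f:
    print(f.read())  # La Peña
```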
I'm dealing with code that does various IO operations with files, and I want to make it able to deal with international filenames. I'm working on a Mac with Java 1.5, and if a filename contains Unicode characters that require surrogates, the JVM can't seem to locate the file. For example, my test file is:
"草鷗外.gif" which gets broken int...
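Java strings are UTF-16, so any character above U+FFFF is stored as two `char` values, a surrogate pair. A Python sketch of how that pair is computed (`to_surrogates` is a hypothetical helper), checked against Python's UTF-16 encoder:

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = to_surrogates(0x20000)  # U+20000, a CJK Extension B ideograph
print(f"{high:04X} {low:04X}")       # D840 DC00
```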
Hi all,
I am writing a utility in Java that reads a stream which may contain both text and binary data. I want to avoid having I/O wait. To do that I create a thread to keep reading the data (and wait for it) putting it into a buffer, so the clients can check availability and terminate the waiting whenever they want (by closing the inpu...
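The design described (a worker thread that keeps reading into a buffer while clients poll) can be sketched in Python with a thread and a queue; in Java the analogous building block would be a `BlockingQueue` with a timed `poll`. `BackgroundReader` is a hypothetical class. It hands back raw bytes and leaves decoding of any text portion to the caller, on the assumption that decoding at arbitrary chunk boundaries could split a multi-byte character.

```python
import io
import queue
import threading


class BackgroundReader:
    """Read a binary stream on a worker thread; callers wait only as long as they choose."""

    def __init__(self, stream, chunk_size=4096):
        self._queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._pump, args=(stream, chunk_size), daemon=True)
        self._thread.start()

    def _pump(self, stream, chunk_size):
        try:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:  # b"" means end of stream
                    break
                self._queue.put(chunk)
        finally:
            self._queue.put(None)  # sentinel: no more data

    def get(self, timeout=None):
        """Return the next chunk, or None at end of stream.

        Raises queue.Empty if nothing arrives within `timeout` seconds.
        """
        return self._queue.get(timeout=timeout)


reader = BackgroundReader(io.BytesIO(b"text\x00\x01binary"), chunk_size=4)
chunks = []
while (chunk := reader.get(timeout=1.0)) is not None:
    chunks.append(chunk)
print(b"".join(chunks))  # b'text\x00\x01binary'
```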
I am working on a translation application in which users give English input, which I need to convert to a target language and display in a text box. I am facing problems displaying Unicode characters.
Complex characters are not rendering correctly. I know Windows uses Uniscribe for rendering complex characters. So do I ne...
How can I convert the 'dead' string to a Unicode string u'\xde\xad'?
Doing this:
from binascii import unhexlify
out = ''.join(x for x in [unhexlify('de'), unhexlify('ad')])
creates a <type 'str'> string '\xde\xad'
Trying to use the Unicode.join() like this:
from binascii import unhexlify
out = ''.join(x for x in [u'', unhexlify('d...
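The missing step is a decode: Latin-1 (ISO-8859-1) maps each byte 0x00 to 0xFF to the code point with the same value, U+0000 to U+00FF, so decoding the raw bytes with it yields exactly `u'\xde\xad'`. In Python 2 the same idea is `unhexlify('dead').decode('latin-1')`; a Python 3 sketch:

```python
hex_string = "dead"

raw = bytes.fromhex(hex_string)  # b'\xde\xad'
text = raw.decode("latin-1")     # each byte becomes the code point of the same value

# Equivalent without a codec: convert each byte value to a character directly.
same = "".join(chr(b) for b in raw)
print(repr(text), text == same)
```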
I am reading data from a .csv file and displaying it. When I encounter the micro character (µ) some special symbols are displayed instead. How can I display the micro character?
...
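Symbols like these usually mean the file was decoded with a different encoding than the one it was saved in: µ is the single byte 0xB5 in Latin-1 but the two bytes 0xC2 0xB5 in UTF-8. A Python sketch, assuming the file is either UTF-8 or Latin-1; `read_csv_bytes` is a hypothetical helper. For a real file, the equivalent is `open(path, newline="", encoding="utf-8")` passed to `csv.reader`, with the encoding set to whatever the file actually uses.

```python
import csv
import io

line = "value,5 \u00b5m\n"
utf8_bytes = line.encode("utf-8")      # µ is C2 B5
latin1_bytes = line.encode("latin-1")  # µ is B5

# Decoding UTF-8 bytes as Latin-1 produces the stray symbol.
print(utf8_bytes.decode("latin-1"))    # value,5 Âµm


def read_csv_bytes(data: bytes, encoding: str):
    """Parse CSV data after decoding it with the file's actual encoding."""
    return list(csv.reader(io.StringIO(data.decode(encoding), newline="")))


print(read_csv_bytes(utf8_bytes, "utf-8"))      # [['value', '5 µm']]
print(read_csv_bytes(latin1_bytes, "latin-1"))  # [['value', '5 µm']]
```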
Hi all
I need to print the Unicode values of A-Z in Java.
How do I print the Unicode value of a character in Java?
Abdul khaliq
...
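A character's Unicode value is its code point, conventionally written as U+ followed by at least four hex digits. A Python sketch (`code_point_label` is a hypothetical helper); in Java the corresponding expression would be `String.format("U+%04X", (int) c)`.

```python
import string


def code_point_label(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX style."""
    return f"U+{ord(ch):04X}"


for ch in string.ascii_uppercase:
    print(ch, code_point_label(ch))  # A U+0041 ... Z U+005A
```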
I have PHP talking to SQLServer through ODBC using FreeTDS and unixODBC. I followed tutorials to get this setup. It's working fine now although special characters in the database are not showing up correctly. Specifically, the ™ symbol. It's showing up in the browser as �. I've tried setting client charset = UTF-8 in the [global] se...
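One likely explanation, assuming the column holds Windows-1252 text that is passed through unconverted: ™ is the single byte 0x99 in Windows-1252 but three bytes in UTF-8, and a lone 0x99 is not valid UTF-8, so a browser reading the page as UTF-8 shows the replacement character. A byte-level Python sketch of that mismatch:

```python
tm = "\u2122"  # ™ TRADE MARK SIGN

print(tm.encode("cp1252"))  # b'\x99'  (one byte in Windows-1252)
print(tm.encode("utf-8"))   # b'\xe2\x84\xa2'  (three bytes in UTF-8)

# A lone 0x99 byte is invalid UTF-8, so a UTF-8 reader substitutes U+FFFD.
print(b"\x99".decode("utf-8", errors="replace"))  # �
```

If this is the cause, the fix is to have the conversion to UTF-8 happen somewhere along the path (the FreeTDS client charset setting, or a conversion in PHP) rather than sending the Windows-1252 bytes straight to the page.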
Hello all,
This may be a silly question but... Say I have a String like 4e59 which represents a special unicode character. How can I add the \u to the beginning of that character so that it displays correctly? I've tried the simplest solution of:
String str = "4e59";
System.out.println("\\u"+str);
And several other variants, what am I...
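`\u` escapes are processed by the compiler at the source-code level, so concatenating `"\\u"` with a string at run time only produces the six literal characters `\u4e59`. The usual approach is to parse the hex digits as an integer and convert that number to a character. A Python sketch (`char_from_hex` is a hypothetical helper); in Java the corresponding call would be `new String(Character.toChars(Integer.parseInt(str, 16)))`.

```python
def char_from_hex(hex_code: str) -> str:
    """Turn a hex code point such as "4e59" into the character it names."""
    return chr(int(hex_code, 16))


print(char_from_hex("4e59"))  # 乙 (U+4E59)
```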
In my game code, I process key input by handling the WM_KEYDOWN message.
wParam gives me the key code I need.
The problem is with IMEs, especially the Korean IME.
I get WM_IME_COMPOSITION and then WM_KEYUP, but never WM_KEYDOWN.
So, the bottom line is: I need to get the key code when I receive WM_IME_COMPOSITION.
Is there a way to do so?
Any hel...
OK, the docs for Python's libxml2 bindings are really ****. My problem:
An XML document is stored in a string variable in Python. The string is an instance of Unicode, and there are non-ASCII characters in it. I want to parse it with libxml2, looking something like this:
# -*- coding: utf-8 -*-
import libxml2
DOC = u"""<?xml version="1...
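A common approach with byte-oriented XML parsers is to encode the Unicode string into bytes that match the document's encoding declaration before parsing, so the parser and the declaration agree. This sketch uses the standard library's `xml.etree.ElementTree` in place of libxml2, so treat it as an illustration of the encode-first idea rather than of the libxml2 API; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="utf-8"?>
<names><name>La Pe\u00f1a</name></names>"""

# Encode to the declared encoding and hand the parser bytes, not a Unicode string.
root = ET.fromstring(DOC.encode("utf-8"))
print(root.find("name").text)  # La Peña
```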
Hello!
I can't solve my problem with regexp.
OK, when I type:
$string = preg_replace("#\[name=([a-zA-Z0-9 .-]+)*]#","$name_start $1 $name_end",$string);
everything works, except with Russian text.
So I tried rewriting the regexp:
$string = preg_replace("#\[name=([a-zA-Z0-9а-яА-Я .-]+)*]#","$name_start $1 $name_e...
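In PHP, a pattern containing Cyrillic needs the `u` modifier (for example `#...#u`) so that PCRE treats the pattern and subject as UTF-8; without it, `а-я` in a UTF-8 source file is read as a range of raw bytes. The same pattern idea in Python 3, whose `str` patterns already work on code points, is sketched below. `NAME_START`/`NAME_END` are hypothetical stand-ins for `$name_start`/`$name_end`, and the stray `*` after the group is dropped, since `+` already allows repeats.

```python
import re

NAME_START, NAME_END = "<b>", "</b>"  # hypothetical markers

# а-я and А-Я are code-point ranges here; ё/Ё fall outside them, so list them too.
NAME_TAG = re.compile(r"\[name=([a-zA-Z0-9а-яА-ЯёЁ .-]+)\]")


def render_names(text: str) -> str:
    return NAME_TAG.sub(lambda m: f"{NAME_START}{m.group(1)}{NAME_END}", text)


print(render_names("[name=Иван Петров] and [name=John]"))
# <b>Иван Петров</b> and <b>John</b>
```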
I am having a problem capturing Chinese characters in a dataset.
In Delphi 2010 I have tried two kinds of components:
Delphi default
Developer Express components
As a result, the components that are not linked to the datasource work fine, but those that are linked to the datasource have a problem. The Chinese characters ha...