unicode

How to read Unicode characters from command-line arguments in Python on Windows

I want my Python script to be able to read Unicode command-line arguments on Windows. But it appears that the elements of sys.argv are strings encoded in some local encoding, rather than Unicode. How can I read the command line in full Unicode? Example code: argv.py import sys first_arg = sys.argv[1] print first_arg print type(first_arg) print first_...
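
A minimal sketch, assuming Python 3 rather than the Python 2 used in the question. The Python 3 Windows build reads the command line through the wide-character API, so every element of sys.argv is already a str (Unicode) object and needs no decoding step. The helper describe_args is hypothetical and only exists to show the code points of each argument.

```python
import sys


def describe_args(argv):
    """Return (argument, list of code points) pairs for each argument.

    describe_args is a hypothetical helper used for illustration only.
    """
    return [(arg, [f"U+{ord(ch):04X}" for ch in arg]) for arg in argv]


if __name__ == "__main__":
    # On Python 3, each element of sys.argv is already a str (Unicode text).
    for arg, code_points in describe_args(sys.argv[1:]):
        print(type(arg).__name__, arg, " ".join(code_points))
```

Running `python argv.py ë` would list the single argument with the code point U+00EB.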

Can Ruby get the filenames in a folder if they contain Unicode characters (on Windows Vista)?

I was writing a script on Windows Vista to move the files in a folder to another hard drive, but found that both Ruby 1.8.6 and 1.9 return filenames with their Unicode characters replaced by "??????". For example, the filename "Chart for ???????.doc" is returned, so the file cannot be moved at all... I used filename.each_byt...

How to stop the 'gem' utility from accessing my home directory?

When I run the gem install <somegem> command, the gem utility tries to access my home directory. The path contains some non-Latin characters, and installation fails because of that. For example: E:\ruby\bin>gem install <somegem> ERROR: While executing gem ... (Errno::ENOENT) No such file or directory - C:\Documents and Settings\<user> I...

Reading a plist UTF-8 value as UTF-16

I'm working on an iPhone app that needs to display superscripts and subscripts. I'm using a picker to read in data from a plist, but the Unicode values aren't being displayed correctly in the picker view. Subscripts and superscripts are not being recognized. I'm assuming this is due to the encoding of the plist as UTF-8, so the question ...

Setting the encoding for a SAX parser in Python

When I feed a UTF-8-encoded XML file to an ExpatParser instance: def test(filename): parser = xml.sax.make_parser() with codecs.open(filename, 'r', encoding='utf-8') as f: for line in f: parser.feed(line) ...I get the following: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "te...
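
The traceback above is cut off, so the cause is an assumption here. A common way to avoid encoding trouble with expat is to open the file in binary mode and feed raw bytes, letting the parser decode them according to the XML declaration (UTF-8 by default) instead of receiving text already decoded by codecs.open. This is a minimal Python 3 sketch; TextCollector and parse_file are hypothetical names.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Accumulate all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_file(filename):
    parser = xml.sax.make_parser()  # an ExpatParser, which supports feed()/close()
    handler = TextCollector()
    parser.setContentHandler(handler)
    # Feed raw bytes: expat decodes them itself, so nothing is decoded twice.
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            parser.feed(chunk)
    parser.close()  # signals end of input and flushes the parser
    return "".join(handler.parts)
```

Chunk boundaries may split a multibyte UTF-8 sequence; expat buffers incomplete sequences across feed() calls, so the chunk size does not matter.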

convert CString to const char*

How do I convert a CString to const char* in my Unicode MFC application? ...

Unicode problem with JSF and HTML forms?

I have an HTML form generated by JSF which maps an input element to a bean setter, and it looks to me like JSF is garbling Unicode input on the way in. In particular, I put the following exception for testing purposes in the setter public void setTitle(String title){ System.out.println("title set with: "+title+"\n"); if (title.st...

Convert Unicode code point to UTF-8 hex in Python

I want to convert a number of Unicode code points read from a file to their UTF-8 encoding, e.g. I want to convert the string 'FD9B' to the string 'EFB69B'. I can do this manually using string literals like this: u'\uFD9B'.encode('utf-8') but I cannot work out how to do it programmatically. ...
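
A short Python 3 sketch (the question uses Python 2 syntax): parse the hex digits with int(..., 16), turn the number into a character with chr(), encode that character as UTF-8, and render the bytes with bytes.hex(). The function name codepoint_to_utf8_hex is hypothetical.

```python
def codepoint_to_utf8_hex(hex_code_point):
    """Convert a hex code point string such as 'FD9B' to its UTF-8 bytes in hex."""
    # int(..., 16) parses the hex digits; chr() builds the character.
    char = chr(int(hex_code_point, 16))
    # encode() yields the UTF-8 bytes; hex().upper() renders them as 'EFB69B'.
    return char.encode("utf-8").hex().upper()


print(codepoint_to_utf8_hex("FD9B"))  # EFB69B
```

Surrogate code points such as 'D800' cannot be encoded as UTF-8 and raise UnicodeEncodeError, so input read from a file may need validating first.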

International Fonts Display Issue with UTF-8

We have developed a PHP-MySQL application in two languages, English and Gujarati. The Gujarati language contains symbols that need Unicode (UTF-8) encoding for proper display. The application runs perfectly on my Windows-based localhost and on my Linux-based testing server on the web. But when I transfer the application to the clie...

ASP.NET Validation

I'm getting a validation error when a user enters a foreign name. An example is: System.Web.HttpRequestValidationException: A potentially dangerous Request.Form value was detected from the client (ctl00$pageContent$txtName="Pedro ú logo"), where the ú is being translated as &#250;. These foreign charac...

Passing double-byte (WCHAR) strings from C++ to Java via JNI

I have a Java application that uses a C++ DLL via JNI. A few of the DLL's methods take string arguments, and some of them return objects that contain strings as well. Currently the DLL does not support Unicode, so the string handling is rather easy: Java calls String.getBytes() and passes the resulting array to the DLL, which simply ...

UTF-8 only in Grails 1.1 database tables

When using Grails 1.1 together with MySQL, the character sets of the auto-generated database tables seem to default to ISO-8859-1. I'd rather have everything stored as pure UTF-8. Is that possible? From the auto-generated database definitions: ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1; Note the "latin1" part. A work-around th...

New Unicode character types in C++0x

I'm building an API that allows me to fetch strings in various encodings, including UTF-8, UTF-16, UTF-32 and wchar_t (which may be UTF-32 or UTF-16 depending on the OS). The new C++ standard has introduced the types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well, but...

Converting to safe Unicode in Python

I'm dealing with unknown data and trying to insert it into a MySQL database using Python/Django. I'm getting some errors that I don't quite understand and am looking for some help. Here is the error: Incorrect string value: '\xEF\xBF\xBDs m...' My guess is that the string is not being properly converted to Unicode? Here is my code for...
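
The bytes \xEF\xBF\xBD at the start of that error value are the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, which a decoder emits in place of bytes it could not decode. This Python 3 sketch shows how decoding unknown bytes with errors="replace" produces exactly those bytes on re-encoding, and how to detect them before the value reaches the database. It does not diagnose the MySQL column's character set, which would need to be checked separately; the helper names are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"


def to_text(raw: bytes) -> str:
    """Decode unknown bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


def has_replacement_chars(text: str) -> bool:
    """True if decoding already lost data (U+FFFD appears in the text)."""
    return REPLACEMENT_CHAR in text


raw = b"\xe9s more"  # a lone Latin-1 byte for 'é': not valid UTF-8
text = to_text(raw)
print(text.encode("utf-8")[:3])  # b'\xef\xbf\xbd'
print(has_replacement_chars(text))  # True
```

If the input is really Latin-1, decoding it with "latin-1" instead keeps the original character rather than replacing it.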

Windows cmd encoding change causes Python crash

First I change the Windows CMD encoding to UTF-8 and run the Python interpreter: chcp 65001 python Then I try to print a Unicode string inside it, and when I do this Python crashes in a peculiar way (I just get a cmd prompt in the same window). >>> import sys >>> print u'ëèæîð'.encode(sys.stdin.encoding) Any ideas why it happ...

How do I encode/decode UTF-16LE byte arrays with a BOM?

I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Mark (BOM), and I need to produce encoded byte arrays with a BOM. Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstand...
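
The question is about Java, so this Python 3 sketch only illustrates the byte layout: a UTF-16LE payload is the bytes FF FE followed by each code unit, low byte first. On encoding, the BOM is prepended explicitly; on decoding, whichever BOM is present is stripped before decoding the rest. The helper names are hypothetical, and falling back to little endian when no BOM is present is an assumption based on the Microsoft peer mentioned above.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    # The "utf-16-le" codec never writes a BOM, so prepend FF FE explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_utf16_with_bom(data: bytes) -> str:
    # Honour whichever BOM is present; assume little endian if there is none.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode("utf-16-le")


encoded = encode_utf16le_with_bom("hi")
print(encoded)  # b'\xff\xfeh\x00i\x00'
print(decode_utf16_with_bom(encoded))  # hi
```

Stripping the BOM before decoding matters: decoding it with an endian-specific codec would leave a stray U+FEFF at the start of the string.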

Is it bad practice to use Unicode symbols or shapes in a Cocoa app?

There have been a few times where I've used Unicode symbols in place of small icons in one of my Cocoa apps, either because it's easier to draw inline with text or because I didn't feel like firing up Photoshop to draw a simple arrow. I've wondered though, could there be issues with localization or fonts I might not be aware of? Are ther...

C#: Display Unicode text in the caption of a message box

C# 2005. My application supports two languages, English and Thai. However, when I have to display Thai, the caption of my message box shows question marks, i.e. ????????????. The message box text itself is fine and displays correctly; it is just the caption that has the problem. Do I need to enable unico...

Encoding Conversion problem

I've got a little problem changing the encoding of a string. I read strings from a DB that are encoded using code page 850, and I have to prepare them to be suitable for an interoperable WCF service. From the DB I read characters \x10 and \x11 (triangular shapes), and I want to convert them to the Unicode for...
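
The question targets C# and WCF; this Python 3 sketch only illustrates the mapping issue. A standard code page 850 decoder maps bytes 0x10 and 0x11 to the control characters U+0010 and U+0011, not to visible triangles, so a plain decode alone will not produce them. Assuming the "triangular shapes" are the OEM glyphs that DOS consoles draw for those bytes (► U+25BA and ◄ U+25C4), one approach is to decode normally and then remap those two characters. The names OEM_GLYPHS and cp850_to_unicode are hypothetical.

```python
# Assumption: 0x10 and 0x11 should appear as the DOS console glyphs ► and ◄.
# Python's cp850 codec decodes them to control characters U+0010 and U+0011,
# so remap those two code points after decoding.
OEM_GLYPHS = str.maketrans({"\x10": "\u25ba", "\x11": "\u25c4"})


def cp850_to_unicode(raw: bytes) -> str:
    """Decode code page 850 bytes, replacing 0x10/0x11 with visible triangles."""
    return raw.decode("cp850").translate(OEM_GLYPHS)


# cp850_to_unicode(b"\x10 next \x11") returns "\u25ba next \u25c4"
# cp850_to_unicode(b"\x82") returns "\u00e9" (é), via the normal cp850 table
```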

How to determine if a String contains invalid encoded characters

Usage scenario: We have implemented a web service that our web frontend developers use (via a PHP API) internally to display product data. On the website the user enters something (e.g. a query string). Internally the website makes a call to the service via the API. Note: we use Restlet, not Tomcat. Original problem: Firefox 3.0.10 se...
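
The service in the question is Java (Restlet), and the excerpt stops before the details, so this is only a Python 3 sketch of the general technique: decode the raw bytes strictly as UTF-8 and treat a UnicodeDecodeError as "contains invalid encoded characters". It assumes the expected encoding is UTF-8; the function names are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (strict mode)."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_offset(raw: bytes):
    """Byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# "grüße" sent as Latin-1 bytes is not valid UTF-8; the error starts at byte 2.
print(is_valid_utf8("grüße".encode("utf-8")))  # True
print(first_invalid_offset(b"gr\xfc\xdfe"))  # 2
```

Checking the raw bytes matters: once text has been decoded leniently, the invalid input has already been replaced and can only be detected indirectly, for example by looking for U+FFFD.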