unicode

How can I use Unicode characters when I write to Perl's format?

Basically I have a database where I get $lastname, $firstname, $rid, $since, $times and $ip from. Using a Perl script, I format the data to send it via e-mail. Since the $lastname and $firstname can contain special chars (for instance ä, ü, ß, é,...) I first decode the strings. my $fullname = decode("utf8", $lastname) . ', ' . decode("...

Python __str__ versus __unicode__

Is there a Python convention for when you should implement __str__() versus __unicode__()? I've seen classes override __unicode__() more frequently than __str__(), but it doesn't appear to be consistent. Are there specific rules for when it is better to implement one versus the other? Is it necessary/good practice to implement both? ...

Emacs 23 fails to send Unicode combining diacritics through XWin -clipboard

Emacs 23 is running on a remote Linux box. It displays its frame on this local Windows box, using Cygwin's X server. I used to be able to copy-paste any text from Emacs to any Windows application. Since I upgraded from release 22 to 23, combining diacritics don't come through any more. Non-combined characters pass unharmed. Fo...

utf-8 decoding problem in php

I have a .vcf file with parts encoded as UTF-8: CATEGORIES;CHARSET=UTF-8:Straße & –dienste Now "–" should be a "-" and "Straße" should convert to "Straße". I tried utf8_decode(), iconv(), and mb_convert_encoding(), and have been playing with several output encoding options like header('content-type: text/html; charset=utf-8'); mb...

Fetching or Deleting Entity from Google App Engine DB with Unicode Property Name

I have an Expando model kind in my App Engine datastore and I'm setting many arbitrary property names. I didn't consider that I couldn't store Unicode property names, and now I'm in a troubling situation where any attempt to fetch entities of this kind, or even to delete them to get rid of the offender, gets the following error: Traceback ...

Japanese COBOL code on IBM mainframe in Shift-JIS; represented after transfer to a PC how?

We have a Japanese client that has source code in COBOL on a mainframe. He claims the code on the mainframe is represented in Shift-JIS2 (and we think we understand that pretty well). When that code is transferred to a PC, what is the most common encoding used? We've sent him a program to process that COBOL code and it seems to cho...

Output RTF special characters to Unicode

I have been looking around on Google and Stack Overflow but haven't found what I needed, though my question seems quite simple. Anyhow: what is the way to convert a string of RTF special characters such as "\'d3\'d6" (in this case Russian) to Unicode chars or a string using C#? ...

Starting a new Windows app: Should I use _TCHAR or wchar_t for text?

Hi all, I'm coding up a new (personal hobby) app for Windows in C++. In previous low-level Windows stuff I've used _TCHAR (or just TCHAR) arrays/basic_strings for string manipulation. Is there any advantage to using _TCHAR over straight-up Unicode with wchar_t, if I don't care about Windows platforms pre-Win2k? edit: after submitti...

Am I correctly supporting UTF-8 in my PHP apps?

I would like to make sure that everything I know about UTF-8 is correct. I have been trying to use UTF-8 for a while now but I keep stumbling across more and more bugs and other weird things that make it seem almost impossible to have a 100% UTF-8 site. There is always a gotcha somewhere that I seem to miss. Perhaps someone here can corr...

How to declare a wide char constant in an IDL

We are migrating our C++ COM application to Unicode, and as part of this migration we want to migrate the constant strings in our IDL to Unicode as well. The problem is that at the moment, we still compile it both in ANSI and in UNICODE, which means that we can't use the L"String" construct to declare wide chars. At the moment, our...

How do I turn off Unicode in a VC++ project?

I have a VC++ project in Visual Studio 2008. It is defining the symbols for unicode on the compiler command line (/D "_UNICODE" /D "UNICODE"), even though I do not have this symbol turned on in the preprocessor section for the project. As a result I am compiling against the Unicode versions of all the Win32 library functions, as o...

Unicode in Jar resources

I have a Unicode (UTF-8 without BOM) text file within a jar, that's loaded as a resource. URL resource = MyClass.class.getResource("datafile.csv"); InputStream stream = resource.openStream(); BufferedReader reader = new BufferedReader( new InputStreamReader(stream, Charset.forName("UTF-8"))); This works fine on Windows, but on Lin...

Piecewise conversion of an MFC app to Unicode/MBCS

I have a large MFC application that I am extending to allow for multi-lingual input. At the moment I need to allow the user to enter Unicode data in edit boxes on a single dialog. Is there a way to do this without turning UNICODE or MBCS on for the entire application? I only need a small part of the application converted at the moment...

How to replace the Unicode gem on Ruby 1.9?

Unfortunately, the Unicode gem 0.1 (sudo gem install unicode) doesn't work on Ruby 1.9. I have the following snippet: require "rubygems" require "unicode" str = "áéíóúç" Unicode.normalize_KD(str).gsub(/[^\x00-\x7F]/n, "") #=> aeiouc I use it to convert titles to permalinks without removing accented characters. Is there a way of converti...
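
The technique in the snippet above (compatibility decomposition, then dropping everything outside ASCII) does not depend on the gem. As a rough illustration, here is the same idea sketched in Python rather than Ruby, using the standard `unicodedata` module. The function name `strip_accents` is hypothetical and chosen only for this example.

```python
import unicodedata


def strip_accents(text):
    """Fold accented letters to their ASCII base letters (hypothetical helper).

    NFKD splits a precomposed letter such as U+00E1 into a base letter
    plus a combining mark; encoding to ASCII with errors="ignore" then
    drops the combining marks and any other non-ASCII code points.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Same input as the excerpt, written with escapes to avoid source-encoding issues.
print(strip_accents("\u00e1\u00e9\u00ed\u00f3\u00fa\u00e7"))  # aeiouc
```

Characters with no ASCII decomposition (for example ß) are dropped entirely by this approach, so it suits permalinks better than general transliteration.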

How do I get str.translate to work with Unicode strings?

I have the following code: import string def translate_non_alphanumerics(to_translate, translate_to='_'): not_letters_or_digits = u'!"#%\'()*+,-./:;<=>?@[\]^_`{|}~' translate_table = string.maketrans(not_letters_or_digits, translate_to *len(not_lette...
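
A minimal sketch of one common approach, assuming Python 3, where every `str` is Unicode: `str.translate` accepts a dict that maps code points (integers) to replacement strings, so no `string.maketrans` table is needed. This is not a completion of the truncated code above, only an illustration of the dict-based technique with the same punctuation set. In Python 2, `unicode.translate` accepts the same kind of mapping.

```python
def translate_non_alphanumerics(to_translate, translate_to="_"):
    """Replace each listed punctuation character with translate_to."""
    not_letters_or_digits = '!"#%\'()*+,-./:;<=>?@[\\]^_`{|}~'
    # Keys are code points (ints); values are the replacement strings.
    translate_table = {ord(ch): translate_to for ch in not_letters_or_digits}
    return to_translate.translate(translate_table)


print(translate_non_alphanumerics("a.b-c!"))    # a_b_c_
print(translate_non_alphanumerics("a.b", ""))   # ab
```

Mapping a code point to an empty string (or to `None`) deletes it, which is why the second call removes the dot.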

How do I find the length of a Unicode string in Perl?

The perldoc page for length() tells me that I should use bytes::length(EXPR) to find the length of a Unicode string in bytes, and the bytes page echoes this. use bytes; $ascii = 'Lorem ipsum dolor sit amet'; $unicode = 'Lørëm ípsüm dölör sît åmét'; print "ASCII: " . length($ascii) . "\n"; print "ASCII bytes: " . bytes::length($ascii) . "\n"; prin...

Insert from MS SQL to Lotus Notes using NotesSQL driver

I am trying to sync up a SQL Server table with a Lotus Notes database. I have set up the NotesSQL ODBC driver and have been able to insert, update and select from the Notes database form using the ActiveX Script Task in DTS. Everything works well until I try to insert Chinese characters into a Text field in the Notes database. After insert...

python - problems with regular expression and unicode

Hi, I have a problem in Python. I'll try to explain my problem with an example. I have this string: >>> string = 'ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃ' >>> print string ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂà and I want, for example, to replace characters other than Ñ, Ã, ï with "". I have tried: >>> rePat =...

Searching unicode text using regex

Searching a file which is written in Hindi (Devanagari) (UTF-16) gave rise to the following problem. The file contains: त्रास ततत जुग नींद ना हा बु Note that the first char 'त्र' is made up of multiple code points: त + ् + र. Now while searching for 'त' I get 4 matches, including the त of the first char. I am using Java. How can I go abo...
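
The four matches come from searching by code point rather than by user-perceived character. As a rough sketch in Python (not Java), one simplified way to skip the conjunct is to reject any match that is immediately followed by a combining mark (Unicode general category starting with "M", which includes the virama U+094D and the vowel signs). The function name `find_standalone` is hypothetical, and this check is an assumption made for illustration: it is not full grapheme cluster segmentation as defined in UAX #29.

```python
import unicodedata


def find_standalone(text, ch):
    """Return indices where ch occurs and is not followed by a combining mark."""
    hits = []
    for i, current in enumerate(text):
        if current != ch:
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        # Skip occurrences that are the base of a larger cluster.
        if following and unicodedata.category(following).startswith("M"):
            continue
        hits.append(i)
    return hits


# First two words of the sample text, written with escapes:
# TA + VIRAMA + RA + AA + SA, space, TA TA TA
sample = "\u0924\u094d\u0930\u093e\u0938 \u0924\u0924\u0924"
print(find_standalone(sample, "\u0924"))  # [6, 7, 8]
```

A plain search for the same character would also report index 0, the first letter of the conjunct, which corresponds to the fourth match described in the question.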

UILabel displaying Unicode Characters

Hello, I have an NSString that then sets a UILabel. This contains unicode such as... E = MC Hammer\U00ac\U2264 and complete ones such as \U2013\U00ee\U2013\U00e6\U2013\U2202\U2013\U220f\U2013\U03c0 \U2013\U00ee\U2013\U220f\U2013\U03c0\U2013\U00aa\U2013\U221e\U2014\U00c5 These are not displaying correctly, is there anythi...