unicode

Protocol Buffer and UTF16 Unicode.

How do you use Google Protocol Buffer with UTF16 Unicode? ...

Type of UTF-16 encoding, using wofstream in Windows

Recently, I wanted to write a text file in Unicode (UTF-16) under Windows. By referring to http://www.codeproject.com/KB/stl/upgradingstlappstounicode.aspx, here is the code I am applying. When I use Notepad to open the document, here is the display. The newlines seem to disappear! When I use Firefox with UTF-16 encoding selected, here i...

python: trouble printing short utf-encoded strings

(The following is using Python 2.6.1) I have 2 strings: >>> a = u'\u05e8\u05db\u05e1' >>> b = u'\u05e8\u05db\u05e1 \u05d4\u05d9\u05d0 \u05de\u05d0\u05d9\u05e8\u05d4 \u05d1\u05e4\u05e0\u05e1' I encode them: >>> ua = a.encode('utf-8') >>> ub = b.encode('utf-8') >>> ua '\xd7\xa8\xd7\x9b\xd7\xa1' >>> ub '\xd7\xa8\xd7\x9b\xd7\xa1 \xd7\x9...

How do I use unicode (UTF-8) characters in Clojure regular expressions?

This is a double question for you amazingly kind Stacked Overflow Wizards out there. How do I set emacs/slime/swank to use UTF-8 when talking with Clojure, or use UTF-8 at the command-line REPL? At the moment I cannot send any non-roman characters to swank-clojure, and using the command-line REPL garbles things. It's really easy to do ...

Reading Unicode Text from Java ResultSet

How do I read Unicode text from a Java ResultSet? ...

How do I match unicode characters in Java

I'm trying to match Unicode characters in Java. Input String: informa String to match: informátion So far I've tried this: Pattern p= Pattern.compile("informa[\u0000-\uffff].*", (Pattern.UNICODE_CASE|Pattern.CANON_EQ|Pattern.CASE_INSENSITIVE)); String s = "informátion"; Matcher m = p.matcher(s); if(m.matches()){ ...

Python: Printing Unicode to File

file = open('unicode.txt', 'wb') for i in range(10): file.write(str(unichr(i) )) What I would like to do is to print all of the Unicode values to a text file ...
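
One way to approach this, sketched under the assumption of Python 3 (the excerpt uses Python 2's `unichr`) and UTF-8 as the output encoding: open the file in text mode with an explicit encoding and skip the surrogate range, which cannot be encoded. The helper name `write_code_points` and the temporary path are illustrative, not from the question.

```python
import os
import tempfile


def write_code_points(path, start, stop):
    """Write one character per code point in [start, stop), skipping surrogates."""
    # newline="" disables newline translation so each code point is written as-is.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        for cp in range(start, stop):
            if 0xD800 <= cp <= 0xDFFF:
                # Surrogate code points are not encodable in UTF-8.
                continue
            fh.write(chr(cp))


# Hypothetical output location for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "unicode.txt")
write_code_points(path, 0, 10)

with open(path, "rb") as fh:
    data = fh.read()
```

Passing `0, 0x110000` as the range would cover every code point, at the cost of a file of several megabytes.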

Python string format character for __unicode__?

Firstly, is there one? If not, is there a nice way to force something like print '%s' % obj to call obj.__unicode__ instead of obj.__str__? ...

Unicode supported isdigit and isspace function

I have the following code. // mfc.cpp : Defines the entry point for the console application. // #include "stdafx.h" #include "mfc.h" #ifdef _DEBUG #define new DEBUG_NEW #endif #include <cctype> #include <string> #include <sstream> #include <tchar.h> #include <iostream> #include <Strsafe.h> #include <algorithm> #include <cmath> #inclu...

XML entity references

I need to represent special characters like superscripts, copyright symbols etc in XML. What's the best way to do this? I'm confused as XML defines 5 entity references for "<" , ">" etc. I always use &lt; and &gt; but could, or should, I use Unicode decimal, U+003C, instead? Or will an XML processor treat these the same as if I'd typed "<"...

UnicodeEncodeError: 'ascii' codec can't encode character when trying a HTTP POST in Python

Hi there, I'm trying to do an HTTP POST with a unicode string (u'\xe4\xf6\xfc') as a parameter in Python, but I receive the following error: UnicodeEncodeError: 'ascii' codec can't encode character This is the code used to make the HTTP POST (with httplib2) http = httplib2.Http() userInfo = [('Name', u'\xe4\xf6\xfc')] data = ur...
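
The excerpt is cut off, so the following is only a guess at the situation: it assumes the form fields are URL-encoded before being posted and that the server expects UTF-8. Under those assumptions, a minimal Python 3 sketch encodes each value to UTF-8 bytes explicitly, so no implicit ASCII conversion takes place. The names `user_info` and `encoded_pairs` are illustrative.

```python
from urllib.parse import urlencode

# Same field as in the excerpt; "\xe4\xf6\xfc" is "äöü".
user_info = [("Name", "\xe4\xf6\xfc")]

# Encode each value to UTF-8 bytes up front (assumed target encoding).
encoded_pairs = [(key, value.encode("utf-8")) for key, value in user_info]

# urlencode percent-escapes the bytes; the result is plain ASCII.
data = urlencode(encoded_pairs)
```

The resulting `data` string is pure ASCII and can be passed as the request body with a `Content-Type` of `application/x-www-form-urlencoded`.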

Unicode mirror character?

‮?retcarahc "rorrim" edocinu eht htiw detaicossa ytilibarenluv fo tros emos ereht sI?ksir yna ereht erA ?rof ti si tahW ...

Confirm that Python 2.6 ftplib does not support Unicode file names? Alternatives?

Can someone confirm that Python 2.6 ftplib does NOT support Unicode file names? Or must Unicode file names be specially encoded in order to be used with the ftplib module? The following email exchange seems to support my conclusion that the ftplib module only supports ASCII file names. Should ftplib use UTF-8 instead of latin-1 encodin...

Dll built in Delphi 2010/2009 not compatible to Delphi 7 when an Exception is raised.

Hello guys, I've built a DLL in Delphi 2010 and it's consumed in my Delphi 7 application. I'm aware of the Unicode AnsiString / string matter and according to my tests everything works fine as long as no exception is raised by my Delphi 2010 DLL. The fact is, is there any special/treated exception that is compatible to the...

All characters that may be bullet points (e.g. "*") or "dash" points

This question is a simple point (pardon the pun): What are all the characters that may, when starting a paragraph, be reasonably interpreted as indicating (in the Anglo-saxon demographic) that the paragraph was meant to be a bullet point or a "dash" point. Here are the ones I would expect, so far: Bullets Asterisk: "*", HTML entity ...

What is the equivalent of chr(153) (The TM SYMBOL) in Unicode

In earlier Delphi versions, I could use s:=chr(153); to get a trademark symbol in a string. In Delphi 2010, that doesn't work any longer, perhaps to do with unicode. What is the equivalent code string to put the TM symbol into my string? ...
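
As a language-neutral illustration (in Python rather than Delphi, and assuming the old `chr(153)` was interpreted in the Windows-1252 code page), byte 153 maps to the Unicode code point U+2122, TRADE MARK SIGN:

```python
# Byte 153 (0x99) as it would appear in a legacy single-byte string.
legacy_byte = bytes([153])

# Assumption: the legacy code page is Windows-1252.
tm = legacy_byte.decode("cp1252")

# Code point of the decoded character, formatted as U+XXXX.
code_point = "U+%04X" % ord(tm)
```

Under that assumption, the Unicode character to use in place of `chr(153)` is U+2122 (decimal 8482).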

Can Ruby 1.9.1 finally get a list of filenames if the filenames have unicode characters on Windows?

Can Ruby 1.9.1 finally get a list of filenames if the filenames have unicode characters on Windows? I think back in the Ruby 1.8.6 and 1.8.7 days, that wasn't possible on Windows. ...

Python "denormalize" unicode combining characters

I'm looking to standardize some unicode text in python. I'm wondering if there's an easy way to get the "denormalized" form of a combining unicode character in python? e.g. if I have the sequence u'o\xaf' (i.e. latin small letter o followed by combining macron), to get ō (latin small letter o with macron). It's easy to go the other way: ...
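
A minimal sketch of composing and decomposing with the standard `unicodedata` module, assuming Python 3. Note that it uses the combining macron U+0304; the `\xaf` in the excerpt is the spacing macron U+00AF, which NFC does not merge with a preceding letter.

```python
import unicodedata

# "o" followed by COMBINING MACRON (U+0304): two code points.
decomposed = "o\u0304"

# NFC composes the pair into LATIN SMALL LETTER O WITH MACRON (U+014D).
composed = unicodedata.normalize("NFC", decomposed)

# NFD goes the other way and splits the precomposed character again.
round_trip = unicodedata.normalize("NFD", composed)
```

Here `composed` is the single character ō, and `round_trip` equals the original two-code-point sequence.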

Add a custom tool to toolchain to remove UTF-8 BOM before compile

My question is in the context of Code::Blocks and its tweaked version of MinGW, and Notepad++ . I want to be able to include Unicode literals in my source, and I can, so long as I use UTF-8 and not use a BOM. This works fine, up to a point, but it BOMs out (bad pun) whenever I reopen the file; it (not surprisingly) has this un-nerving ...

Ruby works well with Unicode characters in filenames on Mac OS X and on Linux, so why did it take at least 2 years to make it work on Windows?

Ruby works well with Unicode characters in file paths and filenames on Mac OS X and on Linux, so why did it take more than 2 years to make it work on Windows? I was just looking at Google Code Jam. People are solving non-trivial problems within a few hours. At work, I can imagine solving a filename or path issue having unicode characters ...