unicode

How do I assign a literal Chinese string to a wchar_t* in Visual Studio (C++)?

I am trying to compile the following code in my test application on Windows in Visual Studio for C++: const wchar_t* chinese = "好久不见"; But I get the following error: error C2440: 'initializing' : cannot convert from 'const char [5]' to 'const wchar_t *' I am compiling with Unicode, so I am confused about this. The error goes away...

Are digits (numbers) part of Unicode?

Hi, I know Unicode contains all characters from most of the world's alphabets, but what about digits? Are they part of Unicode or not? I was not able to find a straight answer. Thanks ...

Show hexadecimal dump of string

Is there any way to obtain a hexadecimal dump of a string in SQL Server? It'd be useful to troubleshoot character set and collation issues. In MySQL you'd do SELECT HEX('€uro') and in Oracle you'd do SELECT DUMP('€uro') FROM DUAL. ...

Checklist for going the Unicode way with Perl

I am helping a client convert their Perl flat-file bulletin board site from ISO-8859-1 to Unicode. Since this is my first time, I would like to know if the following "checklist" is complete. Everything works well in testing, but I may be missing something which would only occur on rare occasions. This is what I have done so far (forgiv...

Django UnicodeDecodeError when using pdb

I've noticed that every time I put import pdb; pdb.set_trace() in my Spanish Django project, if I have a specific Unicode character in a string like: Gracias por tu colaboración I get a UnicodeDecodeError with an 'ordinal not in range(128)' in a Django Debug window. The problem is that I cannot debug my application easily. On the ot...

Why do we need to encode Unicode characters via UTF-8, etc.? Why can't we simply store them as the binary of their code points?

Unicode simply assigns an integer to each character. UTF-8 and other encodings are used to encode these integers ("code points") as a sequence of bytes to be stored in memory. My question is: why can't we simply store each character as the binary representation of its Unicode value (the "code point")? Consequently, some languages have char...
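To make the distinction in the excerpt above concrete, here is a minimal Python 3 sketch that prints one character's code point next to its bytes in three encodings. The helper name `describe` and the sample character are illustrative choices, not part of the original question. UTF-32 is the closest thing to "storing the code point as binary": every character takes four bytes, whereas UTF-8 and UTF-16 use a variable number.

```python
def describe(ch: str) -> dict:
    """Return the code point of a single character and its encoded bytes."""
    return {
        "code_point": ord(ch),            # the integer Unicode assigns
        "utf8": ch.encode("utf-8"),       # 1 to 4 bytes per character
        "utf16": ch.encode("utf-16-be"),  # 2 or 4 bytes per character
        "utf32": ch.encode("utf-32-be"),  # always 4 bytes per character
    }


info = describe("€")
print(hex(info["code_point"]))
print(info["utf8"].hex(), info["utf16"].hex(), info["utf32"].hex())
```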

Python print works differently on different servers

When I try to print a Unicode string on my dev server it works correctly, but the production server raises an exception. File "/home/user/twistedapp/server.py", line 97, in stringReceived print "sent:" + json File "/usr/lib/python2.6/dist-packages/twisted/python/log.py", line 555, in write d = (self.buf + data).split('\n') exceptions.U...

Where can I get a list of Unicode chars by class?

I'm new to learning Unicode, and not sure how much I have to learn based on my ASCII background, but I'm reading the C# spec on rules for identifiers to determine what chars are permitted within Azure Table (which is directly based on the C# spec). Where can I find a list of Unicode characters that fall into these categories: letter-c...

Python / Mako: How to get Unicode strings/characters parsed correctly?

Hi. I'm trying to get Mako to render some strings with Unicode characters: tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace') ... print sys.stdout.encoding uname=cherrypy.session['userName'] print uname kwargs['_toshow']=uname ... return tempLook.get_template(page).re...

Python: Passing unicode string to C++ module

I'm working with an existing module at the moment that provides a C++ interface and does a few operations with strings. I needed to use Unicode strings and the module unfortunately didn't have any support for a Unicode interface, so I wrote an extra function to add to the interface: void SomeUnicodeFunction(const wchar_t* string) How...

JavaScript strings outside of the BMP

According to JavaScript: the Good Parts: JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide. This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF. Further investigation confirms this: > String.fromCharCod...
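The 16-bit limit mentioned in the excerpt above is the reason UTF-16 represents code points beyond U+FFFF as two 16-bit units (a surrogate pair). A minimal sketch of that arithmetic follows, written in Python rather than JavaScript; the function name `to_surrogate_pair` is hypothetical and the sample code point is an arbitrary choice.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1D306)
print(hex(high), hex(low))
```

The result can be cross-checked against Python's own encoder: `chr(0x1D306).encode("utf-16-be")` yields the same two 16-bit units.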

How can I convert CGI input to UTF-8 without Perl's Encode module?

Through this forum, I have learned that it is not a good idea to use the following for converting CGI input (from either an escape()d Ajax call or a normal HTML form post) to UTF-8: read (STDIN, $_, $ENV{CONTENT_LENGTH}); s{%([a-fA-F0-9]{2})}{ pack ('C', hex ($1)) }eg; utf8::decode $_; A safer way (which for example does not allow bog...

Unicode problems when using io.StringIO to mock a file

I am using an io.StringIO object to mock a file in a unit-test for a class. The problem is that this class seems to expect all strings to be unicode by default, but the builtin str does not return unicode strings: >>> buffer = io.StringIO() >>> buffer.write(str((1, 2))) TypeError: can't write str to text stream But >>> buffer.write(str(...

How would one store German text in an embedded system?

I've created a memory mapped 1 bit interface to an LCD in an embedded system, along with 4 or 5 bit mapped fonts for the 90+ printable ASCII characters. Writing to the screen is as simple as using an echo like statement (it's embedded Linux). Other than something strictly proprietary, what recommendations can people make for storing Ge...

grepping binary files and UTF16

Standard grep/pcregrep etc. can conveniently be used with binary files for ASCII or UTF8 data - is there a simple way to make them try UTF16 too (preferably simultaneously, but instead will do)? Data I'm trying to get is all ASCII anyway (references in libraries etc.), it just doesn't get found as sometimes there's 00 between any two ch...
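One way to approach the problem described above, sketched in Python rather than with grep itself, is to search the raw bytes for the same ASCII text under several encodings in one pass. The function name `find_ascii_or_utf16` and the sample data are hypothetical; this sketch assumes the needle is pure ASCII and the file fits in memory.

```python
def find_ascii_or_utf16(data: bytes, needle: str) -> list:
    """Return sorted (offset, encoding) pairs for every occurrence of needle."""
    hits = []
    for name in ("ascii", "utf-16-le", "utf-16-be"):
        pattern = needle.encode(name)  # e.g. b"l\x00i\x00b\x00" for UTF-16LE
        start = data.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = data.find(pattern, start + 1)
    return sorted(hits)


# Sample blob: the same text once as ASCII and once as UTF-16LE.
blob = b"\x00\x01libfoo\x00\xff" + "libfoo".encode("utf-16-le") + b"\x7f"
for offset, encoding in find_ascii_or_utf16(blob, "libfoo"):
    print(offset, encoding)
```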

Converting domain names to IDN in Python

I have a long list of domain names which I need to generate some reports on. The list contains some IDN domains, and although I know how to convert them in Python on the command line: >>> domain = u"pfarmerü.com" >>> domain u'pfarmer\xfc.com' >>> domain.encode("idna") 'xn--pfarmer-t2a.com' >>> I'm struggling to get it to work with a ...
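The conversion shown in the excerpt can be applied to a whole list with Python 3's built-in `idna` codec, which implements the older IDNA 2003 rules. This is a minimal sketch that assumes the names have already been decoded to text; the function name `to_idna` and the sample list are hypothetical.

```python
def to_idna(domains):
    """Encode each domain name to its ASCII (Punycode) form."""
    # encode("idna") returns bytes; decode to get a plain ASCII str back.
    return [d.strip().encode("idna").decode("ascii") for d in domains]


names = ["pfarmerü.com", "example.com"]
for name in to_idna(names):
    print(name)
```

If the names come from a file, they need to be read with the correct text encoding first, since the codec operates on decoded strings rather than raw bytes.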

Was just sent a JS virus. How do I safely display the output?

I just received a virus that looks something like this <script type='text/javascript'> <!-- var s="=nfub!iuuq.frvjw>#sfgsfti#!------REST OF PAYLOAD REMOVED-----?"; m=""; for (i=0; i<s.length; i++) { if(s.charCodeAt(i) == 28) { m+= '&'; } else if (s.charCodeAt(i) == 23) { m+= '!';} else { m+=String.fromCharCode(s...

What does sorting mean in double-byte languages?

I have some code that sorts table columns by object properties. It occurred to me that in Japanese or Chinese (non-alphabetical languages), the strings that are sent to the sort function would be compared the way an alphabetical language would. Take for example a list of Japanese surnames: 寿拘 松坂 松井 山田 藤本 In English, these would be S...

Android, MySQL, and rendering non-Latin characters as well as Latin?

Are these squares a representation of Chinese characters being turned into Unicode? EDIT:[Here I entered the squares with numbers inside them into the post but they didn't render] I'd like to either turn this back into the original characters when displayed in Android (or to enable MySQL to just store them as Chinese characters not in...

What is the easiest way to parse a string and replace 1/2 with ½ (and similar) in PHP?

I have a string with many fractions like 1/2, 1/4 etc. I want to replace them with their Unicode equivalents. I realise I could pick them up with /\s(\d+)\/(\d+)\s/ How would I replace them with their Unicode equivalents? I could probably wrap the numbers in span and do something similar with CSS, but I was wondering if there was an ...
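The question above asks about PHP; the following sketch shows the same idea in Python, using a lookup table and a regular-expression callback. The table `VULGAR_FRACTIONS` and the function name `replace_fractions` are hypothetical, and the table covers only three fractions for brevity. Fractions with no entry in the table are left unchanged.

```python
import re

# Illustrative subset; Unicode defines more single-character fractions.
VULGAR_FRACTIONS = {"1/2": "½", "1/4": "¼", "3/4": "¾"}

# Match digits/digits that are not part of a longer number or a date-like run.
FRACTION_RE = re.compile(r"(?<![\d/])(\d+)/(\d+)(?![\d/])")


def replace_fractions(text: str) -> str:
    """Replace known ASCII fractions with their single-character equivalents."""
    def substitute(match):
        # Fall back to the original text when the fraction is not in the table.
        return VULGAR_FRACTIONS.get(match.group(0), match.group(0))
    return FRACTION_RE.sub(substitute, text)


print(replace_fractions("add 1/2 cup and 1/4 tsp, not 5/7"))
```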