unicode

Should I strip the XML declaration from suds output before parsing with lxml?

I’m trying to implement a SOAP webservice in Python 2.6 using the suds library. That is working well, but I’ve run into a problem when trying to parse the output with lxml. Suds returns a suds.sax.text.Text object with the reply from the SOAP service. The suds.sax.text.Text class is a subclass of the Python built-in Unicode class. In es...

Chinese/Japanese characters in a search box and form

Why is it that when I use Firefox to enter 漢, the GET request becomes q=%E6%BC%A2&start=0, but when I use IE8 and type the same Chinese character, the GET request is q=?&start=0? IE8 turns the character into a question mark. ...
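
For reference, the Firefox query string matches the percent-encoded UTF-8 bytes of 漢. The sketch below is a minimal Python 3 illustration, not part of the original question; the variable names are hypothetical, and the last step only shows one plausible way a "?" can arise (encoding into a charset that lacks the character), not a confirmed explanation of IE8's behavior.

```python
from urllib.parse import quote, unquote

# U+6F22 (漢) is three bytes in UTF-8: E6 BC A2.
encoded = quote("漢", safe="")
print(encoded)           # %E6%BC%A2
print(unquote(encoded))  # 漢

# Assumption for illustration: a charset without the character,
# encoded with replacement, produces a literal question mark.
replaced = "漢".encode("latin-1", errors="replace")
print(replaced)          # b'?'
```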

Are L'A' and 'A' exactly the same?

When we write a program that supports both Unicode and multibyte character sets, we often use the _T("some string") macro for strings. But does a character literal also need to be wrapped in this macro? Are L'A' and 'A' exactly the same? Do we need to write _T('A') for a character? ...

Delphi 2010 variant to unicode problem

I am working on a DLL in Delphi 2010. It exports a procedure that receives an array of variants. I want to be able to take one of these variants, and convert it into a string, but I keep getting ????? I cannot change the input variable - it HAS to be an array of variants. The host app that calls the DLL cannot be changed. It is written ...

Converting to and from Unicode in PHP

I'm using PHP 5 and need to communicate with another server that runs completely in Unicode. I need to convert every string to Unicode before sending it over. This seems like an easy task, but I haven't been able to find a way to do it yet. Is there a simple function that returns a Unicode string? i.e. convert_to_unicode("the string...

Python: UnicodeEncodeError when reading from stdin

When running a Python program that reads from stdin, I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 320: ordinal not in range(128). How can I fix it? Note: the error occurs inside antlr, and the offending line looks like this: self.strdata = unicode(data) Since I don't want to modi...

How do I eliminate TT's "Wide character in print" warning?

I have this warning every time I run my CGI-script (output is rendered by Template::Toolkit): Wide character in print at /usr/local/lib/perl5/site_perl/5.8.9/mach/Template.pm line 163. What's the right way to eliminate it? I create the tt object using this config: my %config = ( ENCODING => 'utf8', INCLUDE_PATH...

What is the difference between charsets and character encoding?

What is the difference between charsets and character encoding? When I say I am using UTF-8 encoding, what is my charset? Does it take Unicode as the charset by default? ...

doublechecking: no db-wide 'unicode switch' for sql server in the foreseeable future, i.e. like Oracle

Hi all, I believe I know the answer to this question, but wanted to confirm. Question: Does SQL Server offer (or will it in the foreseeable future) a database-wide "unicode switch" which says "store all characters in Unicode (UTF-16, UCS-2, etc.)", i.e. like Oracle? The Context: Our application has provided "CJK" (Chinese-Japanese-Kor...

Unicode sources in eMbedded Visual C++, unicode string literals

Hi all, I'm back-porting a Windows Mobile project from Visual Studio 2005 to eMbedded Visual C++, because VS2005 does not compile for the SH3 CPU family, only SH4. I ran into the fact that eVC 3 does not support Unicode sources, and I have quite a few of those, not easily convertible to a single-byte encoding. Question - between eVC 4, ...

WordPress is ignoring Unicode chars in URL

Hi, I am using WordPress with this type of permalink: /%year%/%monthnum%/%postname%/ If I use this type of URL: example.com/2010/03/तकनीक it treats the URL as example.com/2010/03/ (ignoring the Unicode characters) and displays the March 2010 archive list. If I use an English URL: example.com/2010/03/technology then it works perfectly. T...

Python utf-8, howto align printout

Hi, I have an array containing Japanese characters as well as "normal" ones. How do I align the printout of these? #!/usr/bin/python # coding=utf-8 a1=['する', 'します', 'trazan', 'した', 'しました'] a2=['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky'] for i,j in zip(a1,a2): print i.ljust(12),':',j print '-'*8 for i,j in zip(a1,a2): print...
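
One common approach to this kind of alignment, sketched here in Python 3 rather than the Python 2 of the question, is to pad by display width instead of character count. The helpers display_width and ljust_display are hypothetical names introduced for illustration; the sketch assumes a monospaced terminal that renders wide East Asian characters in two columns and does not handle combining marks.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

def ljust_display(text, width):
    """Left-justify text to a given display width by padding with spaces."""
    return text + " " * max(0, width - display_width(text))

a1 = ['する', 'します', 'trazan', 'した', 'しました']
a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for i, j in zip(a1, a2):
    print(ljust_display(i, 12), ':', j)
```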

Character encoding for US Census Cartographic Boundary Files

I'm trying to import the US Census cartographic boundary files (available here: http://www.census.gov/geo/www/cob/bdy_files.html ) into a GeoDjango application. However, Python is raising UnicodeDecodeErrors (for example, for the non-ASCII characters in Puerto Rico). The shapefile description file (*.dbf) doesn't specify what...

Django approximate matching of Unicode strings with ASCII equivalents

I have the following model and instance: class Bashable(models.Model): name = models.CharField(max_length=100) >>> foo = Bashable.objects.create(name=u"piñata") Now I want to be able to search for objects using ASCII characters rather than Unicode, something like this: >>> Bashable.objects.filter(name__lookslike="pinata") ...
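
The string-folding half of such a search can be sketched in plain Python 3 with the standard unicodedata module. This is only an illustration under assumptions: strip_accents is a hypothetical helper, the name__lookslike lookup in the question is not a real Django lookup, and the sketch does not touch the database. Characters with no decomposition (for example ß) pass through unchanged.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("piñata"))  # pinata
```

A folded copy of the name could then be compared against the folded search term.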

C++ read registry string value in char*

I'm reading a registry value like this: char mydata[2048]; DWORD dataLength = sizeof(mydata); DWORD dwType = REG_SZ; ..... open key, etc RegQueryValueEx(hKey, keyName, 0, &dwType, (BYTE*)mydata, &dataLength); My problem is that after this, the content of mydata looks like [63, 00, 3A, 00, 5C, 00...], i.e. it looks like Unicode. I...
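
The byte pattern quoted above, with a zero after every ASCII byte, is what UTF-16 little-endian text looks like. A minimal Python 3 check, using only the six bytes shown in the excerpt (the variable names are illustrative):

```python
# The six bytes quoted in the question, as hex values.
raw = bytes([0x63, 0x00, 0x3A, 0x00, 0x5C, 0x00])

# Interpreting them as UTF-16 little-endian yields the start of a path.
text = raw.decode("utf-16-le")
print(text)  # c:\
```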

Using C# to detect whether a filename character is considered international

I've written a small console application (source below) to locate and optionally rename files containing international characters, as they are a source of constant pain with most source control systems (some background on this below). The code I'm using has a simple dictionary with characters to look for and replace (and nukes every othe...

HTML tag attribute displayed in Unicode

I have the following code, from which you can see that I create both texts the same way, in UTF-8. The text between HTML tags is shown correctly, but the text used as an HTML tag attribute is shown in Unicode. I'm positive that on the server side (PHP), both texts are treated in the same way and are encoded in UTF-8. Why the t...

Display WCHAR Strings in Xcode Debugger

I'd like to preview WCHAR strings in the variable display of the Xcode 3.2 debugger. Basically, if I have WCHAR wtext[128]; wcscpy(wtext, L"Hello World"); I'd like to see "Hello World" for wtext when tracing into the function. ...

What exactly happens when Complex Script Support is enabled?

When we click the check box "Install files for complex script and right to left languages (including Thai)" in Regional and Language settings, what exactly happens? Changes to registry keys? I noticed that it installs some .fon files and keyboard DLLs. Is this really necessary if one just wishes to read complex scripts on Windows XP? M...

Convert ü to u

Hi all, I'm using a database that contains contacts (fields like name, address, ...). If a city in my database contains special characters (like ü) or HTML codes (like &uuml;), how can I convert them to u, so that when I search for a city containing such a special character it is shown in the results? The database is MyISAM...