character-encoding

UTF8 to/from wide char conversion in STL

Is it possible to convert a UTF-8 string held in a std::string to a std::wstring and vice versa in a platform-independent manner? In a Windows application I would use MultiByteToWideChar and WideCharToMultiByte. However, the code is compiled for multiple OSes and I'm limited to the standard C++ library. ...

Help localizing an application on Mac

Hi, I have an application which is supposed to work on both Windows and Mac and is localized in Portuguese, Spanish and German. The localized strings are read from an ini file, but the same file encoding doesn't work on both platforms. For Windows I have to have the file in ANSI format or else t...

UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

I've managed to mostly ignore all this multi-byte character stuff, but now I need to do some UI work and I know my ignorance in this area is going to catch up with me! Can anyone explain in a few paragraphs or less just what I need to know so that I can localize my applications? What types should I be using (I use both .Net and C/C++, an...

Why does the string "¿" get translated to "Â¿" when calling .getBytes()

When writing the string "¿" out using System.out.println(new String("¿".getBytes("UTF-8"))); Â¿ is written instead of just ¿. WHY? And how do we fix it? ...

Why is ¿ displayed differently in Windows vs Linux even when using UTF-8?

Why is the following displayed differently in Linux vs Windows? System.out.println(new String("¿".getBytes("UTF-8"), "UTF-8")); in Windows: ¿ in Linux: Â¿ ...
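The symptom in the two questions above is consistent with UTF-8 bytes being decoded (or rendered) with a single-byte charset somewhere along the way. A minimal sketch of that mechanism, in Python rather than the askers' Java, with Latin-1 standing in as an assumed "wrong" charset:

```python
text = "\u00bf"  # the character ¿

# Encoding to UTF-8 yields two bytes: 0xC2 0xBF.
utf8_bytes = text.encode("utf-8")

# Decoding with the same charset round-trips cleanly.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a single-byte charset maps each byte to its own character.
# Latin-1 here is an assumption, standing in for a platform default charset.
mojibake = utf8_bytes.decode("latin-1")

print(ascii(round_trip))
print(ascii(mojibake))
```

Naming the charset explicitly on both the encode and the decode side, and making sure the console or viewer uses the same one, is the usual way to rule this out.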

Malformed UTF characters

I want to detect malformed UTF-8 characters and replace them with a blank space using a Perl script while loading data with SQL*Loader. How can I do this? ...

What does 'lew' stand for in 'lew2' or 'lew4'?

I'm seeing the terms 'lew2' and 'lew4' being used in reference to character size in certain files. I know that the number represents how many bytes are used to store certain types of characters (maybe wide chars?), but I'm not sure what the 'lew' part stands for. My best guess is 'length of wide'. Can anyone enlighten me? ...

How to convert a C string (char array) into a Python string?

I have embedded a Python interpreter in a C program. Suppose the C program reads some bytes from a file into a char array and learns (somehow) that the bytes represent text with a certain encoding (e.g., ISO 8859-1, Windows-1252, or UTF-8). How do I decode the contents of this char array into a Python string? The Python string should ...

Looking for a UTF-8 text editor

I am looking for a (simple) text editor that can handle text in different encodings in the same document. I need to develop some sites with mixed Japanese and English text and the editors I have now (on an English Windows system) are unable to display the Japanese text. jEdit files don't display the Japanese text I have entered but whe...

How the heck can you edit valid XML in a webpage?

I've got to get a quick and dirty configuration editor up and running. The flow goes something like this: configuration (POCOs on server) are serialized to XML. The XML is well formed at this point. The configuration is sent to the web server in XElements. On the web server, the XML (Yes, ALL OF IT) is dumped into a textarea for editin...

How to find which character set is used by the database

Hi, I can access the database either from a .NET program (using ODBC) or through a database management tool (written in Java). If I write a 'é' character to the database from the .NET program, it appears as 'Õ' (capital O with tilde) in the DB management tool. If I write a 'é' character to the database from the DB management tool, it ...

HtmlEncode UTF-8

I'm using Server.HtmlEncode on a UTF-8 string in classic ASP, which works fine until there are some accents in the string, e.g. Rüstü Recber, which appears as RÃ¼stÃ¼ Recber (R&#195;&#188;st&#195;&#188; Recber in the source). I've tried setting the Response.Charset property to utf-8 but this doesn't make any difference. ...

Java application failing on special characters.

An application I am working on reads information from files to populate a database. Some of the characters in the files are non-English, for example accented French characters. The application is working fine in Windows but on our Solaris machine it is failing to recognise the special characters and is throwing an exception. For example...

when copy/paste 'hello' from Word into textarea it becomes 018hello 019 after saving

Hi, I have in Word ‘hello’ and when I paste it I get 018hello 019 so the apostrophes turn into these strange characters. The type of web application should not matter as the behaviour is different depending on the workstation I use. I checked with Notepad, Excel and Wordpad and this issue does not occur, only for Word. It should be a...

Is there a Python library function which attempts to guess the character-encoding of some bytes?

I'm writing some mail-processing software in Python that is encountering strange bytes in header fields. I suspect this is just malformed mail; the message itself claims to be us-ascii, so I don't think there is a true encoding, but I'd like to get out a unicode string approximating the original one without throwing a UnicodeDecodeError...

UTF-8 latin-1 conversion issues, python django

OK, so my issue is that I have the string '\222\222\223\225', which is stored as latin-1 in the DB. What I get from Django (by printing it) is the string 'ââââ¢', which I assume is the UTF conversion of it. Now I need to pass the string into a function that does this operation: strdecryptedPassword + chr(ord(c) - 3 - intCounter -...

information seemingly coming out of mysqldb incorrectly, python django

In a latin-1 database I have '\222\222\223\225'; when I try to pull this field from the Django models I get back u'\u2019\u2019\u201c\u2022'. from django.db import connection (Pdb) cursor = connection.cursor() (Pdb) cursor.execute("SELECT Password from campaignusers WHERE UserID=26") (Pdb) row = cursor.fetchone() So I step into that an...
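The values quoted above match what happens when those four bytes are decoded as Windows-1252 rather than ISO 8859-1. The sketch below only illustrates that mapping. That the database driver applies cp1252 to a "latin-1" column is an assumption about the setup, not something the excerpt confirms:

```python
# The four bytes from the question, written with the same octal escapes.
raw = b"\222\222\223\225"

# Windows-1252 maps 0x92, 0x93, 0x95 to curly quotes and a bullet.
as_cp1252 = raw.decode("cp1252")

# ISO 8859-1 maps the same bytes to C1 control characters instead.
as_latin1 = raw.decode("latin-1")

print(ascii(as_cp1252))
print(ascii(as_latin1))

# Re-encoding with the charset used for decoding recovers the original bytes.
restored = as_cp1252.encode("cp1252")
print(restored == raw)
```

If the original byte values are what the downstream function needs, re-encoding the Unicode string with the charset that was used to decode it is one way to get them back.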

Save all files in Visual Studio project as UTF-8

I wonder if it's possible to save all files in a Visual Studio 2008 project into a specific character encoding. I got a solution with mixed encodings and I want to make them all the same (UTF-8 with signature). I know how to save single files, but how about all files in a project? ...

How to convert Unicode string into a utf-8 or utf-16 string?

How do I convert a Unicode string into a UTF-8 or UTF-16 string? My VS2005 project uses the Unicode character set, while SQLite in C++ provides int sqlite3_open( const char *filename, /* Database filename (UTF-8) */ sqlite3 **ppDb /* OUT: SQLite db handle */ ); int sqlite3_open16( const void *filename, /* Database filename (UT...

MySQL collations not working as advertised in documentation

I'm trying to get my MySQL table to behave as the utf8 table in Example 2 from this MySQL Reference page: CREATE TABLE germanutf8 (c CHAR(10)) CHARACTER SET utf8 COLLATE utf8_unicode_ci; INSERT INTO germanutf8 VALUES ('Bar'), ('Bär'); SELECT * FROM germanutf8 WHERE c = 'Bär'; According to the example, this should yield: +------+ | c ...