questions about character-encoding | ansaurus

character-encoding

UTF8 to/from wide char conversion in STL

Is it possible to convert a UTF-8 string in a std::string to a std::wstring and vice versa in a platform-independent manner? In a Windows application I would use MultiByteToWideChar and WideCharToMultiByte. However, the code is compiled for multiple OSes and I'm limited to the standard C++ library. ...

character-encoding

Help localizing application in Mac

Hi, I have an application which is supposed to work on both Windows and Mac and is localized in Portuguese, Spanish and German. I have an ini file from which the localized strings are read. But the ini file doesn't work with the same encoding on both platforms. For Windows I have to have the file in ANSI format or else t...

character-encoding

UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

I've managed to mostly ignore all this multi-byte character stuff, but now I need to do some UI work and I know my ignorance in this area is going to catch up with me! Can anyone explain in a few paragraphs or less just what I need to know so that I can localize my applications? What types should I be using (I use both .Net and C/C++, an...

character-encoding

Why does the string "¿" get translated to "Â¿" when calling .getBytes()

When writing the string "¿" out using System.out.println(new String("¿".getBytes("UTF-8"))); Â¿ is written instead of just ¿. WHY? And how do we fix it? ...
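The output in this excerpt is what you would see if the UTF-8 bytes were decoded with a single-byte default charset, since new String(bytes) without a charset argument decodes with the default charset. A minimal Python sketch of the byte-level effect, assuming that default behaves like Latin-1 (an assumption; the excerpt does not say which charset was in use):

```python
# "¿" is U+00BF; UTF-8 encodes it as the two bytes 0xC2 0xBF.
utf8_bytes = "¿".encode("utf-8")

# Decoding with the wrong single-byte charset (Latin-1 assumed here)
# turns each of the two bytes into its own character.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the same charset that was used to encode round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes, garbled, restored)
```

Under that assumption, the fix is to name the charset explicitly on both the encoding and the decoding side.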

character-encoding

Why is ¿ displayed differently in Windows vs Linux even when using UTF-8?

Why is the following displayed differently in Linux vs Windows? System.out.println(new String("¿".getBytes("UTF-8"), "UTF-8")); in Windows: ¿ in Linux: Â¿ ...

character-encoding

Malformed UTF characters

I want to detect malformed UTF-8 characters and replace them with a blank space using a Perl script while loading data with SQL*Loader. How can I do this? ...
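The question asks for Perl; purely as an illustration of the idea, here is a hedged sketch in Python instead. The helper name `blank_malformed_utf8` is hypothetical, and the sketch assumes the input is supposed to be UTF-8 and never legitimately contains U+FFFD (which would also be blanked).

```python
def blank_malformed_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, putting a space where the input is malformed."""
    # errors="replace" makes the decoder emit U+FFFD for malformed input
    # instead of raising UnicodeDecodeError.
    decoded = raw.decode("utf-8", errors="replace")
    # Swap each replacement character for a plain space.
    return decoded.replace("\ufffd", " ")


print(blank_malformed_utf8(b"ab\xffcd"))
```

Valid multi-byte sequences pass through unchanged; only bytes the decoder rejects become spaces.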

character-encoding

What does 'lew' stand for in 'lew2' or 'lew4'?

I'm seeing the terms 'lew2' and 'lew4' used in reference to character size in certain files. I know that the number represents how many bytes are used to store certain types of characters (maybe wide chars?), but I'm not sure what the 'lew' part stands for. My best guess is 'length of wide'. Can anyone enlighten me? ...

character-encoding

How to convert a C string (char array) into a Python string?

I have embedded a Python interpreter in a C program. Suppose the C program reads some bytes from a file into a char array and learns (somehow) that the bytes represent text with a certain encoding (e.g., ISO 8859-1, Windows-1252, or UTF-8). How do I decode the contents of this char array into a Python string? The Python string should ...

character-encoding

looking for a UTF-8 text editor

I am looking for a (simple) text editor that can handle text in different encodings in the same document. I need to develop some sites with mixed Japanese and English text and the editors I have now (on an English Windows system) are unable to display the Japanese text. Jedit files don't display the Japanese text I have inputted but whe...

character-encoding

How the heck can you edit valid XML in a webpage?

I've got to get a quick and dirty configuration editor up and running. The flow goes something like this: configuration (POCOs on server) are serialized to XML. The XML is well formed at this point. The configuration is sent to the web server in XElements. On the web server, the XML (Yes, ALL OF IT) is dumped into a textarea for editin...

character-encoding

How to find which character set is used by the database

Hi, I can access the database either from a .NET program (using ODBC) or through a database management tool (written in Java). If I write a 'é' character to the database from the .NET program, it appears as 'Õ' (capital O with tilde) in the DB management tool. If I write a 'é' character to the database from the DB management tool, it ...

character-encoding

HtmlEncode UTF-8

I'm using Server.HtmlEncode on a utf-8 string in asp-classic, which works fine until there are some accents in the string e.g. Rüstü Recber, which appears as RÃ¼stÃ¼ Recber (RÃ¼stÃ¼ Recber in the source). I've tried setting the Response.Charset property to utf-8 but this doesn't make any difference. ...

character-encoding

Java application failing on special characters.

An application I am working on reads information from files to populate a database. Some of the characters in the files are non-English, for example accented French characters. The application is working fine in Windows but on our Solaris machine it is failing to recognise the special characters and is throwing an exception. For example...

character-encoding

special-characters

when copy/paste 'hello' from Word into textarea it becomes 018hello 019 after saving

Hi, I have in Word ‘hello’ and when I paste it I get 018hello 019 so the apostrophes turn into these strange characters. The type of web application should not matter as the behaviour is different depending on the workstation I use. I checked with Notepad, Excel and Wordpad and this issue does not occur, only for Word. It should be a...

character-encoding

Is there a Python library function which attempts to guess the character-encoding of some bytes?

I'm writing some mail-processing software in Python that is encountering strange bytes in header fields. I suspect this is just malformed mail; the message itself claims to be us-ascii, so I don't think there is a true encoding, but I'd like to get out a unicode string approximating the original one without throwing a UnicodeDecodeError...
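One way to get an approximate string without raising is to try a short list of encodings in order and fall back to one that accepts every byte. The sketch below does that rather than statistically guessing; the function name `decode_best_effort` and the candidate list are assumptions for illustration, not something the question specifies.

```python
def decode_best_effort(raw: bytes, candidates=("ascii", "utf-8", "cp1252")) -> str:
    """Return raw decoded with the first candidate encoding that accepts it."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte value to a code point, so this cannot fail.
    return raw.decode("latin-1")


print(decode_best_effort(b"caf\xe9"))
```

The order matters: strict encodings go first so that a permissive one does not mask them, and the final Latin-1 step guarantees a result even for bytes no candidate accepts.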

character-encoding

invalid-characters

UTF-8 latin-1 conversion issues, python django

My issue is that I have the string '\222\222\223\225', which is stored as latin-1 in the DB. What I get from Django (by printing it) is the string 'ââââ¢', which I assume is the UTF conversion of it. Now I need to pass the string into a function that does this operation: strdecryptedPassword + chr(ord(c) - 3 - intCounter -...

character-encoding

information seemingly coming out of mysqldb incorrectly, python django

In a latin-1 database I have '\222\222\223\225'. When I try to pull this field from the Django models I get back u'\u2019\u2019\u201c\u2022'.

from django.db import connection
(Pdb) cursor = connection.cursor()
(Pdb) cursor.execute("SELECT Password from campaignusers WHERE UserID=26")
(Pdb) row = cursor.fetchone()

So I step into that an...
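The values quoted in this question and the previous one are consistent with the stored bytes being interpreted as Windows-1252 rather than ISO 8859-1. A small Python sketch of that reading, using only the byte values from the excerpts (the variable names are illustrative):

```python
# '\222\222\223\225' from the excerpt, written with hex escapes.
raw = b"\x92\x92\x93\x95"

# Windows-1252 maps these bytes to curly quotes and a bullet.
as_cp1252 = raw.decode("cp1252")

# ISO 8859-1 maps the same bytes to C1 control characters instead.
as_latin1 = raw.decode("latin-1")

# Printing the Unicode text as UTF-8 and viewing it as Latin-1 gives
# a garbled form: each character becomes "â" plus two more characters.
misread = as_cp1252.encode("utf-8").decode("latin-1")

# Encoding the Unicode text back as Windows-1252 recovers the original bytes.
round_trip = as_cp1252.encode("cp1252")

print(ascii(as_cp1252), ascii(as_latin1), round_trip == raw)
```

If that reading is right, encoding the retrieved text as cp1252 before the byte arithmetic would give back the stored values.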

character-encoding

Save all files in Visual Studio project as UTF-8

I wonder if it's possible to save all files in a Visual Studio 2008 project in a specific character encoding. I have a solution with mixed encodings and I want to make them all the same (UTF-8 with signature). I know how to save single files, but how about all files in a project? ...

character-encoding

How to convert Unicode string into a utf-8 or utf-16 string?

How do I convert a Unicode string into a UTF-8 or UTF-16 string? My VS2005 project uses the Unicode character set, while SQLite in C++ provides:

int sqlite3_open(
  const char *filename,   /* Database filename (UTF-8) */
  sqlite3 **ppDb          /* OUT: SQLite db handle */
);
int sqlite3_open16(
  const void *filename,   /* Database filename (UT...

character-encoding

MySQL collations not working as advertised in documentation

I'm trying to get my MySQL table to behave as the utf8 table in Example 2 from this MySQL Reference page:

CREATE TABLE germanutf8 (c CHAR(10)) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
INSERT INTO germanutf8 VALUES ('Bar'), ('Bär');
SELECT * FROM germanutf8 WHERE c = 'Bär';

According to the example, this should yield: +------+ | c ...

character-encoding
