unicode

Ruby character encoding problems in NetBeans and command window

I use NetBeans as my development IDE and run the application from cmd, but I have problems displaying ISO 8859-1 characters like åäö correctly, both in the cmd window and when I run the application from NetBeans. Question: what is the best practice for setting this up? Right now I do @output.puts indent + "V" + 132.chr + "lkommen till Ruby Camping!" to get...

Unicode replacement characters for text matching

I am having some fun with Unicode text sources (all correctly encoded) and I want to match names. The classic problem: one source comes through correctly, another has flattened names: "Elbląg" vs. "Elblag" (see the character a). How can I "flatten" ą, á, â or à to a for better matching? Are there Unicode-to-ASCII matching tables? ...
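One common way to do this kind of flattening is to decompose each character and drop the combining marks. The sketch below is in Python and uses only the standard `unicodedata` module; the function name `flatten` is made up for illustration. It only handles letters that Unicode decomposes into a base letter plus a mark, so letters such as ł or ø, which have no such decomposition, pass through unchanged.

```python
import unicodedata


def flatten(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    # NFD splits e.g. "ą" into "a" followed by U+0328 COMBINING OGONEK.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters, non-zero for marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(flatten("Elbląg"))
```

For matching, both sources would be passed through the same function before comparison, so that the accented and the flattened spelling end up identical.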

Converting latin1 MySQL data to utf8

I want to switch to UTF-8, but all my data is latin1. What is an efficient way to convert the data? I already know how to change the database's structure (charset) to utf8; what I want to do is change the charset of the existing data. Update: here are my old settings. HTML output: utf8. HTML input: utf8. PHP-MySQL connection: latin1. mysql ...

Reading Unicode files line by line C++

What is the correct way to read Unicode files line by line in C++? I am trying to read a file saved as Unicode (LE) by Windows Notepad. Suppose the file contains simply the characters A and B on separate lines. In reading the file byte by byte, I see the following byte sequence (hex) : FE FF 41 00 0D 00 0A 00 42 00 0D 00 0A 00 So ...

Dealing with wacky encodings in Python

I have a Python script that pulls in data from many sources (databases, files, etc.). Supposedly, all the strings are unicode, but what I end up getting is any variation on the following theme (as returned by repr()): u'D\\xc3\\xa9cor' u'D\xc3\xa9cor' 'D\\xc3\\xa9cor' 'D\xc3\xa9cor' Is there a reliable way to take any four of the abov...
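A rough sketch of how such strings might be repaired, under two assumptions that the excerpt does not confirm: that the intended text is always UTF-8 that was misread as Latin-1, and that escaped variants contain only ASCII. It is written for Python 3, whereas the reprs in the question come from Python 2, and the helper name `repair` is hypothetical.

```python
def repair(text: str) -> str:
    """Best-effort repair of UTF-8 text that was misread or escaped."""
    try:
        # Step 1: turn literal "\xc3"-style escapes into real characters.
        if "\\x" in text:
            text = text.encode("ascii").decode("unicode_escape")
        # Step 2: reinterpret Latin-1 characters as the UTF-8 bytes
        # they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not repairable under these assumptions: keep what we have.
        return text


if __name__ == "__main__":
    print(repair("D\\xc3\\xa9cor"))
    print(repair("D\xc3\xa9cor"))
```

This is a heuristic, not a reliable detector: a genuine Latin-1 string whose bytes happen to form valid UTF-8 would be changed as well.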

BindingSource.Filter Problem [C# 2.0 on vs2008]

The BindingSource.Filter property doesn't work when I filter with Unicode characters. Is there any solution for that? Or how can I implement a custom BindingSource that filters correctly? ...

Culture Sensitive GetHashCode

Hi, I'm writing a C# application that will process some text and provide basic query functions. In order to ensure the best possible support for other languages, I am allowing the users of the application to specify the System.Globalization.CultureInfo (via the "en-GB" style code) and also the full range of collation options using the S...

UNICODE Names From Character.

Hi, can anyone tell me how to get the Unicode name of a character in MFC? E.g. (character - name): Z - LATIN CAPITAL LETTER Z, [ - LEFT SQUARE BRACKET, etc. Thanks, Dev ...
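For comparison only, and not as an MFC answer: the lookup being asked for is a query against the Unicode Character Database. The Python sketch below shows the expected results using the standard `unicodedata` module; the wrapper name `char_name` is made up for illustration.

```python
import unicodedata


def char_name(ch: str) -> str:
    """Return the Unicode name of a single character, or a placeholder."""
    # The second argument is returned when the character has no name
    # (for example most control characters).
    return unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for ch in "Z[":
        print(ch, "-", char_name(ch))
```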

Python + PostgreSQL + strange ascii = UTF8 encoding error

I have ASCII strings which contain the character "\x80" to represent the euro symbol: >>> print "\x80" € When inserting string data containing this character into my database, I get: psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x80 HINT: This error can also happen if the byte sequence does not match the encoding ...
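Byte 0x80 is not ASCII and is not valid UTF-8 on its own; it is the euro sign in Windows-1252. Assuming that is the real source encoding (the excerpt does not say), a minimal Python 3 sketch of re-encoding before the insert could look like this. The helper name `to_utf8` is hypothetical and no database call is shown.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from an assumed legacy encoding to UTF-8."""
    # Decode with the encoding the data was really written in ...
    text = raw.decode(source_encoding)
    # ... then encode as UTF-8, which is what the database expects.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(to_utf8(b"\x80"))
```

With a driver that accepts text strings, passing the decoded string rather than the raw bytes should have the same effect.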

C++: Get LPCWSTR from wstringstream?

If I have a wstringstream, and I want to get its .str() data as an LPCWSTR, how can I do that? ...

Code to strip diacritical marks using ICU

Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.: UnicodeString strip_diacritics( Un...

How to use Unicode (UTF-8) in C++

Possible Duplicate: Unicode in C++ If I remember correctly, the default character and string encoding in C++ is ASCII. Is there a simple way to enable Unicode support? ...

tchar safe functions -- count parameter for UTF-8 constants

I'm porting a library from char to TCHAR. The count parameter of this fragment, according to MSDN, is the number of multibyte characters, not the number of bytes. So, did I get this right? My project properties in VC9 say 'use unicode character set' and I think that's correct, but I'm not sure how that impacts my count parameter. _tcsncmp(ac...

How does uʍop-ǝpısdn text work?

I found upside-down text on this website: http://www.cheesygames.com/upside-down-text How does it work? Does Unicode have upside-down characters, or what? How can I write my own text-flipping function? ...
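Unicode has no general "upside-down" property. Text like this is usually produced by replacing each letter with an unrelated character that happens to look like its rotated form (mostly IPA letters) and then reversing the string. A small Python sketch with a hand-picked, lowercase-only lookup table follows; the table and the name `flip` are illustrative, and the linked site may use different code points.

```python
# Look-alike table for lowercase ASCII letters (an illustrative choice).
FLIPPED = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ",
    "g": "ƃ", "h": "ɥ", "i": "ı", "j": "ɾ", "k": "ʞ", "l": "l",
    "m": "ɯ", "n": "u", "o": "o", "p": "d", "q": "b", "r": "ɹ",
    "s": "s", "t": "ʇ", "u": "n", "v": "ʌ", "w": "ʍ", "x": "x",
    "y": "ʎ", "z": "z",
}


def flip(text: str) -> str:
    """Reverse the text and swap each letter for its rotated look-alike."""
    # Characters without an entry (digits, punctuation) are kept as-is.
    return "".join(FLIPPED.get(ch, ch) for ch in reversed(text.lower()))


if __name__ == "__main__":
    print(flip("upside-down"))
```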

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay? ...

How to get number of bytes read from QTextStream

I am using the following code to find the number of bytes read from a QFile. With some files it gives the correct file size, but with others it gives me a value that is approximately fileCSV.size()/2. I am sending two files that have the same number of characters in them but different file sizes. Should I use some other obj...

How Do I grep For non-ASCII Characters in UNIX

I have several very large XML files and I'm trying to find the lines that contain non-ASCII characters. I've tried the following: grep -e "[\x{00FF}-\x{FFFF}]" file.xml But this returns every line in the file, regardless of whether the line contains a character in the range specified. Do I have the syntax wrong or am I doing somethin...
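Note that non-ASCII begins at U+0080, so a range starting at 00FF would miss characters such as é even if the pattern syntax were accepted. As an alternative to grep, here is a small Python sketch that reports lines containing anything outside the ASCII range. It assumes the file decodes as UTF-8, and the function name `non_ascii_lines` is made up for illustration.

```python
import re

# Anything outside U+0000..U+007F counts as non-ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_lines(lines):
    """Yield (line_number, line) for lines with a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        if NON_ASCII.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python script.py file.xml (the script name is arbitrary).
    # Iterating over the file object reads one line at a time, which
    # keeps memory use flat for very large files.
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, line in non_ascii_lines(handle):
            print(f"{number}: {line}")
```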

How do I obtain a code point integer from a 1 to 4 byte UTF-8 encoded sequence in Windows?

Hello, I am Patrick Niedzielski, a programmer for the Free Software 3D adventure game Humm and Strumm. I'm working on a minimal Unicode character class in C++. I currently have an array of four bytes representing a UTF-8 sequence. On GNU/Linux, I can just convert to UTF-32 with iconv(), but on Windows, I cannot do this. Is it possib...
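The conversion needs no platform API, because the bit layout of UTF-8 is fixed: the lead byte tells the sequence length and carries the high bits, and each continuation byte contributes six more. The Python sketch below shows only that arithmetic, which ports directly to C++. The function name is hypothetical, and for brevity it does not reject overlong encodings or surrogate code points, which a production decoder should.

```python
def utf8_to_code_point(seq: bytes) -> int:
    """Decode the first UTF-8 sequence in seq to a code point integer."""
    if not seq:
        raise ValueError("empty input")
    first = seq[0]
    # The lead byte determines the length and holds the top payload bits.
    if first < 0x80:
        length, value = 1, first
    elif first >> 5 == 0b110:
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) < length:
        raise ValueError("truncated sequence")
    # Each continuation byte has the form 10xxxxxx and adds six bits.
    for byte in seq[1:length]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


if __name__ == "__main__":
    print(hex(utf8_to_code_point("€".encode("utf-8"))))
```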

Intra-Unicode "lean" Encoding Converters

Windows provides encoding conversion functions ("MultiByteToWideChar" and "WideCharToMultiByte") which are capable of UTF-8 to/from UTF-16 conversions, among other things. But I've seen people offer home-grown 30 to 40 line functions that claim also to perform UTF-8 / UTF-16 encoding conversions. My question is, how reliable are such t...

VB.NET, MySQL and Unicode

How do I insert a textbox's Unicode string into a MySQL database? I changed the MySQL database's charset to utf8. I'm using VB.NET 2005 and a MySQL database for a Windows application. Please help me. ...