utf-8

Strange behaviour when converting string to UTF-8 character.

I have XML data stored as a BLOB in an Oracle DB. I retrieve this column, convert it to byte[] and then to a string. I do some string operations, convert the result to UTF-8, and insert it back into the DB. Some special characters are inserted as junk characters. I do not really know what I am doing wrong. Any idea/ help woul...

Managing Unicode Data in MySQL and VB.NET

I wish to develop a client-server application in VB.NET. I want to store some fields in Unicode. As per MySQL documentation I tried the fields with varchar and charset UTF-8 for storing Unicode data. I could insert data using the MySQL connector command object but when I try to display data in datagridview some junk is appearing. What ...

How to create a utf8 db with mysqladmin

I feel like this should be simple but I can't work out how to set the character set when making a db with "mysqladmin create". I thought this would work: mysqladmin -u root db_name --character-set=utf8 leveraging this bit of the mysqladmin --help text: -O, --set-variable=name Change the value of a variable. Ple...

How to convert string from cp1250 to utf-8 in Borland C++ Builder 6

Hi, I maintain an application written in Borland C++ 6. This app uses an SQLite database. I am now extending it so that it can be used by unprivileged users, and so I had to move the database file to the user's home directory. Unfortunately some of the users have Polish national characters in their names, such as ą,ć,ę and some more. The syst...

SET NAMES utf8 in MySQL?

I often see something similar to this below in PHP scripts using MySQL query("SET NAMES utf8"); I have never had to do this for any project yet so I have a couple basic questions about it. Is this something that is done with PDO only? If it is not a PDO specific thing, then what is the purpose of doing it? I realize it is sett...

Reading UTF-16 (or UTF-8) values from XML and displaying result with PHP

Hi, I'm having a lot of trouble with unicode (UTF-16) values and PHP/XML. I want to read a set of unicode values from XML and output the correct glyphs to the browser. I've tried with UTF-8 and I get the same problem. This is a simple working example I used for my first test: $text = "\x00\x41"; $text = mb_convert_encoding($text, "AS...

Android XmlPullParser UTF-8 problem

Hello all, I have an XML document built with org.xmlpull.v1.XmlSerializer This document contains following XML prolog <?xml version='1.0' encoding='utf-8' standalone='yes' ?> When I try to parse this document using import org.xmlpull.v1.XmlPullParser; with following configuration code XmlPullParser pullParser = Xml.newPullPar...

Generating xml utf-16 sample from xsd

We use Visual Studio 2008 to generate a sample XML from an XSD. The XML that is generated is UTF-8, but we need UTF-16. Is there any way to do this? ...

Handling unicode values in GET parameters with PHP

I have the following test script on my server: <?php echo "Test is: " . $_GET['test']; ?> If I call it with a url like example.com/script.php?test=ɿ (ɿ being a multibyte character), the resulting page looks like this: Test is: É¿ If I try to do anything with the value in $_GET['test'], such as save it to a MySQL database, I have th...
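
The garbling in the excerpt above looks like UTF-8 bytes being displayed in a single-byte encoding. The following is only a minimal Python sketch of that mechanism, not the asker's PHP setup; the choice of CP1252 as the misreading encoding is an assumption.

```python
# Sketch: reproduce the "É¿" output by misreading UTF-8 bytes as CP1252.
# Assumption: the page was rendered as CP1252 (Latin-1 gives the same result here).

original = "\u027f"                      # the character ɿ (U+027F)
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC9 0xBF
misread = utf8_bytes.decode("cp1252")    # each byte shown as its own character

print(utf8_bytes)   # the raw UTF-8 bytes
print(misread)      # the two-character mojibake

# Reversing the mistake recovers the original character.
recovered = misread.encode("cp1252").decode("utf-8")
```

If the bytes are intact and only the declared charset is wrong, declaring UTF-8 on the output side is usually enough; the round trip in the last line only works when no byte was lost along the way.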

How to force Visual Studio to honor the BOM at the start of a UTF-8 encoded CSS file?

Apparently, when Visual Studio 2008 (SP1) opens a CSS file, it doesn't recognize the UTF-8 BOM as a BOM, but instead interprets it as text (the first three characters show up as ï»¿, but shouldn't be visible). While VS normally doesn't save the CSS files with a BOM, I'd expect the IDE to recognize and respect the BOM when it's there. ...
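
For context on the three stray characters mentioned above: the UTF-8 BOM is the byte sequence EF BB BF, and a reader that treats the file as CP1252 shows those bytes as text. A small Python sketch, using a made-up CSS snippet as the file content (an assumption for illustration only):

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by a short CSS rule.
css_rule = "body { color: red; }"
data = codecs.BOM_UTF8 + css_rule.encode("utf-8")

# A reader that ignores the BOM and assumes CP1252 shows it as three characters.
as_cp1252 = data.decode("cp1252")

# The "utf-8-sig" codec recognises the BOM and strips it while decoding.
as_utf8_sig = data.decode("utf-8-sig")

print(repr(as_cp1252[:3]))  # the BOM bytes rendered as visible text
print(as_utf8_sig)          # the CSS rule without the BOM
```

This only demonstrates where the visible characters come from; it says nothing about how Visual Studio itself decides on a file's encoding.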

Should source code be saved in UTF-8 format

How important is it to save your source code in UTF-8 format? Eclipse on Windows uses CP1252 character encoding by default. The CP1252 format means non UTF-8 characters can be saved and I have seen this happen if you copy and paste from a Word document for a comment. The reason I ask is because out of habit I set up Maven encoding to b...

Problem with Zend Framework and UTF-8 characters (æøå)

I hope someone here knows more about Zend Framework than I do. I've been searching for the answer but I'm not able to find anything anywhere. Problem: When adding the content of a Zend_Form to the database with the use of Zend_Db the characters æ ø å are replaced by øæå System WampServer 2.0i Apache 2.2.11 MySQL 5.1....

utf-8 to/from utf-16 problem

I based these two conversion functions on an answer on StackOverflow, but converting back-and-forth doesn't work: std::wstring MultiByteToWideString(const char* szSrc) { unsigned int iSizeOfStr = MultiByteToWideChar(CP_ACP, 0, szSrc, -1, NULL, 0); wchar_t* wszTgt = new wchar_t[iSizeOfStr]; if(!wszTgt) assert(0); Mult...

Java File parsing toolkit design, quick file encoding sanity check

(Disclaimer: I looked at a number of posts on here before asking, I found this one particularly helpful, I was just looking for a bit of a sanity check from you folks if possible) Hi All, I have an internal Java product that I have built for processing data files for loading into a database (AKA an ETL tool). I have pre-rolled stages ...

mysql: possible loss of UTF data from char to text field type conversion?

Hi, I have a database that I use for a foreign language / vocabulary web application and was interfacing with it through phpMyAdmin (and of course php). I had about a thousand rows for Arabic text, which had been put into a char field. I wanted to expand the size of my entries, so I read that a text field could get me past the 255 cha...

utf8_general_ci conversion

My script uses utf8_general_ci and I'm trying to transfer its data to another script that also uses utf8_general_ci. The problem is that my script stores everything as is, like "áéíóú", while the new script stores it as "áéà óú", so I'm having character problems like "ru��es". How can I convert that? ...

How can I check if a binary string is UTF-8 in mysql?

I've found a Perl regexp that can check if a string is UTF-8 (the regexp is from w3c site). $field =~ m/\A( [\x09\x0A\x0D\x20-\x7E] # ASCII | [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte | \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byt...

Can I include characters such as "ã" and "ê" in UTF-8 encoded XML, or must it be UTF-16 encoded?

Can I include characters such as "ã" and "ê" in UTF-8 encoded XML, or must it be UTF-16 encoded? ...

Vietnamese character in .NET Console Application (UTF-8)

I'm trying to write a UTF-8 string (Vietnamese) to the C# console, but with no success. I'm running on Windows 7. I tried to use the Encoding class to convert the string to char[] to byte[] and then to String, but it did not help; the string is input directly from the database. Here is an example: Tôi tên là Đức, cuộc sống thật vui vẻ tuyệt ...

Is there any way to write Hebrew in the Windows Console?

Is there any way to write Hebrew in the Windows Console? I tried the following: Console.OutputEncoding = new UTF8Encoding(false); Console.WriteLine("\u05D0\u05D1"); Console.ReadLine(); but instead of "אב" it writes some other Unicode characters that are not in the Hebrew alphabet. Any ideas why? ...