utf-8

Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, looking at Java and C#, it appears that strings and chars there default to UTF-16. I was thinking that it might be for historical reasons, or perhaps for performance reasons, but I couldn't find any information. Does anyone know why these languages...

A PHP Library / Class to Count Words in Various Languages?

Some time in the near future I will need to implement a cross-language word count, or if that is not possible, a cross-language character count. By word count I mean an accurate count of the words contained within the given text, taking into account the language of the text. The language of the text is set by a user, and will be assumed to be correc...

Outputting a UTF-8 string in Mac OS's Terminal

I have a program in Haskell that outputs UTF-8 using the utf8-string package, and it uses only that package's output functions. I set the encoding of each file I write to this way: hSetEncoding myFile utf8 {- myFile may be stdout -} but when I try to output: alpha = [toEnum 0x03B1] {- α -} instead of the nice alpha letter I get ...

C# UTF8 output keep encoded characters intact

Hello, I have a very simple question I can't seem to get my head around. I have a properly encoded UTF-8 string that I parse into a JObject with Json.NET, fiddle around with some values, and write to the command line, keeping the encoded characters intact. Everything works great except for the keeping the encoded characters intact part. Co...

Display EURO Symbol in UTF-8 Format

I'm wondering how I can display a Euro symbol in UTF-8 format? Cheers ...
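
For reference, a minimal sketch in Python of what the Euro sign looks like once encoded. The character is U+20AC, and UTF-8 represents it as three bytes. The variable names here are illustrative only, and whether the symbol then displays correctly still depends on the page or terminal being told that the bytes are UTF-8.

```python
# U+20AC EURO SIGN, written as an escape so the source file's own
# encoding does not matter.
EURO = "\u20ac"

# Encoding to UTF-8 yields a three-byte sequence.
encoded = EURO.encode("utf-8")

# Show the bytes in hex, separated by spaces.
print(" ".join(f"{b:02x}" for b in encoded))  # prints: e2 82 ac

# Decoding the same bytes as UTF-8 gives the character back.
assert encoded.decode("utf-8") == EURO
```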

UTF-8 MySQL and Charset, pls help me understand this once and for all!

Can someone explain why, when I set everything to UTF-8, I keep getting those damn ���? MySQL Server version: 5.1.44 MySQL charset: UTF-8 Unicode (utf8) I create a new database name: utf8test collation: utf8_general_ci MySQL connection collation: utf8_general_ci My SQL looks like this: SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO"; CREATE TABL...

How do I read UTF-8 characters via a pointer?

Suppose I have UTF-8 content stored in memory, how do I read the characters using a pointer? I presume I need to watch for the 8th bit indicating a multi-byte character, but how exactly do I turn the sequence into a valid Unicode character? Also, is wchar_t the proper type to store a single Unicode character? This is what I have in ...
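
A rough sketch of the decoding logic the question asks about, written in Python with an integer index standing in for the pointer. The function names `decode_utf8_at` and `code_points` are made up for this illustration. The lead byte determines the sequence length, and each continuation byte (`10xxxxxx`) contributes six bits. The sketch also rejects overlong forms, surrogates, and values above U+10FFFF, which a strict decoder is expected to do.

```python
from typing import List, Tuple


def decode_utf8_at(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one code point starting at buf[pos].

    Returns (code_point, next_pos). Raises ValueError on malformed input.
    """
    lead = buf[pos]
    if lead < 0x80:                      # 0xxxxxxx: one byte (ASCII)
        return lead, pos + 1
    if 0xC2 <= lead <= 0xDF:             # 110xxxxx: two bytes
        length, cp, min_cp = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:           # 1110xxxx: three bytes
        length, cp, min_cp = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF4:           # 11110xxx: four bytes
        length, cp, min_cp = 4, lead & 0x07, 0x10000
    else:                                # stray continuation or invalid lead
        raise ValueError(f"invalid lead byte 0x{lead:02x} at {pos}")

    if pos + length > len(buf):
        raise ValueError(f"truncated sequence at {pos}")

    for b in buf[pos + 1:pos + length]:
        if b & 0xC0 != 0x80:             # continuation bytes are 10xxxxxx
            raise ValueError(f"invalid continuation byte 0x{b:02x}")
        cp = (cp << 6) | (b & 0x3F)      # append the low six bits

    # Reject overlong encodings, UTF-16 surrogates, and out-of-range values.
    if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid code point U+{cp:04X} at {pos}")
    return cp, pos + length


def code_points(buf: bytes) -> List[int]:
    """Walk the buffer and collect every decoded code point."""
    out = []
    pos = 0
    while pos < len(buf):
        cp, pos = decode_utf8_at(buf, pos)
        out.append(cp)
    return out
```

The same structure carries over to C by replacing the index with a `const unsigned char *` and returning the advanced pointer. Whatever type holds the result needs room for values up to U+10FFFF, which takes 21 bits.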

Foreign Characters not displayed correctly in form components

I retrieve data from a MySQL database which is often in foreign languages and encoded with UTF-8. This displays fine normally, but when set as the value for a textarea or text box it doesn't display correctly. In fact it looks like this: ыкаепнрошгув When normally ...

Working with Foreign languages

My DB needs to hold strings containing foreign language characters, such that: the user enters a string into a form, the form is submitted and the string is added to the DB, and the string is displayed on a page for viewing. I would like to use UTF-8 as this will be able to handle all of the required languages. Currently I believe my DB is set to 'latin1' but webpages ...

special characters strange behavior

Hi, I have this string in my UTF-8 MySQL DB: "Pruebá de eñes" When I print it as plain text, everything works OK, but if I load that same field inside an input, textarea, etc., it becomes: "Pruebá de eñes" How can I solve this problem? =( ...

strnicmp equivalent for UTF-8?

What do I use to perform a case-insensitive comparison on two UTF-8 encoded sub-strings? Essentially, I'm looking for a strnicmp function for UTF-8. ...

Qt and unicode escape string.

I'm getting data from a server using a signal and slot. Here is the slot part: QString text(this->reply->readAll()); The problem is that the text variable will contain Unicode escapes, for example: \u043d\u0435 \u043f\u0430\u0440\u044c\u0441\u044f ;-) Is there any way to convert this? ...

C# web request with POST encoding question

On the MSDN site there is an example of some C# code that shows how to make a web request with POST'ed data. Here is an excerpt of that code: WebRequest request = WebRequest.Create ("http://www.contoso.com/PostAccepter.aspx "); request.Method = "POST"; string postData = "This is a test that posts this string to a Web server."; byte[] by...

Is Django double encoding a Unicode (utf-8?) string?

I'm having trouble storing and outputting an ndash character as UTF-8 in Django. I'm getting data from an API. In raw form, as retrieved and viewed in a text editor, a given unit of data may be similar to: "I love this detergent \u2013 it is so inspiring." (\u2013 is &ndash; as an HTML entity). If I get this straight from an API and...

How to generate real UTF-8 XML with grails without the escape characters?

I have been wondering why, when I set the encoding to UTF-8 and render the XML, it replaces the extended characters with escape characters (or character references) like ’ instead of '? I'm using the Render method render(contentType:"text/xml", encoding:"UTF-8") {...} with a proper header render(contentType:"text/xml", encoding:...

Latin characters in phpMyAdmin with UTF-8 collation

My website uses: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> And this meta: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> I created my database and tables in phpMyAdmin making sur...

How to encode 'á' to '&#225' with C# ?? (UTF8)

Hi all, I'm trying to write an XML file with UTF-8 encoding, and the original string can have invalid characters like 'á', so I need to change these invalid characters to valid ones. I know that there is an encoding method that takes, for example, the character á and transforms it into the group of characters &#225;. I am trying to achieve this wi...

Converting latin mysql data to utf8

I want to use UTF-8 right now, but all my data is latin1. What is an efficient way to convert the data? I already know how to change the database's structure (charset) to utf8; what I want to do is change the charset of the existing data. Update: here are my old settings. HTML output: utf8 HTML input: utf8 PHP - MySQL connection: latin1 mysql ...

Changing character encoding in MySQL, PHP scripts, HTML

So, I have built on this system for quite some time, and it is currently outputting Latin1 (ISO-8859-1) to the web browser, and these are the components: MySQL - all data is stored with the Latin1 character set PHP - All PHP text files are stored on disk with Latin1 encoding HTML - The output has the http-equiv="content-type" content="...

How to ensure a Java program uses UTF-8 encoding

Hi, I recently discovered that relying on the default encoding of the JVM causes bugs. I should explicitly use a specific encoding, e.g. UTF-8, while working with String, InputStreams, etc. I have a huge codebase to scan to ensure this. Could somebody suggest a simpler way to check this than searching the whole codebase? Thanks Nayn ...