utf-8

What's the difference between unicode and utf8?

Is it true that Unicode = UTF-16? UPDATE: Many are saying Unicode is a standard, not an encoding, but most editors actually support saving as a "Unicode" encoding. ...
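
For illustration, a minimal Python sketch of the distinction the question asks about (Python is an assumption here; the question names no language). Unicode assigns each character a code point, while UTF-8 and UTF-16 are two different ways of serializing those code points to bytes. The helper name `describe` is hypothetical. The "Unicode" option in some Windows editors has, as far as I know, historically meant UTF-16 little-endian, which may explain the confusion.

```python
def describe(ch: str) -> dict:
    """Return the code point and two byte encodings of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",             # the Unicode code point
        "utf8": ch.encode("utf-8").hex(" "),          # 1 to 4 bytes per code point
        "utf16_le": ch.encode("utf-16-le").hex(" "),  # 2 or 4 bytes per code point
    }


if __name__ == "__main__":
    # "A", "e with acute accent", and the euro sign
    for ch in ("A", "\u00e9", "\u20ac"):
        print(ch, describe(ch))
```

The same code point yields different byte sequences under each encoding, so "Unicode" by itself does not say how text is stored.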

Storing binary data in UTF-8 string

I want to use a WebSocket to transfer binary data, but you can only use WebSockets to transfer UTF-8 strings. Encoding it using base64 is one option, but my understanding is that base64 is most desirable when your text might be converted from one format to another. In this case, I know the data will always be UTF-8, so is there a better...
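
A hedged sketch of the base64 option mentioned in the question, written in Python for illustration. Arbitrary bytes are generally not valid UTF-8, so they cannot simply be passed off as a UTF-8 string; base64 maps them onto ASCII characters at a cost of roughly one third more data. The function names `to_text` and `from_text` are hypothetical.

```python
import base64


def to_text(payload: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only (hence valid UTF-8) string."""
    return base64.b64encode(payload).decode("ascii")


def from_text(text: str) -> bytes:
    """Decode the string produced by to_text back into the original bytes."""
    return base64.b64decode(text, validate=True)


if __name__ == "__main__":
    payload = bytes(range(256))  # every possible byte value
    text = to_text(payload)
    print(len(payload), "bytes ->", len(text), "characters")
    print(from_text(text) == payload)
```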

Adding a UTF-8 BOM to an OutputStream causes the last character to not be printed in Java

I have this piece of code that adds a UTF-8 BOM to an output stream. I need to add the BOM because Excel does not detect the UTF-8 encoding implicitly, and hence French characters are shown as weird characters in Excel. try { response.setContentType(getContentType(request, response)); response.setContentLength(...
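
A small sketch, in Python rather than Java, of what a UTF-8 BOM adds to a response body. The BOM is three bytes (EF BB BF), so any declared content length has to count them. Whether that is the cause here cannot be confirmed from the truncated excerpt; it is only one possibility. The helper name `with_utf8_bom` and the sample text are hypothetical.

```python
import codecs


def with_utf8_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading byte order mark (EF BB BF)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


# Hypothetical sample row with two accented French characters.
body = with_utf8_bom("Pr\u00e9nom;\u00c2ge\r\n")

# The length must be measured in bytes, after encoding, BOM included.
content_length = len(body)

if __name__ == "__main__":
    print(body[:3].hex(" "), content_length)
```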

Character Encoding Issue

I'm using an API that processes my files and presents optimized output, but some special characters are not preserved, for example: Input: äöü Output: äöü How do I fix this? What encoding should I use? Many thanks for your help! ...
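
A common form of this symptom is UTF-8 bytes being decoded as Latin-1 (ISO-8859-1) somewhere along the way. Whether that applies to the API in question is an assumption. The Python sketch below reproduces that kind of damage and reverses it; the function names are hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the damage, assuming exactly one wrong Latin-1 decode happened."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "\u00e4\u00f6\u00fc"  # a-umlaut, o-umlaut, u-umlaut
    garbled = misread_as_latin1(original)
    print(garbled)            # six characters instead of three
    print(repair(garbled) == original)
```

The lasting fix is usually to make both sides agree on UTF-8 rather than to repair text after the fact.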

How to create a Postgres conversion from Big5 to UTF8

I use PostgreSQL. My server encoding is UTF8 and client_encoding is Big5. Inserting Chinese characters always fails. Any ideas? Thanks, guys ...

[C] check input for UTF-8, count characters, use regular expressions

Hi, I want to write a C program that reads some strings from input and saves them in a MySQL database. For security, I would like to check whether the input is a (possible) UTF-8 string, count the number of characters, and also use some regular expressions to validate the input. So my question is the following: Is there a library that ...
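
The three steps in the question (validate as UTF-8, count characters, apply a regular expression) can be sketched as follows. This is in Python rather than C, purely to show the logic; it does not answer the library question. The pattern `ALLOWED` and the function name `check_input` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical validation rule: 1 to 64 word characters, spaces, dots,
# apostrophes or hyphens. \w matches Unicode letters for str patterns.
ALLOWED = re.compile(r"[\w .'-]{1,64}")


def check_input(raw: bytes) -> Optional[Tuple[int, bool]]:
    """Return (character count, matches pattern), or None if not valid UTF-8."""
    try:
        text = raw.decode("utf-8")  # strict: rejects malformed and overlong forms
    except UnicodeDecodeError:
        return None
    # len() counts code points, not bytes and not grapheme clusters.
    return len(text), ALLOWED.fullmatch(text) is not None


if __name__ == "__main__":
    print(check_input(b"\xc3\xa4\xc3\xb6\xc3\xbc"))  # 6 bytes, 3 characters
    print(check_input(b"\xff\xfe"))                  # not valid UTF-8
```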

Flatten FDF / XFDF forms to PDF in PHP with utf-8 characters

My scenario: A PDF template with form fields: template.pdf An XFDF file that contains the data to be filled in: fieldData.xfdf Now I need to have these two files combined & flattened. pdftk does the job easily within PHP: exec("pdftk template.pdf fill_form fieldData.xfdf output flatFile.pdf flatten"); Unfortunately this does not wor...

How do I write a UTF-8 encoded string to a file on Windows, in C++

Hello all, I have a string that may or may not have Unicode characters in it, and I am trying to write it to a file on Windows. Below I have posted a sample bit of code. My problem is that when I fopen the file and read the values back out on Windows, they are all being interpreted as UTF-16 characters. char* x = "Fool"; FILE* outFile = fopen( "Se...

API-level Unicode GUI Native apps in C++ for Windows/Linux/Mac

API-level Unicode GUI Native apps in C++ for Windows / Linux / Mac OS X. I am looking to write a simple native Unicode GUI application in C++, compiled with GNU GCC (g++), that can run without needing any non-standard library. NOTE: I don't mean a single code base that runs anywhere, but three separate code bases (Win/Linux/Mac)! run-without-l...

Insert Text with tilde in Sqlite3 on Android.

Hello. I'm trying to insert texts like 'descripción' into Sqlite3, and on a shell connected to the emulator with adb I'm getting strange characters instead of 'ó'. I'm using the following data to insert: item.description = "Descripción del juego 1"; And I'm getting: Descripci|-n del juego 1 I've also tried this: item.description = new St...

ISO-8859-1 to XML-acceptable UTF-8

Hi guys, I am parsing text/html from web pages into an XML feed. The text/html is encoded as ISO-8859-1, while the XML feed must be UTF-8. I have used HTML entities, but am having to manually replace loads of characters. Here is what I have so far (still not parsing all text): $desc = str_replace(array("\n", "\r", "\r\n"),"",$desc); $de...
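
A hedged sketch of the general approach: transcode the bytes once, then escape only the characters XML treats as markup. It is written in Python rather than PHP and assumes the input really is ISO-8859-1. The function name `latin1_to_utf8_xml` is hypothetical. Every byte value decodes under ISO-8859-1, so per-character replacement tables are not needed for the transcoding step itself.

```python
from xml.sax.saxutils import escape


def latin1_to_utf8_xml(raw: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to UTF-8 and escape &, < and > for XML."""
    text = raw.decode("iso-8859-1")  # never fails: all 256 byte values map
    return escape(text).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical input: "cafe & the" with accented e's, as Latin-1 bytes.
    source = b"caf\xe9 & th\xe9"
    print(latin1_to_utf8_xml(source))
```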

Zend_Config_XML encoding issue

Hello, I am creating an XML navigation for my website. The line below is causing a SimpleXML issue: <label>Osnabr&Atilde;&frac14;ck</label> My PHP code, using htmlentities, has changed Osnabrück into Osnabrck. However, when trying to parse my XML with this line in it, I get this error: /application/configs/navigation.xml:318: parser e...

Javascript and HTML: Saving file as UTF-8 without BOM

I'm trying to write an MSIE only HTML page (which I'll call the "Title Page") that allows someone to save a generated HTML webpage (which I'll call "New Page") with a click of a button. What I found out is that the "Save As" dialog box that appears does not allow for the "New Page" to be saved as UTF-8 without BOM. It is instead, being...

PHP: Problems finding the most frequent character in a UTF-8 string (eg 唐犬土用家犬尨犬山桑)?

From a MySQL database I can extract the following UTF-8 characters: "唐犬土用家犬尨犬山桑山犬巴戦師子幻日幻月引綱忠犬愛犬戌年成犬教条教義" I am trying to find the most frequent character in this string. I tried putting each character as an element into an array $arr and calling array_count_values($arr); Unfortunately the array operations (or is print_r the culprit?) produce mis-encoded...
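
The counting step can be sketched in Python as below (an illustration, not the PHP fix). The key point is to split the string into characters rather than bytes before counting; in Python 3 a `str` is already a sequence of code points. The function name `most_frequent` is hypothetical.

```python
from collections import Counter
from typing import Tuple


def most_frequent(text: str) -> Tuple[str, int]:
    """Return the most common character in text and its count."""
    return Counter(text).most_common(1)[0]


if __name__ == "__main__":
    sample = "唐犬土用家犬尨犬山桑山犬巴戦師子幻日幻月引綱忠犬愛犬戌年成犬教条教義"
    print(most_frequent(sample))
```

Counting the UTF-8 bytes instead would split each of these characters into three separate units and give meaningless results.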

LaTeX Question - Accents on characters

I refuse to believe that no one on Stack Overflow can help me! Tone marks above Chinese characters in LaTeX / combining accents in Unicode. My aim is to put tone marks above Chinese characters in LaTeX, and Google does not seem to turn up the answer. Is it possible to use combining accents with Chinese characters or can they only b...

Octal Escape in Java result in wrong byte value, Encoding problem?

According to this documentation ( http://java.sun.com/docs/books/jls/third_edition/html/lexical.html , 3.10.6) an OctalEscape will be converted to a Unicode character. My problem is that the following code results in a 2-byte Unicode character with the wrong information. for (byte b : "\222".getBytes()) { System.out.for...
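
Python's octal escapes behave analogously, so the effect can be illustrated there (an analogy, not the Java code from the question). The escape `\222` names the code point U+0092, not the byte 0x92; how many bytes come out depends on the encoding used to serialize it. The helper name `encoded_bytes` is hypothetical.

```python
from typing import List


def encoded_bytes(text: str, encoding: str) -> List[str]:
    """Return the hex value of each byte produced by encoding text."""
    return [f"{b:02x}" for b in text.encode(encoding)]


if __name__ == "__main__":
    ch = "\222"  # octal 222 = decimal 146 = U+0092
    print(hex(ord(ch)))
    print(encoded_bytes(ch, "utf-8"))    # two bytes
    print(encoded_bytes(ch, "latin-1"))  # one byte
```

In Java, `getBytes()` without an argument uses the platform default charset, so passing an explicit charset makes the result predictable.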

How to get Microsoft's AntiXss library to URLEncode to the URI standard (RFC3986) instead of an IRI (RFC3987)?

I'm using the Microsoft AntiXss 3.1 library. We have a number of international sites which use non-Latin scripts. We're using SEO-friendly URLs, so we have non-ASCII characters that end up in the URL. AntiXss.UrlEncode (at least in 3.1) treats "international characters" as safe, so we end up with an IRI instead of a URI: http://somesi...
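
To show what the RFC 3986 form looks like, here is a Python sketch using the standard library (it says nothing about how to configure AntiXss). Non-ASCII characters are encoded as UTF-8 and each resulting byte is percent-encoded. The function name `iri_path_to_uri_path` and the sample path are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path: str) -> str:
    """Percent-encode a path so it contains only ASCII, keeping '/' as is."""
    return quote(path, safe="/")  # quote() encodes as UTF-8 by default


if __name__ == "__main__":
    iri_path = "/caf\u00e9/men\u00fc"
    uri_path = iri_path_to_uri_path(iri_path)
    print(uri_path)
    print(unquote(uri_path) == iri_path)
```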

UTF-8 encoding problem in unix machine

I'm exporting a set of data to Excel in Java. The data has certain non-ASCII characters. When exporting on a Windows machine, the data comes out correctly in UTF-8, but when I deploy my code on a Unix machine, the UTF-8 encoding does not work properly. I'm using a Tomcat 5.5 server. I have also included URIen...

Monodevelop on OS X and Displaying UTF-8

Two questions. Does using copy or paste cause MonoDevelop to crash, or is it just me? If you have MonoDevelop installed, could you please test this? I found that both the shortcuts and the menu items cause it to crash. I have been unable to find information about this on Google, though I would personally consider it quite a major bug. How can UTF-...

Handling French special character in PHP

I'm using utf8_encode || utf8_decode ..., but I'm struggling to handle some special characters (é, è, ê, ...). I'm using PHP. Could someone help? ...