utf-8

UTF8 or UTF-8?

Which of the two is correct terminology? ...

Is this the correct way to send email with PHP?

I'm a bit worried about whether this function sends emails that will be recognized correctly by the majority of email and webmail clients. Specifically, I'm most concerned about these doubts: Are the UTF-8 declarations and attachments well formed? Do I need to use quoted_printable_decode()? If yes, where? Content-Transfer-Encoding:...

What utf format should boost wdirectory_iterator return?

If a file name contains a £ (pound) sign, then directory_iterator correctly returns the UTF-8 character sequence \xC2\xA3. wdirectory_iterator uses wide chars, but still returns the UTF-8 sequence. Is this the correct behaviour for wdirectory_iterator, or am I using it incorrectly? AddFile(testpath, "pound£sign"); wdirectory_iterator iter(test...
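The byte sequence quoted in this excerpt can be checked with a short sketch. Python is used here purely for illustration (the question itself is about Boost's C++ iterators), and the variable names are hypothetical. The last line shows what the two UTF-8 bytes look like if each byte is widened to its own character, which is one plausible reading of "wide chars that still hold the UTF-8 sequence", not a statement about what Boost actually does.

```python
# '£' is U+00A3 (POUND SIGN).
pound = "\u00a3"

# UTF-8 encodes U+00A3 as the two bytes C2 A3.
utf8_bytes = pound.encode("utf-8")

# UTF-16 (little-endian, no BOM) encodes it as a single 16-bit unit: A3 00.
utf16_bytes = pound.encode("utf-16-le")

# If each UTF-8 byte is widened to one character (equivalent to decoding
# the bytes as Latin-1), the result is two characters instead of one.
widened = utf8_bytes.decode("latin-1")
```

If a wide-character API returns two characters for the pound sign rather than one, that usually suggests the bytes were widened without being decoded.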

Gvim in UTF8 with Russian keyboard layout

I am a big fan of vim and gvim. But whenever I write localization code in PHP and have to translate some strings (primarily in Russian), I have to open Notepad to translate all the entries. That kinda sucks, but so far I have not found out how to make gvim work in utf8 mode. Any ideas would be appreciated. ...

NHibernate won't decode UTF-8 from MySQL

Hey, I have populated a MySQL table with UTF-8 strings (using a Python script). You can assume that the string in the DB was correctly encoded (I've verified this by extracting the string from MySQL Query Browser and running a UTF-8 decode... got my original Unicode string). Now the problem begins when I try to load this string using N...

Firebird - UTF8 VarChar size

I am changing all varchar columns in our Firebird database to UTF8; however, I don't understand the difference in varchar size. For example, with the charset and collation set to nothing, we can set the varchar size to 255. If we set the charset and collation to UTF8 and then set the varchar to 255, it reads different values. What would ...

PHP character encoding problems

Hello guys. I need help with a character encoding problem that I want to sort out once and for all. Here is an example of some content which I pull from an XML feed, insert into my database and then pull out. http://pastebin.com/d78d24f33 As you can see, a lot of special HTML characters get corrupted/broken. How can I once and for all sto...

Natural sorting algorithm in PHP with support for Unicode?

Is it possible to sort an array with Unicode / UTF-8 characters in PHP using a natural order algorithm? For example (this array is correctly ordered): $array = array ( 0 => 'Agile', 1 => 'Ágile', 2 => 'Àgile', 3 => 'Âgile', 4 => 'Ägile', 5 => 'Ãgile', 6 => 'Test', ); If I try with asort($array)...

How to convert large UTF-8 strings into ASCII?

I need to convert large UTF-8 strings into ASCII. It should be reversible, and ideally a quick/lightweight algorithm. How can I do this? I need the source code (using loops) or the JavaScript code. (should not be dependent on any platform/framework/library) Edit: I understand that the ASCII representation will not look correct and wou...
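One reversible scheme for this can be sketched in Python using the built-in `unicode_escape` codec. This is only an illustration of the idea, not the loop-based or JavaScript code the question asks for, and the helper names `to_ascii` and `from_ascii` are hypothetical. Every non-ASCII character becomes a backslash escape, and existing backslashes are escaped too, so the mapping can be undone.

```python
def to_ascii(text: str) -> str:
    """Return a pure-ASCII representation of text using backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


def from_ascii(escaped: str) -> str:
    """Reverse to_ascii, restoring the original Unicode string."""
    return escaped.encode("ascii").decode("unicode_escape")


original = "caf\u00e9 \u00a3 5 \\ done"
ascii_form = to_ascii(original)
restored = from_ascii(ascii_form)
```

The ASCII form is longer than the input and is not human-friendly, which matches the trade-off the question already accepts.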

Can an empty Java string be created from a non-empty UTF-8 byte array?

I'm trying to debug something and I'm wondering if the following code could ever return true: public boolean impossible(byte[] myBytes) { if (myBytes.length == 0) return false; String string = new String(myBytes, "UTF-8"); return string.length() == 0; } Is there some value I can pass in that will return true? I've fiddled wit...

Reading UTF8 chars using innerHTML returns 0xfffd for all chars

I'm reading an HTML document that contains UTF-8 chars but when I access the innerHTML of the document, all the "bad" chars show up as 0xfffd. I've tried it in all the major browsers and it behaves the same way. When I alert() the innerHTML it shows those chars as a "diamond with a ? mark". Surprisingly the following works perfectly, co...

PHP MySQL Encoding Bug?

Here's my problem. I have a MySQL table called quotes. In one of the rows, a quote contains the following characters: ‘ and ’. Now the row collation is utf8_unicode_ci. When using MySQL Query Browser and phpMyAdmin to retrieve the rows, the quotes come out as intended. However, when I retrieve them from the database using PHP and display ...

How do I convert stored misencoded data?

My Perl app and MySQL database now handle incoming UTF-8 data properly, but I have to convert the pre-existing data. Some of the data appears to have been encoded as CP-1252 and not decoded as such before being encoded as UTF-8 and stored in MySQL. I've read the O'Reilly article Turning MySQL data in latin1 to utf8 utf-8, but although it...
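The kind of damage described here (UTF-8 bytes read as CP-1252, then encoded to UTF-8 a second time) can often be undone by reversing the two steps. The sketch below uses Python rather than Perl, purely for illustration. The function name `repair_double_encoded` is hypothetical, and it assumes each value was misencoded exactly once. Values that do not fit that pattern are returned unchanged.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as CP-1252' corruption.

    If the text cannot be mapped back to CP-1252 bytes, or those bytes
    are not valid UTF-8, the value was probably not misencoded this way,
    so it is returned as-is.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the corruption: 'é' -> UTF-8 bytes -> misread as CP-1252.
mangled = "\u00e9".encode("utf-8").decode("cp1252")
fixed = repair_double_encoded(mangled)
```

A few byte values are undefined in Python's CP-1252 codec, so some damaged strings fall through to the unchanged branch. It is worth running a conversion like this on a copy of the data first.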

Hsqldb table encoding

How do I set the character encoding for a specific table? E.g: CREATE TABLE COMMENTS ( ID INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 0, INCREMENT BY 1) NOT NULL, TXT LONGVARCHAR, PRIMARY KEY (ID) ) By default it's encoded as ASCII but I'd rather use UTF-8 for this one table. ...

Configuring Tomcat 5.5 to UTF-8 encode all sendRedirect() redirections?

A requirement of the product that we are building is that its URL endpoints are semantically meaningful to users in their native language. This means that we need UTF-8 encoded URLs to support every alphabet under the sun. We would also not like to have to provide installation configuration documentation for every application server and...

Using PHP's SoapClient to send UTF-16 Character to WCF Service

Hello all, My PHP application is taking user input and sending it to a WCF Web Service. Sometimes my users copy and paste from Word and get UTF-16 characters into their message, such as the "En Dash" \u2013. I get the following error when this occurs. PHP Fatal error: SOAP-ERROR: Encoding: string '\xe2...' is not a valid utf-8 st...
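For orientation, the en dash mentioned in this excerpt is the code point U+2013, and how it looks as bytes depends on the encoding. The Python sketch below (illustrative only, with hypothetical variable names) shows its UTF-8 and UTF-16 forms. The error message in the excerpt is truncated, so nothing here identifies which bytes the application actually sent.

```python
# U+2013 EN DASH, the character named in the excerpt.
en_dash = "\u2013"

# In UTF-8 it is three bytes, the first of which is 0xE2.
utf8_bytes = en_dash.encode("utf-8")

# In UTF-16 (big-endian, no BOM) it is one 16-bit unit: 20 13.
utf16_bytes = en_dash.encode("utf-16-be")

# A lone 0xE2 lead byte without its continuation bytes is not valid UTF-8.
try:
    b"\xe2".decode("utf-8")
    lone_lead_is_valid = True
except UnicodeDecodeError:
    lone_lead_is_valid = False
```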

Reading a plist utf-8 value as utf-16

I'm working on an iPhone app that needs to display superscripts and subscripts. I'm using a picker to read in data from a plist, but the Unicode values aren't being displayed correctly in the pickerview. Subscripts and superscripts are not being recognized. I'm assuming this is due to the encoding of the plist as UTF-8, so the question ...

XML UTF-8 encoding checking

Hello everyone, I have an XML structure like this, in which some Student items contain invalid UTF-8 byte sequences, which may cause XML parsing to fail for the whole XML document. What I want to do is filter out the Student items which contain invalid UTF-8 byte sequences, and keep the ones with valid byte sequences. Any advice or samples about how to do this in ....

Problem with Java properties utf8 encoding in Eclipse

I've recently had to switch the encoding of the webapp I'm working on from ISO-xx to UTF-8. Everything went smoothly, except for properties files. I added „-Dfile.encoding=UTF-8“ in eclipse.ini and normal files work fine. Properties, however, show some strange behaviour. If I copy UTF-8 encoded properties from Notepad++ and paste them in Eclipse, they sh...

What could go wrong in switching HTML encoding from UTF-8 to UTF-16?

What are the implications of a change from UTF-8 to UTF-16 for HTML encoding? I would like to know your thoughts on the issue. Are there things I need to think of before making such a change? Note: Interested due to the enormous amounts of Japanese and Chinese text I need to handle. ...