utf-8

Producing valid XML with Java and UTF-8 encoding

I am using JAXP to generate and parse an XML document from which some fields are loaded from a database. Code to serialize the XML: DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder(); Document doc = builder.newDocument(); Element root = doc.createElement("test"); root.setAttribute("version", text); doc....

Character set woes

I have a small ajax application built with php. Using phpMyAdmin I have set a mysql database to utf-8, and have imported a textfile containing utf-8 data into it. This worked fine on a windows machine with easyphp, after adding character-set-server=utf8 and default-character-set=utf8 to the my.cnf file. I have now tried to move this t...

What's the best way to export UTF8 data into Excel?

So we have this web app where we support UTF8 data. Hooray UTF8. And we can export the user-supplied data into CSV no problem - it's still in UTF8 at that point. The problem is when you open a typical UTF8 CSV up in Excel, it reads it as ANSI encoded text, and accordingly tries to read two-byte chars like ø and ü as two separate charact...
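A minimal Python 3 sketch of one commonly suggested workaround: prefix the CSV with a UTF-8 byte order mark, which many Excel versions use as a hint that the file is UTF-8. This is an assumption about Excel's behaviour rather than something the question confirms, and the helper name `to_excel_csv_bytes` and the sample rows are made up for illustration.

```python
import csv
import io


def to_excel_csv_bytes(rows):
    """Serialize rows as CSV bytes that start with a UTF-8 BOM (hypothetical helper)."""
    # newline="" stops StringIO from translating the "\r\n" the csv module writes.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec prepends the BOM bytes EF BB BF when encoding.
    return buffer.getvalue().encode("utf-8-sig")


# Sample data containing the two-byte characters mentioned in the question.
data = to_excel_csv_bytes([["name", "city"], ["Jørgen", "Zürich"]])
```

Writing `data` to a file in binary mode keeps the BOM as the first three bytes; whether a given Excel version honours it would still need to be checked by hand.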

System.Net.Mail and =?utf-8?B?XXXXX.... Headers

I'm trying to use the code below to send messages via System.Net.Mail and am sometimes getting subjects like '=?utf-8?B?W3AxM25dIEZpbGV...' (trimmed). This is the code that's called: MailMessage message = new MailMessage() { From = new MailAddress("[email protected]", "Service"), BodyEncoding = Encoding.UTF8, Body = body...
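For context, a string of the form `=?utf-8?B?...?=` is an RFC 2047 "encoded word": the charset, a `B` marking Base64, then the Base64 of the UTF-8 bytes. The Python 3 sketch below only illustrates that format with a made-up subject; it does not reproduce the truncated value from the question or explain why System.Net.Mail emits it. `encode_word` and `decode_subject` are hypothetical helper names.

```python
import base64
from email.header import decode_header


def encode_word(text):
    """Build a single Base64 encoded word for a UTF-8 string (hypothetical helper)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?utf-8?B?" + payload + "?="


def decode_subject(raw):
    """Decode a header value that may contain RFC 2047 encoded words."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes together with their charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Values without encoded words come back as plain str.
            parts.append(chunk)
    return "".join(parts)


wire_form = encode_word("Füße report")   # made-up subject
readable = decode_subject(wire_form)
```

A mail client that understands RFC 2047 shows the decoded text; seeing the raw form in a subject line suggests that something along the way did not decode it.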

How to deny foreign alphabets in utf-8 in PHP 5.x (symfony)?

I have to prevent users from entering Chinese, Japanese, Cyrillic (and so on) alphabets on my website, at validation time (server-side validation only). At the same time I want all Latin accented characters to be allowed. I use symfony 1.1 and PHP 5.2, using utf-8, of course. Any hint? ...

SQLServer 2005 and UTF8

I'm trying to set up a SQL Server 2005 instance that will be accessed using C++ and ODBC (the data read will be sent in XML files). So, I want to read data from the database (preferably utf-8), compose an XML file and send it. I have been browsing around and I haven't found a way to set up the database and the tables for using utf-8 (as in MySQL). I...

"unmappable character for encoding" warning in Java

I'm currently working on a Java project that is emitting the following warning when I compile: /src/com/myco/apps/AppDBCore.java:439: warning: unmappable character for encoding UTF8 [javac] String copyright = "� 2003-2008 My Company. All rights reserved."; I'm not sure how SO will render the character before the date, but it ...

Comparing strings in PHP the same way MySQL does

I'm storing a varchar in a utf8 MySQL table and using utf8_general_ci collation. I have a unique index on the varchar. I'd like to do a string comparison in PHP that is equivalent to what MySQL will do on the index. A specific example is that I'd like to be able to detect that 'a' is considered equivalent to 'À' in PHP before this ha...

How to test an application for correct encoding (e.g. UTF-8)

Encoding issues are among the topics that have bitten me most often during development. Every platform insists on its own encoding, and most likely some non-UTF-8 defaults are in play. (I'm usually working on Linux, defaulting to UTF-8; my colleagues mostly work on German Windows, defaulting to ISO-8859-1 or some similar windows codep...

UTF-8 only in Grails database tables

When using Grails 1.0.4 together with MySQL, the charsets of the auto-generated database tables seem to default to ISO-8859-1. I'd rather have everything stored as pure UTF-8. Is that possible? From the auto-generated database definitions: ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1; Note the "latin1" part. ...

XSLT encoding problem, questionmarks in result

I'm trying to run an XSLT transformation, but characters like ëöï are replaced by a literal '?' in the output (I checked with a hex editor). The source file has the correct characters, and the stylesheet has: <xsl:output encoding="UTF-8" indent="yes" method="xml"/> What else am I missing? I'm using saxon as the transformer, if that ...

How can I get interactive Python to avoid using readline while allowing utf-8 input?

I use a terminal (9term) that does command-line editing itself - programs that use readline just get in its way. It's fully utf-8 aware. How can I make an interactive python session disable readline while retaining utf-8 input and output? Currently I use: LANG=en_GB.UTF-8 export LANG cat | python -i however this causes sys.stdin.enco...

Unicode (utf8) reading and writing to files in python

I'm having some brain failure in understanding reading and writing text to a file (Python 2.4). # the string, which has an a-acute in it. ss = u'Capit\xe1n' ss8 = ss.encode('utf8') repr(ss), repr(ss8) ("u'Capit\xe1n'", "'Capit\xc3\xa1n'") print ss, ss8 print >> open('f1','w'), ss8 >>> file('f1').read() 'Capit\xc3\xa1n\n' ...
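The question uses Python 2.4; as a hedged point of comparison, here is the same round trip sketched in Python 3, where text (`str`) and encoded data (`bytes`) are separate types and the encoding is passed to `open` explicitly. The temporary directory is an assumption made to keep the sketch self-contained; the file name `f1` follows the question.

```python
import os
import tempfile

text = "Capit\xe1n"              # the string with an a-acute, as in the question
encoded = text.encode("utf-8")   # the a-acute becomes the two bytes C3 A1

# Assumption: write to a throwaway directory instead of the current one.
path = os.path.join(tempfile.mkdtemp(), "f1")

# Text mode with an explicit encoding; newline="\n" avoids platform translation.
with open(path, "w", encoding="utf-8", newline="\n") as handle:
    handle.write(text + "\n")

# Binary mode shows the bytes actually stored on disk.
with open(path, "rb") as handle:
    raw = handle.read()

# Text mode with the same encoding decodes them back to the original string.
with open(path, "r", encoding="utf-8") as handle:
    decoded = handle.read()
```

Reading in binary mode returns the UTF-8 bytes, while reading in text mode with `encoding="utf-8"` returns the decoded string, which is the distinction the Python 2 session above is circling around.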

Why do my Perl tests fail with `use encoding 'utf8'`?

Hi, I'm puzzled with this test script: #!perl use strict; use warnings; use encoding 'utf8'; use Test::More 'no_plan'; ok('áá' =~ m/á/, 'ok direct match'); my $re = qr{á}; ok('áá' =~ m/$re/, 'ok qr-based match'); like('áá', $re, 'like qr-based match'); The three tests fail, but I was expecting that the use encoding 'utf8' would u...

How to convert a string from utf8 to ASCII (single byte) in c#?

I have a string object "with multiple characters and even special characters". I am trying to use UTF8Encoding utf8 = new UTF8Encoding(); ASCIIEncoding ascii = new ASCIIEncoding(); objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which is haunting my afternoon? EDIT 1: What...

Delphi 2009 RawByteString vagaries

Suppose that for some perverse reason you want to display the raw byte contents of a UTF8String. var utf8Str : UTF8String; begin utf8Str := '€ąćęłńóśźż'; end; (1) This doesn't do, it displays the readable form: memo1.Lines.Add( RawByteString( utf8Str )); // output: '€ąćęłńóśźż' (2) This, however, does "work" - note the conc...

embedded script displaying gibberish depending on encoding type (utf-8)...

I have a widget that people can put in their site. The widget is generated via a PHP script that echoes the populated string using: document.write('$widget_output'). The hosting sites call the widget using a javascript tag: <script type="text/javascript" src="http://www.link.com/page.php?param=1"></script> The problem is t...

How to conduct an Accent Sensitive search in MySQL

I have a MySQL table with utf8_general_ci collation. In the table, I can see two entries: abad abád I am using a query that looks like this: SELECT * FROM `words` WHERE `word` = 'abád' The query result gives both words: abad abád Is there a way to indicate that I only want MySQL to find the accented word? I want the query to only...

WinAPI and UTF-8 support

Quick question regarding UTF-8 support and various Win32 APIs. In a typical C++ MFC project, is it possible for MessageBox() to display a UTF-8 encoded string? Thanks, Andrew ...

Detect file encoding in PHP

I have a script which combines a number of files into one, and it breaks when one of the files has UTF8 encoding. I figure that I should be using the utf8_decode() function when reading the files, but I don't know how to tell which need decoding. My code is basically: $output = ''; foreach ($files as $filename) { $output .= file_ge...
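The question is about PHP, but the underlying idea can be sketched in Python 3: there is no reliable flag that says "this file is UTF-8", so a common heuristic is to attempt a strict UTF-8 decode and fall back to another encoding if it fails. The function names and the ISO-8859-1 fallback are assumptions for illustration, not taken from the question.

```python
def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_text(data, fallback="iso-8859-1"):
    """Decode as UTF-8 when valid, otherwise with the assumed fallback encoding."""
    if looks_like_utf8(data):
        return data.decode("utf-8")
    return data.decode(fallback)


# The same character stored in two different encodings.
utf8_bytes = "é".encode("utf-8")          # two bytes: C3 A9
latin1_bytes = "é".encode("iso-8859-1")   # one byte: E9

combined = to_text(utf8_bytes) + to_text(latin1_bytes)
```

This is a heuristic: some non-UTF-8 byte sequences happen to be valid UTF-8, so a clean decode is evidence of UTF-8 rather than proof.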