utf-8

Problems reading/writing UTF-8 data in MySQL from Java using JDBC connector 5.1

Hello, I have a scenario with two MySQL databases (in UTF-8), Java code (a Timer Service) that synchronizes both databases (reading from the first and writing/updating the second), and a web application that lets users modify the data loaded into the second database. All database access is done through IBATIS (but I detect that I have the sam...

How to verify that the browser supports UTF-8 characters properly?

Hi, is there a way to identify from JavaScript whether the browser encoding is set to (or supports) "UTF-8"? I want to send "UTF-8" or "English" letters based on the browser setting transparently (i.e. without asking the user). Edit: Sorry, I was not very clear on the question. In a browser the encoding is normally specified as Auto-Detect (or) ...

How well is UTF-8 supported in email?

How well is UTF-8 supported in various email clients? I know it was somewhat of a problem five or so years ago, but is it still something we should worry about? I am wondering if I should re-encode strings to some other encoding before sending. For example, Russian text would be stored as UTF-8 but when sending email notifications, I ...
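
A common alternative to re-encoding into a legacy charset is to keep UTF-8 and declare it explicitly in the MIME headers so that clients do not have to guess. Below is a minimal Python sketch using only the standard email package. The addresses, subject and body are made up for illustration, and the sketch says nothing about how any particular client renders the result.

```python
from email.message import EmailMessage

def build_notification(subject: str, body: str) -> EmailMessage:
    """Build a plain-text message whose body is declared as UTF-8."""
    msg = EmailMessage()                  # uses email.policy.default
    msg["From"] = "noreply@example.com"   # hypothetical addresses
    msg["To"] = "user@example.com"
    msg["Subject"] = subject              # non-ASCII headers are RFC 2047-encoded on output
    # base64 keeps the body 7-bit clean for relays that do not accept 8-bit data.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg

msg = build_notification("Уведомление", "Привет, мир")
print(msg["Content-Type"])               # text/plain; charset="utf-8"
print(msg["Content-Transfer-Encoding"])  # base64
```

Choosing base64 over 8bit trades a larger message for safety with older mail relays. Quoted-printable is the usual middle ground for mostly-ASCII text.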

regex to match any UTF character excluding punctuation

Hello, I'm preparing a function in PHP to automatically convert a string for use as a filename in a URL (*.html). Although ASCII would be the safe choice, for SEO reasons I need to allow the filename to be in any language, but I don't want it to include punctuation other than a dash (-) and underscore (_); chars like *%$#@"'...
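
One way to express "any letter or digit in any script, plus - and _" is to test Unicode general categories instead of listing punctuation characters one by one. In PHP this is usually written with PCRE Unicode properties and the u modifier, e.g. /[^\p{L}\p{M}\p{N}_-]+/u. The Python sketch below applies the same rule with unicodedata. The function name slugify and the choice to turn every other character into a dash are assumptions for illustration.

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Keep letters, combining marks and digits from any script, plus '-' and '_'.

    Every other character (punctuation, symbols, whitespace) becomes '-'.
    """
    out = []
    for ch in unicodedata.normalize("NFC", text):
        # General category first letter: L = letter, M = mark, N = number.
        if unicodedata.category(ch)[0] in "LMN" or ch in "-_":
            out.append(ch)
        else:
            out.append("-")
    # Collapse runs of dashes and trim them from both ends.
    return re.sub(r"-{2,}", "-", "".join(out)).strip("-")

print(slugify("Привет, мир!"))          # Привет-мир
print(slugify('price *%$#@"\' list'))  # price-list
```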

How do I select a unique list of first characters [MySQL]

I have a column containing a list of names. I need to select only the unique first letters of the names. For non-UTF-8 characters the following query works well: SELECT DISTINCT LEFT(T1.Name, 1) AS firstLetter However, when a name starts with a UTF-8 encoded character, this returns the � sign. I suppose it's only the first cha...
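
The � suggests that only the first byte of a multi-byte UTF-8 sequence is being returned. That can happen when the column or connection is declared with a single-byte character set while the data is really UTF-8. This is an assumption, since the excerpt does not show the table definition. In MySQL the usual remedy is to make the column and connection character sets match the data. The Python sketch below only contrasts taking the first character with taking the first byte. The sample names are made up.

```python
def first_letters(names):
    """Distinct first characters, counted in code points."""
    return sorted({name[:1] for name in names if name})

def first_bytes(names):
    """Distinct first bytes of each name's UTF-8 encoding, decoded back to text.

    A lone lead byte such as 0xC3 is not valid UTF-8 on its own, so it
    decodes to U+FFFD, the replacement character shown as the � sign.
    """
    return sorted({name.encode("utf-8")[:1].decode("utf-8", errors="replace")
                   for name in names if name})

names = ["Ärger", "Anna", "Łukasz", "Ärzte"]
print(first_letters(names))  # ['A', 'Ä', 'Ł']
print(first_bytes(names))    # ['A', '�']
```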

What is a multibyte character set?

Does the term multibyte refer to a charset whose characters can (but don't have to) be wider than 1 byte (e.g. UTF-8), or does it refer to character sets whose characters are always wider than 1 byte (e.g. UTF-16)? In other words: what is meant when somebody talks about multibyte character sets? ...
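
Counting the encoded bytes per character makes the distinction concrete. UTF-8 uses one to four bytes per character, while UTF-16 uses two or four. Neither encoding is fixed-width once characters outside the Basic Multilingual Plane are involved. A small Python sketch follows (the sample characters are arbitrary):

```python
def encoded_sizes(text: str):
    """For each character, return (code point label, UTF-8 bytes, UTF-16 bytes)."""
    rows = []
    for ch in text:
        utf8 = len(ch.encode("utf-8"))
        utf16 = len(ch.encode("utf-16-le"))  # -le variant: no byte order mark
        rows.append((f"U+{ord(ch):04X}", utf8, utf16))
    return rows

for row in encoded_sizes("Aé€😀"):
    print(row)
# ('U+0041', 1, 2)
# ('U+00E9', 2, 2)
# ('U+20AC', 3, 2)
# ('U+1F600', 4, 4)
```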

Converting XML from UTF-16 to UTF-8 using PowerShell

What's the easiest way to convert XML from UTF-16 to a UTF-8 encoded file? ...
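
The excerpt asks about PowerShell. The Python sketch below illustrates the steps in a language-neutral way and is not a PowerShell answer. It decodes the UTF-16 bytes, rewrites the encoding attribute of the XML declaration, and re-encodes the result as UTF-8. It assumes the input starts with a byte order mark and that the declaration, if present, is the first thing in the document. The helper name utf16_xml_to_utf8 is hypothetical.

```python
import re

# Matches the encoding="..." attribute inside a leading <?xml ...?> declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?\bencoding\s*=\s*["\'])[^"\']*(["\'])')

def utf16_xml_to_utf8(data: bytes) -> bytes:
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = data.decode("utf-16")
    # Keep the declaration truthful: it must name the new encoding.
    text = _DECL_ENCODING.sub(r"\1UTF-8\2", text, count=1)
    return text.encode("utf-8")  # no BOM is written

source = '<?xml version="1.0" encoding="UTF-16"?>\n<r>ñ</r>'.encode("utf-16")
print(utf16_xml_to_utf8(source).decode("utf-8"))
```

Rewriting the declaration matters. A UTF-8 file that still claims encoding="UTF-16" will be misread by conforming parsers.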

Quotation marks turn to question marks ...

So I have a Ruby script that parses HTML pages and saves the extracted strings into a DB, but I'm getting weird characters (usually question marks) instead of plain text, e.g. ‘SOME TEXT’ instead of 'Some Text'. I've tried HTML entities and CGI::unescape, but to no avail. I did some googling and set $KCODE = 'u' & require 'jcode...
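
The symptoms described are typical of a charset mismatch somewhere between the fetched page and the database. The short sketch below reproduces both symptoms with typographic quotes. It is written in Python rather than the question's Ruby, purely for illustration. Decoding UTF-8 bytes with the wrong codec produces sequences like â€˜, and forcing the text into a charset that lacks those characters produces question marks.

```python
text = "\u2018Some Text\u2019"  # ‘Some Text’ with typographic quotes
raw = text.encode("utf-8")      # the bytes a UTF-8 page actually contains

wrong = raw.decode("cp1252")    # wrong codec: mojibake
lossy = text.encode("latin-1", errors="replace").decode("latin-1")  # charset lacks the quotes
right = raw.decode("utf-8")     # matching codec: text survives

print(wrong)  # â€˜Some Textâ€™
print(lossy)  # ?Some Text?
print(right)  # ‘Some Text’
```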

XML encoding issue

Hello everyone, I want to know whether there is a quick way to check that an XML document is correctly encoded in UTF-8 and does not contain any characters that are not allowed in a UTF-8 encoded XML document. <?xml version="1.0" encoding="utf-8"?> Thanks in advance, George EDIT1: here is the content of my XML file, in both text form and in ...
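
A quick check can be split into two steps. First, decode the bytes as strict UTF-8. Then scan for code points outside the Char production of XML 1.0, which allows tab, newline, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The Python sketch below does only that and does not check well-formedness; running the bytes through a real XML parser (for example xml.etree.ElementTree.fromstring) is the more thorough test. The function name is hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" production.
_NOT_XML_CHAR = re.compile(r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")

def find_encoding_problem(data: bytes):
    """Return a description of the first problem found, or None."""
    try:
        text = data.decode("utf-8")  # strict: raises on malformed sequences
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte offset {err.start}"
    match = _NOT_XML_CHAR.search(text)
    if match:
        return f"U+{ord(match.group()):04X} is not allowed in XML (character index {match.start()})"
    return None

print(find_encoding_problem('<?xml version="1.0" encoding="utf-8"?><a>é</a>'.encode("utf-8")))  # None
print(find_encoding_problem(b"<a>\xe9</a>"))  # invalid UTF-8 at byte offset 3
```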

UTF-8 to EBCDIC in Java

Hello, our requirement is to send EBCDIC text to a mainframe. We have some Chinese characters, so the text is in UTF-8. Is there a way to convert the UTF-8 characters to EBCDIC? Thanks, Raj Mohan ...
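
EBCDIC is a family of code pages rather than a single encoding, so the first question is which code page the mainframe expects. Chinese text needs a double-byte EBCDIC code page (IBM-935 or IBM-937, for example), and the exact choice has to be confirmed with the mainframe side. Python's standard library only ships single-byte EBCDIC codecs such as cp037. The sketch below is therefore written in Python rather than Java, for illustration only. It uses cp037 to show the round trip and the failure on a character that the code page cannot represent.

```python
def to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode text for a host that expects the given EBCDIC code page.

    cp037 (EBCDIC US/Canada) is an assumption for illustration; it is a
    single-byte code page and cannot represent Chinese characters.
    """
    return text.encode(codepage)  # raises UnicodeEncodeError on unmappable chars

payload = to_ebcdic("HELLO 123")
print(payload.hex(" "))         # c8 c5 d3 d3 d6 40 f1 f2 f3
print(payload.decode("cp037"))  # HELLO 123

try:
    to_ebcdic("中文")
except UnicodeEncodeError as err:
    print("not representable in cp037:", err.object[err.start:err.end])
```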

inconsistent display of utf8 accents

I have one database, with one table, with a particular field that holds descriptions of various clothing items. These descriptions often contain umlauts or similar characters. I am retrieving this field from two different PHP files, and I am not doing anything to the data, yet it displays inconsistently. In file1, it will display correctly ...

C#: Unicode from a string with MySQL

I'm trying to insert a string into a MySQL database. I can insert it by running the query on the server, but when I try to use my C# source file to insert "Iñtërnâtiônàlizætiøn", I get "Iñtërnâtiônàlizætiøn". I've tried adding it as a parameter and adding ;charset=utf8 to my connection string, but no luck. The table in the databas...

Fatal error: Uncaught exception 'MySQLiQuery_Exception' with message 'Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) - with PHP5

Hi, I've got this error: Fatal error: Uncaught exception 'MySQLiQuery_Exception' with message 'Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=': select id from 'addresses' where 'shiptozip'='13000' and 'shiptostreet'='Františka Křížka' As you can see, I'm trying to get an ID from...

Errors in Flex 3 XML implementation? Rewriting xml:lang

So I'm working on a quick utility to allow simple editing of TMX files. TMX is basically an XML-based standard for storing multilingual translations. Anyhoo, I'm importing TMX into an Adobe AIR app via a File reference, then grabbing the file stream, slapping the UTF-8 characters into a string, and then that string into an XML object. T...

Handling .Net UTF-8 strings in Erlang

Hello, I'm playing a bit with Erlang and the distributed DB Mnesia. One of the first problems I'm facing is the incompatibility between Erlang's 'int list' strings and .Net UTF-8 strings. Is there any good conversion library? Thanks ...
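
The mismatch is between a list of Unicode code points (the 'int list' string) and the raw UTF-8 bytes that .NET sends. In Erlang, the unicode module's characters_to_list and characters_to_binary functions convert between the two. The sketch below shows what the two representations of the same string look like. It is written in Python, not Erlang, purely as a language-neutral illustration.

```python
def as_code_points(data: bytes) -> list:
    """UTF-8 bytes -> list of integer code points (the shape of an Erlang string)."""
    return [ord(ch) for ch in data.decode("utf-8")]

def as_utf8_bytes(code_points: list) -> bytes:
    """List of integer code points -> UTF-8 bytes (what a UTF-8 peer expects)."""
    return "".join(map(chr, code_points)).encode("utf-8")

wire = "né€".encode("utf-8")
print(list(wire))            # [110, 195, 169, 226, 130, 172]
print(as_code_points(wire))  # [110, 233, 8364]
```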

unicode hello world for C?

I am trying to output things like 안, 蠀, ☃ from C:

    #include <wchar.h>
    int main() {
        fwprintf(stdout, L"안, 蠀, ☃\n");
        return 0;
    }

output is ?, ?, ? How do I print those characters? Edit:

    #include <wchar.h>
    #include <locale.h>
    int main() {
        setlocale(LC_CTYPE, "");
        fwprintf(stdout, L"안, 蠀, ☃\n");
        return 0;
    }

this ...

How can I handle Russian text in Perl?

Hi, I'm new to doing anything with a language that isn't English. So far the only thing I've ever done in programming is take input in basic English letters and numbers and output it. Now I have to manipulate some text in Russian (in particular from the Russian Wikipedia page), but I have no clue where to start. I google and google but all I...
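
Whatever the language, the usual rule is to decode bytes to text as soon as they are read, work on text, and encode only when writing out. In Perl this typically means Encode's decode('UTF-8', ...) (or an :encoding(UTF-8) I/O layer) on input and an encoding layer such as binmode STDOUT, ':encoding(UTF-8)' on output. The short Python sketch below shows the same rule with a made-up Russian phrase.

```python
raw = "Привет, мир".encode("utf-8")  # bytes, as read from a file or HTTP response

text = raw.decode("utf-8")          # decode once, at the input boundary
print(len(raw), len(text))          # 20 11  (bytes vs characters)
print(text.upper())                 # ПРИВЕТ, МИР
print(text.split(", "))             # ['Привет', 'мир']

out = text.upper().encode("utf-8")  # encode once, at the output boundary
```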

Best way to convert a Unicode URL to ASCII (UTF-8 percent-escaped) in Python?

I'm wondering what's the best way -- or if there's a simple way with the standard library -- to convert a URL with Unicode chars in the domain name and path to the equivalent ASCII URL, encoded with domain as IDNA and the path %-encoded, as per RFC 3986. I get from the user a URL in UTF-8. So if they've typed in http://➡.ws/♥ I get 'htt...
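
The standard library can do this in a few steps. Split the URL, run the host through the idna codec, and percent-encode the path, query and fragment with urllib.parse.quote, which encodes non-ASCII characters as UTF-8 by default. This is a sketch under stated assumptions. Python's built-in idna codec implements IDNA 2003 (RFC 3490), not the newer IDNA 2008, and the sketch ignores user info and IPv6 literal hosts. The function name is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_ascii_url(url: str) -> str:
    parts = urlsplit(url)
    # IDNA-encode the host name, label by label (IDNA 2003 rules).
    netloc = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Percent-encode as UTF-8; '%' is kept so existing escapes are not doubled.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(iri_to_ascii_url("http://➡.ws/♥"))
print(iri_to_ascii_url("http://example.com/a b?q=ü"))  # http://example.com/a%20b?q=%C3%BC
```

Keeping '%' in the safe set avoids double-escaping input that is already partly encoded. The trade-off is that a stray literal '%' passes through unescaped.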

UTF-8 characters that aren't XSS vulnerabilities

I'm looking at encoding strings to prevent XSS attacks. We want to use a whitelist approach, where any characters outside that whitelist get encoded. Right now, we're taking things like '(' and outputting '&#40;' instead. As far as we can tell, this will prevent most XSS. The problem is that we've got a lot of internat...
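
A whitelist does not have to be ASCII-only. Unicode general categories let it admit letters and digits from every script while still encoding punctuation, symbols and whitespace as numeric character references, as the question already does for '(' becoming '&#40;'. The Python sketch below illustrates that idea and is not a vetted security library. It assumes the output lands in ordinary HTML text or a quoted attribute value; other contexts (JavaScript, URLs, CSS) need their own encoding. The function name is hypothetical.

```python
import unicodedata

def encode_for_html(text: str) -> str:
    """Whitelist letters, combining marks and decimal digits from any script;
    emit everything else as a decimal numeric character reference."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat[0] in ("L", "M") or cat == "Nd":
            out.append(ch)
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)

print(encode_for_html("("))           # &#40;
print(encode_for_html("日本語 <b>"))  # 日本語&#32;&#60;b&#62;
```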

Decoding problems in Django and lxml

I have a strange problem with lxml when using the deployed version of my Django application. I use lxml to parse another HTML page which I fetch from my server. This works perfectly well on my development server on my own computer, but for some reason it gives me a UnicodeDecodeError on the server. ('utf8', "\x85why hello there!", 0, 1,...
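
Byte 0x85 is never a valid first byte in UTF-8, but in Windows-1252 it is the ellipsis (…). One plausible explanation, which is an assumption since the excerpt does not show the response headers, is that the page on the server is served or stored as Windows-1252 while the code decodes it as UTF-8. The Python sketch below reproduces the error and shows a decode with an explicit fallback. Choosing cp1252 as that fallback is itself an assumption.

```python
def decode_page(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8; if that fails, fall back to a single-byte charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback; ideally use the charset the server actually declares.
        return raw.decode(fallback, errors="replace")

raw = b"\x85why hello there!"
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason, err.start, err.end)  # invalid start byte 0 1
print(decode_page(raw))                    # …why hello there!
```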