utf-8

Japanese font appearing garbled in an applet

I have a Java Swing application JAR running as an applet in a JSP. In one of the JTextFields, a user cuts and pastes Japanese characters, and they show up garbled. However, when I run the same application as an applet from RAD, it shows up just fine. Also, the JSP declares its content as UTF-8 via the META tag. ...

Java - Convert String to valid URI object

I am trying to get a java.net.URI object from a String. The string has some characters which need to be replaced by their percent-escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / characters are replaced with their escape sequences. How can I get a valid encoded URL from a String object? http...

base64 and UTF-8 encoding issue

Hello everyone, I am writing a simple web method which returns byte[], and the byte[] is encoded as UTF-8. I have investigated the related WSDL and SOAP message, and it seems the underlying web services stack will use base64 encoding. For various reasons, I cannot use or re-encode my return byte[] from UTF-8 to base64. Any ideas to modify the ba...

How do I parse and store UTF-8 data into a tab-separated-file in Ruby?

I have a hash named hsh whose values are UTF-8 encoded. For example: hsh ={:name => some_utf_8_string, :text => :some_other_utf_8_string} I am currently doing the following: $KCODE="UTF8" File.open("save.tsv","w") do{|file| file.puts hsh.values.map{|x| x.to_s.gsub("\t",' ')}.join("\t") } But this croaks randomly because I...

Change from HTML character references to UTF-8 in a bash script, i.e. &#257; becomes ā

How would you go about translating a document that contains the following character references to their actual readable characters in a bash script? &#257; &#225; &#462; &#224; &#275; &#233; &#283; &#232; &#299; &#237; &#464; &#236; &#470; &#472; &#474; &#476; &#252; &#470; &#472; &#474; &#476; &#252; These change in order to ā á ǎ à ...
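
One possible approach, sketched here in Python rather than bash (an assumption: a Python 3 interpreter is available to the script), is to let the standard library's `html.unescape` resolve the numeric references. The helper name `decode_char_refs` is hypothetical.

```python
import html


def decode_char_refs(text: str) -> str:
    """Replace HTML character references (e.g. &#257;) with the characters they name."""
    # html.unescape handles decimal (&#257;), hex (&#x101;) and named (&amp;) references.
    return html.unescape(text)


if __name__ == "__main__":
    print(decode_char_refs("&#257; &#225; &#462; &#224;"))
```

From a shell script, a function like this could be applied to a file's contents by reading standard input and writing the decoded text to standard output as UTF-8.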

IE7 and UTF-8 problems in classic ASP

I'm having problems with UTF-8 coming up as squares in IE7. It works fine in Firefox, Opera, Camino and Safari. One of the many characters that I'm trying to use is ✱, which is &#10033;. IE7 has this problem with characters used in this notation or pulled from the database (all other browsers display the characters correctly). My hea...

JSON character encoding

I am writing a webservice that uses json to represent its resources, and I am a bit stuck thinking about the best way to encode the json. Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8. But the rfc also describes a string escaping mechanism for specifying characters. I assume t...
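
The two options mentioned here, raw UTF-8 and `\u` string escapes, can be compared side by side. The sketch below uses Python's standard `json` module as an illustration; the function name `encode_both` is hypothetical, and other JSON libraries may expose this choice differently.

```python
import json


def encode_both(text: str) -> tuple[str, str]:
    """Return (escaped, raw) JSON encodings of the same string."""
    escaped = json.dumps(text)                   # non-ASCII written as \uXXXX escapes
    raw = json.dumps(text, ensure_ascii=False)   # non-ASCII kept as-is, to be sent as UTF-8
    return escaped, raw


if __name__ == "__main__":
    escaped, raw = encode_both("é")
    print(escaped, raw)
    # Both forms decode to the same string.
    print(json.loads(escaped) == json.loads(raw))
```

Both forms are valid JSON and parse to the same value; the escaped form is pure ASCII, while the raw form is shorter for text that is mostly non-ASCII.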

Output text as GIF or PNG for use in eBook

BACKGROUND: My goal is to create an eBook that I can read with the Mobipocket reader on my Blackberry. The problem is that my text includes UTF-8 characters which are not supported on the Blackberry, and therefore display as black boxes. The eBook will contain a list of English and Punjabi words for reference, such as: bait ...

Finding files ISO-8859-1 encoded?

I have a bunch of files with a mixture of encodings, mainly ISO-8859-1 and UTF-8. I would like to make all files UTF-8, but when trying to batch encode these files using iconv some problems arise (files cut in half, etc.). I suppose the reason is that iconv requires knowing the 'from' encoding, so if the command looks like this iconv...
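
One common heuristic for this situation, sketched below in Python under the assumption that every file is either UTF-8 or ISO-8859-1, is to attempt a strict UTF-8 decode first and fall back to ISO-8859-1 only when that fails. The function name `to_utf8` is hypothetical.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, guessing between UTF-8 and ISO-8859-1."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        raw.decode("utf-8")
        return raw  # already UTF-8 (or plain ASCII); leave untouched
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this fallback cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    print(to_utf8("café".encode("iso-8859-1")))
    print(to_utf8("café".encode("utf-8")))
```

The guess is not infallible: an ISO-8859-1 file whose high bytes happen to form valid UTF-8 sequences would be left unconverted, although that is uncommon in natural-language text.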

Ruby on Rails :serialize UTF8 problem

When I serialize a hash containing UTF8 strings, like this: poll.variants = {0 => 'тест',1 => '-тест-',2 => 'test # test "тест'} to an ActiveRecord field, the resulting field contains: --- 0: !binary | 0YLQtdGB0YI= 1: !binary | LdGC0LXRgdGCLQ== 2: !binary | dGVzdCAjIHRlc3QgItGC0LXRgdGC The utf8 strings get treated as bi...

Zend Framework PDF generation unicode issue

Hi all, I am having trouble using Zend Framework's PDF component. When I create a PDF file I need to use UTF-8 as the encoding. This is the code I am using to generate a simple PDF file. The text is always displayed incorrectly. Instead of seeing 'Faktúra' in the PDF file, it gives me 'Faktú'. Instead of seeing 'Dodávateľ:' in the PDF file, it gives me 'Dodáva'. $pdf = new...

How to check if a unicode character is within given range in C?

The following function was written for java and has been adapted for C. bool isFullwidthKatakana(WideChar C) { return(('\u30a0'<=C)&&(C<='\u30ff')); } The problem is that my framework ("CodeGear C++Builder") shows this error: [BCC32 Warning] Unit1.cpp(101): W8114 Character represented by universal-character-name '\u30a0' ...

Problem with function removing accents and other characters in PHP

I found a simple function to remove some undesired characters from a string. function strClean($input){ $input = strtolower($input); $b = array("á","é","í","ó","ú", "ñ", " "); //etc... $c = array("a","e","i","o","u","n", "-"); //etc... $input = str_replace($b, $c, $input); return $input; } When I use it on accents or other characte...

Java equivalent to JavaScript's encodeURIComponent that produces identical output?

I've been experimenting with various bits of Java code trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output that's identical to JavaScript's encodeURIComponent function. My torture test string is: "A" B ± " If I enter the following JavaScript statement i...

Does C++0x support std::wstring conversion to/from UTF-8 byte sequence?

I saw that C++0x will add support for UTF-8, UTF-16 and UTF-32 literals. But what about conversions between the three representations? I plan to use std::wstring everywhere in my code. But I also need to manipulate UTF-8 encoded data when dealing with files and network. Will C++0x also provide support for these operations? ...

How can I output UTF-8 from Perl?

I am trying to write a Perl script using the "utf8" pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my settings for both my editor and operating system are defaulted to writing files in utf-8 format. However, when I enter the following into a text file, save it as a ...

Decoding HTML Entities With Python

The following Python code uses BeautifulStoneSoup to fetch the LibraryThing API information for Tolkien's "The Children of Húrin". import urllib2 from BeautifulSoup import BeautifulStoneSoup URL = ("http://www.librarything.com/services/rest/1.0/" "?method=librarything.ck.getwork&id=1907912" "&apikey=2a2e596b887...

Rule for handling UTF-8 characters in cookie for CGI applications?

I was told to always URL-encode a UTF-8 string before placing it in a cookie. So when a CGI application reads this cookie, it has to URL-decode the string to get the original UTF-8 string. Is this the right way to handle UTF-8 characters in cookies? Is there a better way to do this? ...
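
The round trip described in this question can be sketched as follows, using Python's `urllib.parse` purely as an illustration. The helper names `encode_cookie_value` and `decode_cookie_value` are hypothetical, and the sketch covers only the value, not the rest of the cookie header.

```python
from urllib.parse import quote, unquote


def encode_cookie_value(value: str) -> str:
    """Percent-encode the UTF-8 bytes of value so the result is plain ASCII."""
    # safe="" also escapes "/", which quote() would otherwise leave alone.
    return quote(value, safe="", encoding="utf-8")


def decode_cookie_value(raw: str) -> str:
    """Reverse encode_cookie_value, interpreting the escaped bytes as UTF-8."""
    return unquote(raw, encoding="utf-8")


if __name__ == "__main__":
    stored = encode_cookie_value("名前; path=/")
    print(stored)
    print(decode_cookie_value(stored))
```

Percent-encoding also escapes characters such as `;`, `,` and spaces, which would otherwise be ambiguous inside a cookie header.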

What is the difference between EM Dash &#151; and &#8212;?

I've an ASCII file that contains an EM Dash (— or &mdash; in HTML). The hex value is 0x97. When we pass this file through one application it arrives as UTF-8, and it converts the character to 0xC297, which is &#151; in HTML. However, when we pass this file through a different application it converts the character to 0xE28094 or &#8212;. ...
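
The two byte sequences quoted here are what results from reading the byte 0x97 under two different source encodings before converting to UTF-8. The Python sketch below illustrates this; that the two applications assume ISO-8859-1 and Windows-1252 respectively is a guess consistent with the reported bytes, not something the question confirms, and the function name `utf8_via` is hypothetical.

```python
def utf8_via(raw: bytes, source_encoding: str) -> tuple[int, bytes]:
    """Decode raw with source_encoding; return (code point, UTF-8 bytes) of the result."""
    char = raw.decode(source_encoding)
    return ord(char), char.encode("utf-8")


if __name__ == "__main__":
    # ISO-8859-1 maps 0x97 to U+0097, a C1 control character (decimal 151).
    print(utf8_via(b"\x97", "iso-8859-1"))
    # Windows-1252 maps 0x97 to U+2014 EM DASH (decimal 8212).
    print(utf8_via(b"\x97", "cp1252"))
```

Under this reading, only the Windows-1252 interpretation produces a real em dash; U+0097 is a control character that many renderers nonetheless display as a dash for compatibility.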

How can I convert non-ASCII characters encoded in UTF8 to ASCII-equivalent in Perl?

I have a Perl script that is being called by third parties to send me names of people who have registered my software. One of these parties encodes the names in UTF-8, so I have adapted my script accordingly to decode UTF-8 to ASCII with Encode::decode_utf8(...). This usually works fine, but every 6 months or so one of the names contai...