character-encoding

How to convert these kinds of characters to their corresponding Unicode characters in Ruby?

I want to know how to convert this kind of string to its Unicode form, such as the following one: Delphi_7.0%E6%95%B0%E6%8D%AE%E5%BA%93%E5%BC%80%E5%8F%91%E5%85%A5%E9%97%A8%E4%B8%8E%E8%8C%83%E4%BE%8B%E8%A7%A3%E6%9E%90 The Unicode form of the string above is: Delphi_7.0数据库开发入门与范例解析 Does anybody know how to do the conversion ...
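
The string in this excerpt looks like percent-encoded UTF-8 bytes, so the conversion is an ordinary URL decode. A minimal sketch in Python, assuming the bytes really are UTF-8 (the variable names are illustrative, not from the question):

```python
from urllib.parse import unquote

# Percent-encoded string copied from the question.
encoded = (
    "Delphi_7.0%E6%95%B0%E6%8D%AE%E5%BA%93%E5%BC%80%E5%8F%91"
    "%E5%85%A5%E9%97%A8%E4%B8%8E%E8%8C%83%E4%BE%8B%E8%A7%A3%E6%9E%90"
)

# Assumption: the escaped bytes are UTF-8. errors="strict" makes the call
# raise instead of silently substituting characters if they are not.
decoded = unquote(encoded, encoding="utf-8", errors="strict")

print(decoded)
```

Ruby's standard library offers CGI.unescape for the same job; whether it returns a string tagged as UTF-8 depends on the Ruby version and the encoding of the input string.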

Rails, MySQL, Unicode data and latin1 tables - Where to go from here?

I'm not 100% sure on the particulars, so I'd love someone straightening me out, but I'll forge ahead with what I think is going on... When I first set up my database, I used the default character encoding of the system without even thinking, and it was latin1. I never even thought about i18n/l10n. It just didn't occur to me. I just accep...

Java: SAXParser character reference decoding

With reference to this question http://stackoverflow.com/questions/3850315/java-splitting-up-a-large-xml-file-with-saxparser I'm essentially reading in an xml file using SAXParser and echoing it to another file. My problem is that the content of my input file contains character references which are being decoded on reading in. How can I...

Character Decoders in Java

Where can I find some character decoders for the non-officially supported charsets? I.e. I don't want to reinvent the wheel, surely someone must have already written some decoders for their own purposes or as a library? Thanks! ...

Why do we use the flush parameter with the Encoder.GetBytes method?

This link explains the Encoder.GetBytes method, including a bool parameter called flush. The explanation of flush is: true if this encoder can flush its state at the end of the conversion; otherwise, false. To ensure correct termination of a sequence of blocks of encoded bytes, the last call to GetBytes ca...

itext PDF - Greek letters are not appearing in the resulting PDF document

Hi All, I am having a hard time trying to generate PDF files containing Greek letters using itextpdf. I am reading the strings from an external source as UTF-8 strings. English letters appear in the results but not the Greek ones. Searching for the problem, I think it might be related to the font used. I do not know what ttf file to use if th...

Strict string to byte encoding in C#

I've just stumbled over another question in which someone suggested using new ASCIIEncoding().GetBytes(someString) to convert from a string to bytes. For me it was obvious that it shouldn't work for non-ASCII characters. But as it turns out, ASCIIEncoding happily replaces invalid characters with '?'. I'm very confused about this because...
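
The difference between a lenient and a strict encoder described in this excerpt can be illustrated outside .NET. The sketch below uses Python's error handlers as an analogue only: it does not call ASCIIEncoding, and the sample text is an assumption.

```python
# Sample text with one non-ASCII character (é, U+00E9).
text = "caf\u00e9"

# Lenient mode: unencodable characters are replaced with '?',
# which is the behaviour the question observes.
lenient = text.encode("ascii", errors="replace")

# Strict mode: the same input raises instead of losing data.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lenient, strict_failed)
```

In .NET the comparable knob is the encoder fallback: an encoding obtained with an EncoderExceptionFallback throws on unencodable characters rather than substituting '?'.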

How to escape Japanese characters?

I have the following string: "Messatsu Gou Hadou (滅殺豪波動)". Is there a way to escape these characters so that it is converted to "滅殺豪波動"? ...

LAMP UTF-8 saving incorrectly to MySQL Database

I've converted my database from Latin-1 to UTF-8, and using phpMyAdmin you can enter data and display it correctly. However, viewing it in the pages I've developed in PHP and editing it using my simple CMS saves characters that must be incorrectly encoded. I've spent a few hours researching and eventually came up with this code snippet: mysq...

History of CP437 character "ñ"

Does anybody know why the letter "ñ" precedes the letter "Ñ" in code page 437? ñ -> ALT+164, Ñ -> ALT+165. For the rest of the characters, uppercase precedes lowercase. Just wondering ...

Problem with Japanese character encoding in WordPress

I have a WordPress installation in English, but all of the content is in Japanese. I have set the charset to utf-8 in the head section of the page and all the characters display fine. However, if I use the WordPress search widget to search for something in Japanese, all of the characters get encoded into some weird encoding that looks like...

C# char/byte encoding equality

I have some code to dump strings to stdout to check their encoding; it looks like this: private void DumpString(string s) { System.Console.Write("{0}: ", s); foreach (byte b in s) { System.Console.Write("{0}({1}) ", (char)b, b.ToString("x2")); } System.Console.Writ...

Are there any situations in which you would use NLS_LENGTH_SEMANTICS=BYTE on a Unicode database?

Having a Unicode (multi-byte charset) Oracle database with NLS_LENGTH_SEMANTICS=BYTE seems like a disaster waiting to happen. Field validation in most applications only checks that the number of characters is within bounds, not the byte sequence length in the database’s default character encoding scheme! If you've got a Unicode database, is th...
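
The gap between character count and byte length that this excerpt worries about is easy to demonstrate. A minimal sketch, assuming the database character set is UTF-8 and a hypothetical column limited to 5 bytes (the helper name fits_byte_limit is illustrative, not an Oracle API):

```python
def fits_byte_limit(value, max_bytes, encoding="utf-8"):
    """Return True if value fits in max_bytes once encoded.

    Assumption: `encoding` matches the database character set.
    """
    return len(value.encode(encoding)) <= max_bytes


title = "数据库"                         # three CJK characters
char_count = len(title)                  # what most UI validation checks
byte_count = len(title.encode("utf-8"))  # what BYTE semantics checks

# A character-based check against a limit of 5 would pass,
# but the byte-based check for the hypothetical 5-byte column fails.
passes_char_check = char_count <= 5
passes_byte_check = fits_byte_limit(title, 5)

print(char_count, byte_count, passes_char_check, passes_byte_check)
```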

Using the British pound sign in an XML feed to be read by an iPhone

I have created a web-based UTF-8 XML feed for use in an iPhone application. When viewing in a web browser, if the feed contains a British Pound sign, I get a nasty XML error: XML Parsing Error: undefined entity However the actual file seems to be readable. 1. Will an iPhone NSParser be able to read the file or will it fail due to this...

How can SELECT HEX(CHAR(0x4E8C USING ucs2)) return '4E01' instead of '4E8C' ?

I converted a kanji column in my database to UCS-2 codes with this, it works: SELECT hex(convert('二' using ucs2)); => 0x4E8C aka &#x4E8C aka Unicode Code Point 20108 But if I want to convert my SQL results back to kanji, I get the wrong character: SELECT CHAR(0x4E8C USING ucs2); Returns 丁 which has code point 0x4E01 Inste...

struts, hibernate, mysql - char encoding problem

Hello, I have a web app that gathers some data from the user and saves it to a MySQL database. The problem is that for a String like "Ajánlat kiküldése", what gets stored to the database is "Ajánlat kiküldése". For my database, I have DEFAULT CHARACTER SET utf8. For my tables I have DEFAULT CHARSET=utf8. In my hibernate.cfg.xml i...

stristr problem with unicode string

I use the mb_stristr function to detect whether a word exists in a string or not, but if the word I'm checking for is written in Unicode this function always returns false, even if the word actually exists. If I'm looking for a non-Unicode word it works fine. Does anyone know how to solve this problem? Tried the strstr function too but th...

JSF 2.0 request.getParameter returns a string with wrong encoding

Hi, I'm writing an application in JSF 2.0 which supports many languages, among them ones with special characters. I use String value = request.getParameter("name") and the POST method, the page encoding is set to UTF-8, and the app is deployed on Apache Tomcat 6, which has the connector set correctly to utf-8 in a server.xml file: <Connector...

How do I filter chat messages by normalizing letter forms?

I'm filtering chat messages on a chat system where constraining strings to Latin-1 English is desirable. Users tend to use creative typing, e.g. ßòógīě§ instead of Boogies. In Java, there are Unicode normalization methods which can remove diacritic marks, but I'm more interested in methods of normalizing the shapes of the letters t...
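
The distinction drawn in this excerpt, between removing diacritics and normalizing letter shapes, can be sketched as follows. This is Python rather than Java, and the LOOKALIKES table is a hypothetical two-entry mapping invented for illustration; a real filter would need a much larger, curated table.

```python
import unicodedata

# Hypothetical shape mapping: these characters have no Unicode
# decomposition, so normalization alone leaves them untouched.
LOOKALIKES = {"ß": "B", "§": "s"}


def strip_marks(text):
    """Decompose with NFKD, then drop combining marks (diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_shapes(text):
    """Strip diacritics, then apply the hypothetical lookalike table."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in strip_marks(text))


sample = "ßòógīě§"
print(strip_marks(sample))  # diacritics gone, ß and § still present
print(fold_shapes(sample))
```

Normalization handles the accented vowels, but the shape-based substitutions have to come from a separate mapping, which is the part the question is asking about.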

Character display wonder

When I copied it here, it's actually: printf("\n"); But when displayed in this page, it's: Does anyone know the reason, and how to reproduce it with the least code? BTW, can anyone sniff out the language it's written in? ...