questions about utf-8 | ansaurus

utf-8

How to get UTF-8 working in java webapps?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases. My setup is the following. Development environment: Windows XP. Production environment: Debian. Database used: MySQL 5.x. Users mainly use Firefox2 but also O...

Ruby: How to break a potentially unicode string into bytes

I'm writing a game which is taking user input and rendering it on-screen. The engine I'm using for this is entirely unicode-friendly, so I'd like to keep that if at all possible. The problem is that the rendering loop looks like this: "string".each_byte do |c| render_this_letter(c) end I don't know a whole lot about i18n, but I ...
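
The difference between iterating over bytes and iterating over characters can be sketched as follows. This is a minimal illustration in Python rather than Ruby, and the helper name bytes_and_chars is hypothetical; it only shows why a byte-wise loop sees more units than there are letters once the text leaves ASCII.

```python
def bytes_and_chars(text: str):
    """Return (UTF-8 byte values, code points) for the given text."""
    raw = list(text.encode("utf-8"))   # one entry per encoded byte
    chars = list(text)                 # one entry per Unicode code point
    return raw, chars


# "äö" written with escapes so the source file encoding does not matter.
raw, chars = bytes_and_chars("\u00e4\u00f6")
print(len(raw), len(chars))  # 4 bytes, 2 code points
```

A renderer that draws one glyph per byte would therefore draw four glyphs for this two-letter string; iterating over code points avoids that, although combining sequences may still need grouping.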

How does the UTF-8 support of TinyXML work

I'm using TinyXML (http://www.grinninglizard.com/tinyxml/) to parse/build XML files. Now according to the documentation (http://www.grinninglizard.com/tinyxmldocs/) this library supports multibyte character sets through UTF-8. So far so good I think. But, the only API that the library provides (for getting/setting element names, attribut...

UTF8 to/from wide char conversion in STL

Is it possible to convert a UTF-8 string in a std::string to a std::wstring and vice versa in a platform-independent manner? In a Windows application I would use MultiByteToWideChar and WideCharToMultiByte. However, the code is compiled for multiple OSes and I'm limited to the standard C++ library. ...

character-encoding

How to display a non-ascii filename in the file download box in browsers?

There doesn't seem to be an accepted way of sending down a header parameter in non ascii format. The header for file download usually looks like Content-disposition: attachment; filename="theasciifilename.doc" Except if you smash a utf8 encoded string in the filename parameter, Firefox will handle it fine, whereas IE will throw up. T...
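
One approach that was standardized after questions like this one is the filename* parameter from RFC 6266, which carries a percent-encoded UTF-8 name next to a plain ASCII fallback. The sketch below is Python, the function name content_disposition is hypothetical, and the ASCII fallback is assumed to be supplied by the caller; support in older browsers varies, so treat it as an illustration rather than a guaranteed fix.

```python
from urllib.parse import quote


def content_disposition(filename: str, ascii_fallback: str) -> str:
    """Build a Content-Disposition value with an RFC 6266 filename* parameter."""
    # quote() percent-encodes the UTF-8 bytes of every reserved or non-ASCII character.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"


print(content_disposition("\u00e4.doc", "a.doc"))
```

Clients that understand filename* are expected to prefer it, while others fall back to the ASCII filename parameter.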

Loading UTF-8 encoded dump into MySQL

Hi, I was pulling my hair out over this problem for a few hours yesterday: I have a database on a MySQL 4.1.22 server with encoding set to "UTF-8 Unicode (utf8)" (as reported by phpMyAdmin). Tables in this database have default charset set to latin2. But, the web application (CMS Made Simple written in PHP) using it displays pages in u...

Setting ISO-8859-1 encoding for a single Tapestry 4 page in application that is otherwise totally UTF-8

I have a Tapestry application that is serving its page as UTF-8. That is, server responses have header: Content-type: text/html;charset=UTF-8 Now within this application there is a single page that should be served with ISO-8859-1 encoding. That is, server response should have this header: Content-type: text/html;charset=ISO-8859-1 ...

How to convert a utf-8 string to a utf-16 string in PHP

How do I convert a utf-8 string to a utf-16 string in PHP? ...
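
The conversion itself is a decode followed by an encode. A minimal sketch in Python (not PHP), with a hypothetical helper name utf8_to_utf16le, and assuming little-endian output without a byte order mark is what is wanted:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 little-endian, without a BOM."""
    text = data.decode("utf-8")       # raises UnicodeDecodeError on invalid input
    return text.encode("utf-16-le")   # the plain "utf-16" codec would prepend a BOM


print(utf8_to_utf16le(b"\xc3\xa9"))  # "é": two UTF-8 bytes become one 16-bit unit
```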

PHP: Replace umlauts with closest 7-bit ASCII equivalent in a UTF-8 string

What I want to do is to remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". What I tried to do was to utf8_decode the string and then use strtr on it, but since my source file is saved as a UTF-8 file, I can't enter the ISO-8859-15 characters for all umlauts - the editor inserts the UTF-8 chara...
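
A common way to get this effect is Unicode decomposition: split each accented letter into a base letter plus combining marks, then drop the marks. The sketch below uses Python's standard unicodedata module instead of PHP, and strip_accents is a hypothetical name. It only handles letters that have a canonical decomposition, so characters such as ß or ø pass through unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for accents, diaereses and other combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("l\u00e4rm"), strip_accents("andr\u00e9"))
```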

What options exist now to implement UTF8 in Ruby and RoR?

Following the development of Ruby very closely I learned that detailed character encoding is implemented in Ruby 1.9. My question for now is: How may Ruby be used at the moment to talk to a database that stores all data in UTF8? Background: I am involved in a new project where Ruby/RoR is at least an option. But the project needs to rel...

internationalization

Exporting MSAccess Tables as Unicode with Tilde delimiter

I want to export the contents of several tables from MSAccess2003. The tables contain Unicode Japanese characters. I want to store them as tilde-delimited text files. I can do this manually using File/Export and, in the 'Advanced' dialog, selecting tilde as Field Delimiter and Unicode as the Code Page. I can store this as an Export...

How to tell if text on the Windows clipboard is ISO 8859 or UTF-8 in C++?

I would like to know if there is an easy way to detect if the text on the clipboard is in ISO 8859 or UTF-8? Here is my current code: COleDataObject obj; if (obj.AttachClipboard()) { if (obj.IsDataAvailable(CF_TEXT)) { HGLOBAL hmem = obj.GetGlobalData(CF_TEXT); CMemFile sf((BYTE*) ::GlobalLock(hmem),...
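
A frequently used heuristic is to test whether the bytes form valid UTF-8: text in an ISO 8859 encoding that contains non-ASCII characters usually does not. The sketch below is Python rather than C++, looks_like_utf8 is a hypothetical name, and the result is only a guess, since pure ASCII is valid in both encodings and some ISO 8859 byte sequences happen to be valid UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (heuristic only)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(looks_like_utf8(b"\xc3\xa4"))  # "ä" encoded as UTF-8
print(looks_like_utf8(b"\xe4"))      # "ä" encoded as ISO 8859-1
```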

UTF-8 in Windows

How do I set the code page to UTF-8 in a C Windows program? I have a third party library that uses fopen to open files. I can use wcstombs to convert my Unicode filenames to the current code page, however if the user has a filename with a character outside the code page then this breaks. Ideally I would just call _setmbcp(65001...

PHP utf8 problem

I have some problems comparing an array with Norwegian characters with a utf8 character. All characters except the special Norwegian characters (æ, ø, å) work fine. function isNorwegianChar($Char) { $aNorwegianChars = array('a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'j', 'J', 'k', 'K',...

UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

I've managed to mostly ignore all this multi-byte character stuff, but now I need to do some UI work and I know my ignorance in this area is going to catch up with me! Can anyone explain in a few paragraphs or less just what I need to know so that I can localize my applications? What types should I be using (I use both .Net and C/C++, an...

character-encoding

Why does the string "¿" get translated to "Â¿" when calling .getBytes()

When writing the string "¿" out using System.out.println(new String("¿".getBytes("UTF-8"))); Â¿ is written instead of just ¿. WHY? And how do we fix it? ...
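
The symptom is consistent with UTF-8 bytes being decoded with a single-byte charset, which can happen when no charset is passed to the String constructor and the platform default is not UTF-8. The effect can be reproduced outside Java; the sketch below is Python, misdecode is a hypothetical name, and Latin-1 stands in for whatever default charset the original system used.

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


# U+00BF is encoded as the bytes C2 BF; read as Latin-1 they become two characters.
print(misdecode("\u00bf"))
```

Decoding with the same charset that was used for encoding returns the original single character.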

character-encoding

Why is ¿ displayed differently in Windows vs Linux even when using UTF-8?

Why is the following displayed differently in Linux vs Windows? System.out.println(new String("¿".getBytes("UTF-8"), "UTF-8")); in Windows: ¿ in Linux: Â¿ ...

character-encoding

[C++] UTF-8 to ASCII using ICU Library

I have a std::string with UTF-8 characters in it. I want to convert the string to its closest equivalent with ASCII characters. For example: Łódź => Lodz, Assunção => Assuncao, Schloß => Schloss. Unfortunately the ICU library is really unintuitive and I haven't found good documentation on its usage, so it would take me too much time to l...

transliteration

Case-insensitive UTF-8 string collation for SQLite (C/C++)

I am looking for a method to compare and sort UTF-8 strings in C++ in a case-insensitive manner to use it in a custom collation function in SQLite. The method should ideally be locale-independent. However, I won't be holding my breath: as far as I know, collation is very language-dependent, so anything that works on languages other than...

internationalization

iPhone "Web Site Error"

I'm writing server-side programs in PHP for an iPhone app. And I have no iPhone. :P The iPhone app requests XML files from the site whenever a user runs the iPhone app. You may visit http://www.appvee.com/iphone/ads or http://www.appvee.com/iphone/latest for the XML files. And a message box will show up with the following error message...
