utf-8

Rails + MySQL Unicode

I'm trying to get Unicode working properly in Rails using MySQL. Now, Rails displays the text correctly, but it shows up as ??? in MySQL. Additionally, I am not able to filter the text. My MySQL database has been configured with the utf8 character set. My client character set is also UTF8. Likewise, Rails is set to use UTF8. If I ent...

Ruby 1.8 and UTF-8 string case statement compare (Ruby on Rails 2.2)

Hello :) I have a rake task (in the lib/tasks directory) that I run with cron on my shared web hosting. The problem is that I want to compare a UTF-8 string using a case statement, but my source code is not UTF-8 encoded. If I save the source code as UTF-8, there is an error when I try to start it :( What do I have to do? Maybe read these strings from ...

I18n and Passwords that aren't US-ASCII, Latin1, or Win1252

How do you handle passwords for services when the user enters something that is best represented in Unicode or some other non-Latin character encoding? Specifically, can you use a Cyrillic password as a password to Oracle? What do you do to verify a user's password against a Windows authentication mechanism if the password is provided a...

Setting the default Java character encoding?

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically? I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs... I don't have that luxury for reasons I won't get into. I have tried: System.setProperty("file.encoding", "UTF8"); And the property gets set, but it doesn't...

How to get rid of weird characters in my RSS feed?

Hi, I've created a utf8 encoded RSS feed which presents news data drawn from a database. I've set all aspects of my database to utf8 and also saved the text which I have put into the database as utf8 by pasting it into Notepad and saving as utf8. So everything should be encoded in utf8 when the RSS feed is presented to the browser, howe...

How can I convert messages in an mbox file to UTF-8?

Hello, I am trying to modify the below program to ensure each msg is converted to utf-8 using Encode::decode(), but I am unsure of how and where to place this to make it work. #!/usr/bin/perl use warnings; use strict; use Mail::Box::Manager; open (MYFILE, '>>data.txt'); binmode(MYFILE, ':encoding(UTF-8)'); my $file = shift || $ENV{...

How to write out a text file in C# with a code page other than utf-8?

I want to write out a text file. Instead of the default UTF-8, I want to write it encoded as ISO-8859-1 which is code page 28591. I have no idea how to do this... I'm writing out my file with the following very simple code: using (StreamWriter sw = File.CreateText(myfilename)) { sw.WriteLine("my text..."); sw.Close(); } ? ...

Convert UTF-8 characters to ISO-8859-1 and back in PHP

Hi all. Some of my scripts use different encodings, and when I try to combine them, this has become an issue. But I can't change the encoding they use; instead I want to change the encoding of the result from script A and use it as a parameter in script B. So: is there any simple way to change a string from UTF-8 to ISO-8859-1 in PHP...

Character encoding JSP -displayed wrong in JSP but not in URL: "á » á é » é"

I have this Web Application in JSP running on JBoss Application Server. I am using Servlets for friendly URLs. I'm sending search parameters through my JSPs and Servlets. I am using a form with a text box. The first Servlet uses request.getParameter() to get the text, and sends it to another Servlet with response.sendRedir...

Converting MBCS stream to UTF-8 and vice versa in C++

Hi, I'm using Visual C++ (VS2005) and compiling the project in Multibyte Character Set (MBCS). However, the program needs to communicate with a webapp (which is in utf-8) via XMLRPC. So I'm thinking maybe I can use MBCS internally and convert the strings to utf-8 before sending them to the xmlrpc module and converting them back to MBCS ...

How to identify UTF-8 encoded strings

What's the best way to identify if a string (is or) might be UTF-8 encoded? The Win32 API IsTextUnicode isn't of much help here. Also, the string will not have a UTF-8 BOM, so that cannot be checked for. And, yes, I know that only characters above the ASCII range are encoded with more than 1 byte. ...
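One common heuristic for the question above is to attempt a strict UTF-8 decode and treat failure as "not UTF-8". The Python sketch below illustrates the idea; the function names `looks_like_utf8` and `has_multibyte_utf8` are hypothetical, and a successful decode only shows that the bytes are valid UTF-8, not that UTF-8 was the encoding the author intended.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def has_multibyte_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 and contain non-ASCII bytes.

    Pure ASCII is valid UTF-8 but carries no evidence either way, so this
    stricter check separates "definitely looks like UTF-8" from "could be anything".
    """
    return looks_like_utf8(data) and any(b >= 0x80 for b in data)
```

Text in a single-byte encoding such as ISO-8859-1 that contains accented characters usually fails the strict decode, because a lone high byte is rarely a well-formed UTF-8 sequence.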

Convert GB2312 to UTF-8

I have a text file that contains localized language strings that is currently encoded in GB2312 (simplified Chinese), but all of my other language files are in UTF-8. I am finding it very difficult to work with this file, as none of my text editors will work properly with it and keep corrupting it. Are there any tools to convert this to ...
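A conversion like the one asked about can be sketched in a few lines of Python, assuming the file really is GB2312 throughout. The function name `convert_gb2312_to_utf8` and the file paths are hypothetical. Reading and writing raw bytes avoids any newline translation.

```python
from pathlib import Path


def convert_gb2312_to_utf8(src: Path, dst: Path) -> None:
    """Decode src as GB2312 and write the same text to dst as UTF-8."""
    # A UnicodeDecodeError here means the file is not pure GB2312.
    text = src.read_bytes().decode("gb2312")
    dst.write_bytes(text.encode("utf-8"))
```

If the strict decode fails, the file may contain characters outside GB2312; trying the `gb18030` codec, a superset, is one option under that assumption.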

How to get &nbsp to behave properly using HTML Purifier?

I am using HTML Purifier in my PHP project and am having trouble getting it to work properly with user input. I am having users enter in HTML using a WYSIWYG editor (TinyMCE), but whenever a user enters in the HTML entity &nbsp; (non-breaking space) it gets saved into the database as this weird foreign character (Â). However, the thing...

PHP/MySQL with encoding problems

I am having trouble with PHP regarding encoding. I have a JavaScript/jQuery HTML5 page that interacts with my PHP script using $.post. However, PHP is facing a weird problem, probably related to encoding. When I write htmlentities("í") I expect PHP to output &iacute;. However, instead it outputs &Atilde;&shy;. At the beginning, I thought ...

Cyrillic characters in PHP's json_encode

I'm trying to encode a Cyrillic UTF-8 array to a JSON string using PHP's json_encode function. The sample code looks like this: <?php $arr = array( 'едно' => 'първи', 'две' => 'втори' ); $str = json_encode($arr); echo $str; ?> It works fine but the result of the script is represented as:...

Detecting encoding conversion problems

The majority of content on my company's website starts life as a Word document (Windows-1252 encoded) and is eventually copied-and-pasted into our UTF-8-encoded content management system. The conversion usually chokes on a few characters (special break characters, smart quotes, scientific notations) which have to be cleaned up manually, ...
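One way to flag likely conversion damage automatically is to scan the converted text for characters that rarely appear in clean content. The Python sketch below is only an illustration: the function name `find_suspect_chars` is hypothetical, and the choice of markers (the U+FFFD replacement character and the C1 control range U+0080 to U+009F, which is where Windows-1252 smart quotes land when the bytes are misread as ISO-8859-1) is an assumption about what "chokes" means here.

```python
def find_suspect_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for characters that suggest a bad conversion."""
    suspects = []
    for index, char in enumerate(text):
        is_replacement = char == "\ufffd"        # decoder gave up on a byte
        is_c1_control = "\u0080" <= char <= "\u009f"  # cp1252 byte read as Latin-1
        if is_replacement or is_c1_control:
            suspects.append((index, char))
    return suspects
```

An empty result does not prove the text is clean; it only means none of these particular markers were found.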

Why isn't the Byte Order Mark emitted from UTF8Encoding.GetBytes?

The snippet says it all :-) UTF8Encoding enc = new UTF8Encoding(true/*include Byte Order Mark*/); byte[] data = enc.GetBytes("a"); // data has length 1. // I expected the BOM to be included. What's up? ...

Convert asp.net project pages from Windows-1251 to Utf-8

I can do that file-by-file with Save As Encoding in Visual Studio, but I'd like to do this in one click. Is it possible? ...

UTF-8 From File to TextBox VC++ 6.0

How do I get an old VC++ 6.0 MFC program to read and display UTF8 in a TextBox or MessageBox? Preferably without breaking any of the file reading and displaying that is currently written in there (fairly substantial). I read a line into CString strStr, then used this code: int nLengthNeeded = MultiByteToWideChar(CP_UTF8,0,strStr,1024,...

How to convert an unreadable string back to UTF-8 bytes in C#

I have a string that looks like aeroport aimé. I know it is French, and I want to convert this string back to a readable format. Any suggestions? ...
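The pattern in the string above is typical of UTF-8 bytes that were decoded with a single-byte encoding. Assuming that is what happened, and assuming the wrong encoding was Windows-1252, the damage can be reversed by re-encoding with the wrong encoding and decoding as UTF-8. The question concerns C#, but the idea is sketched here in Python; the function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo a UTF-8-read-as-single-byte mix-up.

    Re-encoding with the wrong encoding recovers the original bytes,
    which are then decoded as the UTF-8 they were all along.
    """
    return garbled.encode(wrong_encoding).decode("utf-8")


# "aeroport aim" followed by U+00C3 U+00A9, the two characters seen in the question.
print(repair_mojibake("aeroport aim\u00c3\u00a9"))
```

If the text was never UTF-8, or a different wrong encoding was involved, one of the two steps raises an error instead of silently producing more garbage.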