utf-8

Converting UTF8 to ANSI with Ruby

I have a Ruby script that generates a UTF8 CSV file remotely on a Linux machine and then transfers the file to a Windows machine through SFTP. I then need to open this file with Excel, but Excel doesn't handle UTF8, so I always need to open the file in a text editor that can convert UTF8 to ANSI. I would love to do this pr...

UTF-16 to UTF-8 conversion (for scripting in Windows)

Hi, what is the best way to convert UTF-16 files to UTF-8? I need to use this in a cmd script. ...

Handling UTF-8 encoding

We have a Java application running on a WebLogic server that picks up XML messages from a JMS or MQ queue and writes them into another JMS queue. The application doesn't modify the XML content in any way. We use BEA's XMLObject to read the messages from and write them into the queues. The XML messages contain the encoding type declarations as UTF-8. We...

UTF-8 URI explodes Apache & mod_rewrite

I have Apache with mod_rewrite, and whenever I enter a URI with an accented character in it, Apache gives me a "Page Not Found" error. The URI is: /places/tags/Café My page encoding is UTF-8. My database connection & tables are UTF-8. My Apache DefaultCharacterSet = UTF-8. Yes, Apache has language packs, but I believe they're there for...

UTF-8 latin-1 conversion issues, python django

ok so my issue is i have the string '\222\222\223\225' which is stored as latin-1 in the db. What I get from django (by printing it) is the following string, 'ââââ¢' which I assume is the UTF conversion of it. Now I need to pass the string into a function that does this operation: strdecryptedPassword + chr(ord(c) - 3 - intCounter -...

Save all files in Visual Studio project as UTF-8

I wonder if it's possible to save all files in a Visual Studio 2008 project into a specific character encoding. I got a solution with mixed encodings and I want to make them all the same (UTF-8 with signature). I know how to save single files, but how about all files in a project? ...

How to convert Unicode string into a utf-8 or utf-16 string?

How do I convert a Unicode string into a utf-8 or utf-16 string? My VS2005 project is using the Unicode char set, while sqlite in cpp provides int sqlite3_open( const char *filename, /* Database filename (UTF-8) */ sqlite3 **ppDb /* OUT: SQLite db handle */ ); int sqlite3_open16( const void *filename, /* Database filename (UT...

How to convert UTF-8 to US-Ascii in Java

We have a system where customers, mainly European, enter texts (in UTF-8) that have to be distributed to different systems, most of them accepting UTF-8, but now we must also distribute the texts to a US system which only accepts 7-bit US-ASCII. So now we'll need to translate all European characters to the nearest US-ASCII equivalent. Is there any Ja...

How do I check that string has only international letters and spaces in UTF8 in PHP?

In Python I could've converted it to Unicode and done a '(?u)^[\w ]+$' regex search, but PHP doesn't seem to understand international \w, or does it? ...

utf-8 and htmlentities in RSS feeds

I'm writing some RSS feeds in PHP and struggling with character-encoding issues. Should I utf8_encode() before or after htmlentities() encoding? For example, I've got both ampersands and Chinese characters in a description element, and I'm not sure which of these is proper: $output = utf8_encode(htmlentities($source)); or $output = htmle...

Light C Unicode Library

I'm looking for a small C library to handle utf8 strings. Specifically, splitting based on unicode delimiters for use with stemming algorithms. Related posts have suggested: ICU http://www.icu-project.org/ (I found it too bulky for my purposes on embedded devices) UTF8-CPP: http://utfcpp.sourceforge.net/ (Excellent, but C++ not C) Ha...

How do I determine the character set of a string?

I have several files that are in several different languages. I thought they were all encoded UTF-8, but now I'm not so sure. Some characters look fine, some do not. Is there a way that I can break out the strings and try to identify the character sets? Perhaps split on white space then identify each word? Finally, is there an easy ...

What could go wrong if I convert ANSI encoded files to UTF-8?

I have an existing ASP.NET 2.0 website, stored in Team Foundation Server 2005. Some of the pages/controls are encoded as ANSI (according to Notepad++) and the Content-Type header is set to: <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"/> I would like to change all pages to UTF-8, and therefore the Content-T...

HTTP headers encoding/decoding in Java

A custom HTTP header is being passed to a Servlet application for authentication purposes. The header value must be able to contain accents and other non-ASCII characters, so must be in a certain encoding (ideally UTF-8). I am provided with this piece of Java code by the developers who control the authentication environment: String f...

error-page directive in web.xml does not display UTF8 properly

I have an application web.xml with the following entry: <error-page> <error-code>404</error-code> <location>/system_files/error/p_notfound.jsp</location> </error-page> However, when this page is displayed, Japanese characters are garbled. The same page (p_notfound.jsp) displays properly if displayed directly or even through ...

How to save file as UTF-8 format

We need to send an email that contains pound (currency) symbols in ColdFusion. Before sending the email, we dump the data into an HTML file for preview. How do we send an email with utf-8 encoding in ColdFusion? How do we save a file with utf-8 encoding in ColdFusion? ...

How to handle UTF-8 email headers (like Subject:) using Ruby?

I'm an email n00b but I am working on an application that sends HTML email with Unicode characters (as my friend noted "enjoy encoding hell"). The Subject: header comes from user input and therefore may contain Unicode characters. Some mail clients (like GMail and Outlook 2007) are OK with this, but from my reading it seems the right wa...

UTF8 MySQL problems on Rails - encoding issues with utf8_general_ci

I have a staging Rails site up that's running on MySQL 5.0.32-Debian. On this particular site, all of my tables are using utf8 / utf8_general_ci encoding. Inside that database, I have some data that looks like so: mysql> select * from currency_types limit 1,10; +------+-----------------+---------+ | code | name | symbol | ...

UTF8 problem with MySQL 5

I'm migrating my WordPress blog and phpBB Forum into a new hosting server. I am using phpMyAdmin to import the SQL script from the database in the previous site. When I open the .sql script with Kate, it says it uses UTF8 as encoding. When I import the sql in the new server, I have the option in phpMyAdmin to choose the encoding, where...

Classic ASP gremlins, getting a Â inserted into text whenever an HTML special character is used

I'm working on an older classic ASP site, and there's a form that allows the user to enter some text (into a multiline textbox), and if they add an HTML character like ® (registered trademark) it inserts it correctly. But when they go to edit the data, using the same form, the update will add a random 'Â' (A with a circumflex accent) in front of the...