iconv

Converting UTF8 to ANSI with Ruby

I have a Ruby script that generates a UTF8 CSV file remotely on a Linux machine and then transfers the file to a Windows machine through SFTP. I then need to open this file with Excel, but Excel doesn't handle UTF8, so I always need to open the file in a text editor that can convert UTF8 to ANSI. I would love to do this pr...
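
The conversion step described above can be sketched in a language-neutral way. The snippet below is in Python rather than Ruby, and it assumes that "ANSI" means Windows-1252 (the usual Western European Windows code page); the function name `utf8_to_ansi` and the sample data are hypothetical.

```python
def utf8_to_ansi(data: bytes, target: str = "cp1252") -> bytes:
    """Re-encode UTF-8 bytes into a Windows code page.

    Assumption: "ANSI" is Windows-1252. Characters that have no
    equivalent in the target code page are replaced with '?'.
    """
    text = data.decode("utf-8")
    return text.encode(target, errors="replace")


# Hypothetical CSV content with accented characters.
sample = "name;city\nJos\u00e9;Z\u00fcrich\n".encode("utf-8")
converted = utf8_to_ansi(sample)
```

Replacing unmappable characters is lossy; passing `errors="strict"` instead would raise an exception so that such rows can be inspected.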

Iconv::IllegalSequence when using WWW::Mechanize

I'm trying to do a little bit of web scraping, but the WWW::Mechanize gem doesn't seem to like the encoding and crashes :-/ The POST request results in a 302 redirect (which Mechanize follows, so far so good) and the resulting page seems to crash it :-/ I googled quite a bit, but nothing has come up so far on how to solve this. Any of you got an ...

How to fix weird issue with iconv on Mac OS X

Hi all, I am on Mac OS X 10.5 (but I reproduced the issue on 10.4). I am trying to use iconv to convert a UTF-8 file to ASCII. The UTF-8 file contains characters like 'éàç', and I want the accented characters to be turned into their closest ASCII equivalent, so my command is this: iconv -f UTF-8 -t ASCII//TRANSLIT//IGNORE myutf8file.t...

Any PHP or Ruby library to convert Traditional Chinese to Simplified Chinese or vice versa?

Is there any PHP or Ruby library to convert Traditional Chinese to Simplified Chinese or vice versa (Big5 <--> GB)? The iconv library won't do it, as it merely converts the encoding -- the glyph stays the same. ...

Any good alternative to the Iconv library for encoding conversion?

I was using the Iconv library in Ruby to convert encodings from UTF-8 to UTF-32, UTF-16, etc. and it was quite good. However, I do see an issue when converting from Big5 to UTF-8 -- an exception is thrown for an invalid sequence... and the problem goes away when converting from CP950 to UTF-8, even though CP950 is essentially Big5... so I wo...

iconv gives "Illegal Character" with smart quotes -- how to get rid of them?

I have a MySQL table with 120,000 lines stored in UTF-8 format. There is one field, product name, that contains text with many accents. I need to fill a second field with this same name after converting it to a url-friendly form (ASCII). Since PHP doesn't directly handle UTF-8, I'm using: $value = iconv ('UTF-8', 'ISO-8859-1', $value)...

How do I remove accents from characters in a PHP string?

I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL. I'm using the following code: $input = "Fóø Bår"; setlocale(LC_ALL, "en_US.utf8"); $output = iconv("utf-8", "ascii//TRANSLIT", $input); print($output); The output I would expect would be something like this: F'oo ...
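
For comparison, here is a minimal sketch of one common way to strip accents without iconv. It is written in Python, not PHP, and it swaps in a different technique: Unicode NFKD decomposition followed by dropping every non-ASCII code point. The helper name `strip_accents` is hypothetical. Unlike `//TRANSLIT`, this approach silently drops letters such as `ø` that have no decomposition.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and discarding non-ASCII.

    NFKD splits "ó" into "o" plus a combining acute accent; encoding to
    ASCII with errors="ignore" then throws the combining mark away.
    Characters with no decomposition (for example "ø") are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


result = strip_accents("F\u00f3\u00f8 B\u00e5r")  # the "Fóø Bår" input from the question
```

If letters like `ø` must survive as `o`, an explicit replacement table applied before normalization is one option.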

Should I use mb_* or iconv_* functions for multibyte strings?

Hi there! As we all know, handling multibyte strings is not that easy in PHP. For example I want to get the length of the following string: ä strlen('ä'); // 2, because ä equals 2 bytes mb_strlen('ä', 'UTF-8'); // 1 iconv_strlen('ä', 'UTF-8'); // 1 Which functions should I use? The mb_* or iconv_*? Why? Considering that the encoding ...
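
The byte-versus-character distinction behind those three results can be illustrated outside PHP. The sketch below uses Python purely for illustration and assumes the `ä` in the question is the precomposed code point U+00E4.

```python
# "ä" as the single precomposed code point U+00E4 (an assumption about the input).
s = "\u00e4"

# Counting bytes of the UTF-8 encoding corresponds to what strlen() reports.
byte_len = len(s.encode("utf-8"))

# Counting code points corresponds to mb_strlen()/iconv_strlen() with 'UTF-8'.
char_len = len(s)
```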

OS X: change file encoding (iconv) recursively

Hi, I know I can convert a single file's encoding under OS X using: iconv -f ISO-8859-1 -t UTF-8 myfilename.xxx > myfilename-utf8.xxx I have to convert a bunch of files with a specific extension, so I want to convert the file encoding from ISO-8859-1 to UTF-8 for all *.ext files in the folder /mydisk/myfolder. Perhaps someone knows the syntax how ...
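
One way to approach the recursive case is sketched below in Python instead of a shell one-liner. The function name `convert_tree` is hypothetical, and unlike the command in the question it rewrites each file in place rather than writing a second file. Running it twice on the same tree would double-encode the files, so it assumes every matching file really is ISO-8859-1.

```python
import tempfile
from pathlib import Path


def convert_tree(root, suffix=".ext", src="iso-8859-1", dst="utf-8"):
    """Re-encode every file under root whose name ends in suffix, in place."""
    converted = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_bytes().decode(src)
        path.write_bytes(text.encode(dst))
        converted.append(path)
    return converted


# Small demonstration in a throwaway directory (stand-in for /mydisk/myfolder).
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "sub"
    nested.mkdir()
    sample = nested / "a.ext"
    sample.write_bytes("caf\u00e9".encode("iso-8859-1"))
    other = nested / "b.txt"
    other.write_bytes("caf\u00e9".encode("iso-8859-1"))

    changed = convert_tree(tmp)
    result = sample.read_bytes()      # now UTF-8
    untouched = other.read_bytes()    # different extension, left alone
```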

Character encoding issues when reading XLS files with PHP

I'm using the PHP-Excel-Reader library to read some XLS files and immediately have hit this issue: PHP Notice: iconv() [function.iconv]: Detected an incomplete multibyte character in input string in C:\web\docs\housing\excel_reader2.php on line 1718 The line in question is this: $result = iconv('UTF-16LE', $this->_defaultEncoding...

Zend Framework and string conversion using iconv

Hello, everyone. A site was moved to another server, which runs Solaris and has different iconv settings. Now, when I validate anything with the "StringLength" function from Zend Framework, my scripts fail with this error: Notice: iconv_strlen() [function.iconv-strlen]: Wrong charset, conversion from `UTF-8' to `UCS-4LE' is not allowed in /...

Uploaded file char-set conversion with Ruby

I have an application where we're having our clients upload a csv file to our server. We then process and put the data from the csv into our database. We're running into some issues with char-sets especially when we're dealing with JSON, in particular some non-converted UTF-8 characters are breaking IE on JSON responses. Is there a way ...

iconv encoding conversion problem

I am having trouble converting strings from utf8 to gb2312. My convert function is below void convert(const char *from_charset,const char *to_charset, char *inptr, char *outptr) { size_t inleft = strlen(inptr); size_t outleft = inleft; iconv_t cd; /* conversion descriptor */ if ((cd = iconv_open(to_charset, from_ch...

Encoding problem with file_get_contents

I'm using a script that gets a URL's content and then calculates keyword density etc., but my problem is with Turkish characters like "ı" and "ş". I tried iconv to convert UTF-8 to ISO-8859-9 but it didn't work. You can see the code at http://www.gazihanisildak.com/keyword/code.txt Thanks in advance. ...

Different results from converting a file from iso-8859-1 to utf-8 iconv in shell vs calling it from python with subprocess

Well, this could be a simple question; to be frank, I'm a little confused by encodings and all those things. Let's suppose I have the file 01234.txt which is iso-8859-1. When I do: iconv --from-code=iso-8859-1 --to-code=utf-8 01234.txt > 01234_utf8.txt It gives me the desired result, but when I do the same thing with python and usin...

How can I upgrade the iconv module in a PHP installation?

I am trying to install PHP on a Windows 2003 server with iconv library version 2.5. However, if I download the PHP 5.2.13 binaries and try to install them, the iconv library version is listed as 1.1. Is there any way to upgrade this module alone? Am I missing something? ...

How can I convert Cyrillic stored as LATIN1 ( sql ) to true UTF8 Cyrillic with iconv?

I have a SQL dump file consisting of incorrectly stored Cyrillic Russian ( WINDOWS-1251 ) text, example Èðàíñêèå which should properly be displayed as Иранские. In the past I have successfully converted the sql file but memory fails in what I did and in what order. Logically it would make sense that since it's stored in LATIN1 I would ...
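
The example pair in the question (Èðàíñêèå versus Иранские) is consistent with Windows-1251 bytes having been read as Latin-1. A minimal sketch of undoing that, in Python rather than with the iconv command line, is shown below; the helper name `repair_mojibake` is hypothetical, and it assumes the garbled text has already been read into a Unicode string.

```python
def repair_mojibake(garbled: str, wrong: str = "latin-1", actual: str = "cp1251") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly.

    Encoding with the wrongly assumed charset restores the raw bytes;
    decoding those bytes with the real charset yields the intended text.
    """
    return garbled.encode(wrong).decode(actual)


repaired = repair_mojibake("Èðàíñêèå")   # sample string from the question
as_utf8 = repaired.encode("utf-8")        # bytes ready to be written as UTF-8
```

The same two-step order (back to bytes with the wrong charset, forward with the right one) is what any iconv-based pipeline for this dump would have to reproduce.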

Tab / LF / CR unicode character

I have a Unicode file (UTF-16 FFFE little-endian BOM) which contains rows of tab-separated fields. Having read http://stackoverflow.com/questions/2308112/splitting-unicode-i-think-using-split-in-ruby, I am going to use the Ruby split (file to lines, then line to fields). BTW, what are the Unicode chars for: LF CR Tab Thanks! ...
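
For reference, Tab is U+0009, LF is U+000A, and CR is U+000D in Unicode, the same values as in ASCII. The lines-then-fields split can be sketched as follows; this is Python rather than Ruby, the function name `parse_utf16_tsv` and the sample bytes are hypothetical, and it assumes the file starts with the FF FE byte-order mark.

```python
TAB, LF, CR = "\u0009", "\u000a", "\u000d"


def parse_utf16_tsv(data: bytes):
    """Decode BOM-prefixed UTF-16 bytes and split them into rows of fields."""
    # The generic "utf-16" codec reads the BOM to pick the byte order
    # and removes it from the decoded text.
    text = data.decode("utf-16")
    # splitlines() handles CR LF, lone LF, and lone CR (and a few other
    # Unicode line separators); empty lines are skipped.
    return [line.split(TAB) for line in text.splitlines() if line]


# Hypothetical file content: FF FE BOM followed by two CR LF terminated rows.
raw = b"\xff\xfe" + "a\tb\r\nc\td\r\n".encode("utf-16-le")
rows = parse_utf16_tsv(raw)
```

Decoding first and splitting afterwards avoids splitting on raw bytes, where a 0x0A or 0x09 byte could be half of an unrelated UTF-16 code unit.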

Possible to repair garbled Chinese filenames?

I'm downloading via FTP some files with Chinese names (BIG5 encoded), and Filezilla displays those filenames as garbage (as FTP cannot handle any encoding other than ASCII and UTF-8, at least the standard-compliant ones). Given a filename with garbled characters, is it possible for me to repair the encoding and get a proper filename St...

How to convert non-Latin-based encoded text into UTF-8, or make them coexist on same page?

Good day, I have a script that scrapes the title/description of remote pages and prints those values into a corresponding charset=UTF-8 encoded page. Here is the problem: whenever a remote page uses a non-Latin character encoding (Arabic, Russian, Chinese, Japanese, etc.), the imported values print as garbled text. I've tr...