What is the fastest, easiest tool or method to convert text files between character sets?

Specifically, I need to convert from UTF-8 to ISO-8859-15 and vice versa.

Everything goes: one-liners in your favorite scripting language, command-line tools or other utilities for OS, web sites, etc.

Best solutions so far:

On Linux/UNIX/OS X/cygwin:

  • GNU iconv, suggested by Troels Arvin, is best used as a filter and seems to be universally available. Example:

    $ iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt

    As pointed out by Ben, there is an online converter using iconv.

  • GNU recode (manual), suggested by Cheekysoft, converts one or several files in place. Example:

    $ recode UTF8..ISO-8859-15 in.txt

    The same conversion using shorter aliases:
    $ recode utf8..l9 in.txt

    recode also supports surfaces, which can be used to convert between line-ending conventions and transfer encodings such as Base64:

    Convert newlines from LF (Unix) to CR-LF (DOS):
    $ recode ../CR-LF in.txt

    Base64-encode a file:
    $ recode ../Base64 in.txt

    You can also combine them.

    Convert a Base64-encoded UTF-8 file with Unix line endings to a Base64-encoded Latin-1 file with DOS line endings:
    $ recode utf8/Base64..l1/CR-LF/Base64 file.txt
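To make the order of operations in that combined request easier to follow, here is a rough Python sketch of the same steps: remove the Base64 surface, decode UTF-8, switch to CR-LF line endings, encode as Latin-1, and reapply Base64. The function name is made up for illustration, and the output is not claimed to be byte-identical to what recode writes (for instance, `base64.b64encode` does not wrap its output into lines).

```python
import base64


def b64_utf8_to_b64_latin1_crlf(data: bytes) -> bytes:
    """Hypothetical helper mirroring utf8/Base64..l1/CR-LF/Base64 step by step."""
    # 1. Remove the Base64 surface, then decode the UTF-8 charset.
    text = base64.b64decode(data).decode("utf-8")
    # 2. Normalize any existing CR-LF first so it is not doubled, then emit CR-LF.
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    # 3. Encode as Latin-1 (raises UnicodeEncodeError for unrepresentable characters).
    latin1 = text.encode("iso-8859-1")
    # 4. Reapply the Base64 surface (single line, no wrapping).
    return base64.b64encode(latin1)
```

The surfaces are peeled off the input from the outside in and applied to the output in the order written, which is why the line-ending change happens before the final Base64 step.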

On Windows with PowerShell (Jay Bazuzi):

  • PS C:\> gc -en utf8 in.txt | Out-File -en ascii out.txt

    (No ISO-8859-15 support, though; it reports the supported charsets as unicode, utf7, utf8, utf32, ascii, bigendianunicode, default, and oem.)
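Where neither iconv nor recode is at hand, the same conversion can be sketched with Python's built-in codecs. This is a minimal sketch, not one of the solutions suggested in the thread; `convert_file` and its parameter names are hypothetical, and it assumes the input really is in the stated source encoding.

```python
def convert_file(src, dst, from_enc="utf-8", to_enc="iso-8859-15", errors="strict"):
    """Re-encode the text file `src` into `dst` (hypothetical helper)."""
    # newline="" turns off newline translation, so line endings pass through as-is.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, errors=errors, newline="") as fout:
        for line in fin:
            # With errors="strict", a character missing from the target
            # charset raises UnicodeEncodeError instead of being dropped.
            fout.write(line)

# Usage:
#   convert_file("in.txt", "out.txt")                          # UTF-8 -> ISO-8859-15
#   convert_file("out.txt", "back.txt", "iso-8859-15", "utf-8")  # and back
```

Passing `errors="replace"` substitutes `?` for characters that ISO-8859-15 cannot represent, which may or may not be acceptable depending on the data.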

+3  A: 

iconv(1)

iconv -f FROM-ENCODING -t TO-ENCODING file.txt

There are also iconv-based tools in many languages.

Daniel Papasian
+2  A: 

Under Linux you can use the very powerful recode command to convert between different charsets and to fix line-ending issues. recode -l lists all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.

Cheekysoft
+10  A: 

Stand-alone utility approach:

iconv -f UTF8 -t ISO88591 in.txt > out.txt

f: from
t: to

Troels Arvin
I found this to be the best option where it's available, except that the names have to be UTF-8 and ISO-8859-1 (names without dashes didn't work for me).
Antti Sykäri
Antti Sykäri: There must be something wrong with your iconv. The non-dash versions are even used in the examples in the manual page for iconv.
Troels Arvin
A: 

The Yudit editor supports and converts between many different text encodings, and runs on Linux, Windows, Mac, etc.

Adam Davis
+2  A: 

Ooh, can I use PowerShell?

Get-Content -Encoding UTF8 FILE-UTF8.TXT | Out-File -Encoding UTF7 FILE-UTF7.TXT

The shortest version, if you can assume that the input BOM is correct:

gc FILE.TXT | Out-File -en utf7 file-utf7.txt
Jay Bazuzi
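The "input BOM is correct" assumption amounts to picking the source encoding from the first bytes of the file. A minimal Python sketch of that idea follows; `sniff_encoding` and `decode_with_bom` are hypothetical names, and falling back to UTF-8 when no BOM is found is an assumption of this sketch, not a description of how PowerShell behaves.

```python
import codecs

# Check the longer BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name chosen from the BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def decode_with_bom(raw: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the BOM-selected codec; these codecs also strip the BOM."""
    return raw.decode(sniff_encoding(raw, default))
```

The decoded string can then be written out in the target charset, for example with `text.encode("iso-8859-15")`.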
+1  A: 

If it's only a few files you might want to give the iconv online converter a try...

Ben
+2  A: 

PHP iconv()

$output = iconv("UTF-8", "ISO-8859-15", $input);