I have a requirement where a client supplies a file in ANSI encoding, but my system can only read files in UNICODE. How do I tackle this? I know that when I "Save As" the file with UNICODE encoding, it gets picked up. It's difficult to make the client comply with our request. Is there a batch program I can run on this folder to convert the file to UNICODE before it is picked up?

+4  A: 

recode could do the job.

Jan Jungnickel
+11  A: 

iconv can do that:

Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.

 Input/Output format specification:
  -f, --from-code=NAME       encoding of original text
  -t, --to-code=NAME         encoding for output

 Information:
  -l, --list                 list all known coded character sets

 Output control:
  -c                         omit invalid characters from output
  -o, --output=FILE          output file
  -s, --silent               suppress warnings
      --verbose              print progress information

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
Johannes Schaub - litb
+3  A: 

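You can also easily convert encodings in Python:

# Read the input, decoding it as Latin-1 (adjust to the file's real code page)
with open("infile.txt", encoding="latin-1") as inf:
    data = inf.read()

# Write the same text back out, encoded as UTF-8
with open("outfile.txt", "w", encoding="utf-8") as outf:
    outf.write(data)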
sth
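Since the question asks about a whole folder, the same idea can be wrapped in a loop. This is only a minimal sketch: the function name `convert_folder`, the `*.txt` pattern, and the defaults `cp1252` (a common Western European "ANSI" code page) and `utf-16` (what Windows tools usually call "Unicode") are assumptions, not something stated in the question. Check the real code page of the client's files before relying on it.

```python
from pathlib import Path


def convert_folder(src_dir, dst_dir, src_encoding="cp1252", dst_encoding="utf-16"):
    """Re-encode every *.txt file in src_dir into dst_dir; return the paths written."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.txt")):
        # newline="" leaves the original line endings untouched on read and write.
        with open(path, encoding=src_encoding, newline="") as inf:
            text = inf.read()
        target = dst / path.name
        # Python's "utf-16" codec writes a byte order mark at the start of the file.
        with open(target, "w", encoding=dst_encoding, newline="") as outf:
            outf.write(text)
        written.append(target)
    return written
```

Calling `convert_folder("incoming", "converted")` (both folder names hypothetical) would write the converted copies into a separate folder, so the originals stay available if the assumed code page turns out to be wrong.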
+1  A: 

Here's a PowerShell solution:

$lines = gc "pathToFile"
$lines | out-file -encoding Unicode "pathToOutputFile"
JaredPar
+6  A: 

Neither ANSI nor Unicode is an encoding. You'll have to know the ANSI code page of the input file and the target Unicode encoding (UTF-8 or UTF-16, LE or BE) before you can use one of the suggested tools (such as iconv).

Serge - appTranslator
Wish I could upvote this more. For most Windows users, "Unicode" means UTF32. Most western European languages use Latin1 codepage, so most people assume that's "ANSI" encoding (again, I blame MS for their word usage in their "Save As" options).
Joe Pineda
We could add that looking into Control Panel->Regional Settings->Advanced Options will show which ANSI code-pages are installed and used.
Joe Pineda
On Windows systems, "Unicode" usually means UTF-16.
Alan Moore
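To illustrate why the code page and the target Unicode encoding both have to be known, here is a small Python sketch. The sample bytes and the two code pages (`cp1252` and `cp1251`) are chosen purely for illustration and are not taken from the question.

```python
raw = b"caf\xe9"  # bytes as they might sit in an "ANSI" file

# The same bytes decode to different text depending on the assumed code page.
as_western = raw.decode("cp1252")   # Western European code page
as_cyrillic = raw.decode("cp1251")  # Cyrillic code page

# "Unicode" output is ambiguous too: each encoding produces different bytes.
utf8_bytes = as_western.encode("utf-8")
utf16le_bytes = as_western.encode("utf-16-le")
utf16be_bytes = as_western.encode("utf-16-be")

print(repr(as_western), repr(as_cyrillic))
print(utf8_bytes, utf16le_bytes, utf16be_bytes)
```

The byte 0xE9 becomes a different character under each code page, and the three Unicode encodings yield three different byte sequences for the same text, so a converter told only "ANSI to Unicode" has to guess on both ends.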