I have a requirement where a client supplies a file in ANSI encoding, but my system can only read the file successfully when it is in Unicode. How do I tackle this? I know that when I "Save As" the file with Unicode encoding, it gets picked up. It's difficult to make the client comply with our request, so is there a batch program I can run on this folder to convert the file to Unicode before it is picked up?
+11
A:
iconv can do that:
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.
Input/Output format specification:
-f, --from-code=NAME encoding of original text
-t, --to-code=NAME encoding for output
Information:
-l, --list list all known coded character sets
Output control:
-c omit invalid characters from output
-o, --output=FILE output file
-s, --silent suppress warnings
--verbose print progress information
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
Johannes Schaub - litb
2009-03-08 10:17:26
+3
A:
You can also easily convert encodings in Python:
# Python 3: decode the input as Latin-1 while reading
with open("infile.txt", encoding="latin1") as inf:
    data = inf.read()
# ... and encode the text as UTF-8 while writing
with open("outfile.txt", "w", encoding="utf-8") as outf:
    outf.write(data)
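Since the question asks about a whole folder, here is a minimal sketch of the same idea applied to every file in a directory. It assumes the client's "ANSI" files are Windows-1252 and that the target system wants UTF-16 with a byte order mark; both are guesses that have to be checked against the real files. The function name `convert_folder` and its parameters are made up for this illustration.

```python
import pathlib


def convert_folder(src_dir, dst_dir, src_encoding="cp1252", dst_encoding="utf-16"):
    """Re-encode every .txt file in src_dir and write the result into dst_dir.

    The default encodings are assumptions, not facts about the client's files.
    Returns the names of the files that were converted.
    """
    src = pathlib.Path(src_dir)
    dst = pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob("*.txt")):
        # Work on raw bytes so that line endings are passed through unchanged.
        text = path.read_bytes().decode(src_encoding)
        # Python's "utf-16" codec prepends a byte order mark when encoding.
        (dst / path.name).write_bytes(text.encode(dst_encoding))
        converted.append(path.name)
    return converted


# Example call with hypothetical folder names:
# convert_folder("incoming", "converted")
```

Writing into a separate output folder keeps the originals intact, so a wrong guess about the source code page can be corrected by simply running the conversion again.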
sth
2009-03-08 10:47:00
+1
A:
Here's a PowerShell solution:
$lines = gc "pathToFile"
$lines | out-file -encoding Unicode -filepath "pathToOutputFile"
JaredPar
2009-03-08 13:27:20
+6
A:
Neither ANSI nor Unicode is an encoding. You'll have to know the ANSI code page of the input file and the Unicode encoding you need (UTF-8 or UTF-16, LE or BE) before you can use one of the suggested tools (such as iconv).
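To illustrate why both pieces of information matter, here is a small Python sketch. The sample bytes are invented for the example; the two code pages are just common Windows ones, not a claim about what the client actually uses.

```python
# One byte sequence with no encoding information attached.
raw = b"\x80"

# The same byte is a different character under different "ANSI" code pages.
as_cp1252 = raw.decode("cp1252")  # Western European: EURO SIGN (U+20AC)
as_cp1251 = raw.decode("cp1251")  # Cyrillic: CYRILLIC CAPITAL LETTER DJE (U+0402)

# Likewise, one character has different byte forms under different
# Unicode encodings, so "Unicode" alone does not say what to write.
euro = "\u20ac"
utf8_bytes = euro.encode("utf-8")         # b"\xe2\x82\xac"
utf16_le_bytes = euro.encode("utf-16-le")  # b"\xac\x20"
utf16_be_bytes = euro.encode("utf-16-be")  # b"\x20\xac"
```

Guessing the wrong source code page does not raise an error here; it silently produces the wrong characters, which is why the code page has to be known rather than assumed.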
Serge - appTranslator
2009-03-08 13:40:58
Wish I could upvote this more. For most Windows users, "Unicode" means UTF32. Most western European languages use the Latin1 code page, so most people assume that's the "ANSI" encoding (again, I blame MS for the wording of their "Save As" options).
Joe Pineda
2009-03-08 14:08:46
We could add that looking into Control Panel->Regional Settings->Advanced Options will show which ANSI code-pages are installed and used.
Joe Pineda
2009-03-08 14:14:51
On Windows systems, "Unicode" usually means UTF-16.
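One way to check what a "Unicode" file really contains is to look for a byte order mark at the start. The following Python sketch does that; `sniff_bom` is a hypothetical helper name, and a missing BOM proves nothing, since files in any of these encodings can be written without one.

```python
import codecs


def sniff_bom(data):
    """Return the codec name implied by a leading byte order mark, or None."""
    # The UTF-32 LE mark begins with the UTF-16 LE mark, so test UTF-32 first.
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None  # no BOM: could be an ANSI code page, UTF-8, or anything else
```

A known limitation of this sketch: UTF-16 LE text whose first character is U+0000 starts with the same four bytes as the UTF-32 LE mark and would be misreported.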
Alan Moore
2009-03-09 05:59:05