How can I check the file encoding in a shell script? I need to know if a file is encoded in utf-8 or iso-8859-1.

Thanks

+2  A: 

I'd just use

file -bi myfile.txt

to determine the character encoding of a particular file.

This relies on an external tool, but I suspect file is available on nearly all semi-modern distros nowadays.

EDIT:

In response to Laurence Gonsalves' comment: -b is the 'brief' option (it leaves the filename out of the output), and on Linux -i is shorthand for --mime. Since -bi does not behave the same way on Mac OS X, the most portable form is probably:

file --mime myfile.txt
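
To branch on the result inside a script, one option is to pull the charset out of that output. This is only a minimal sketch: the function names charset_from_mime and file_charset are made up for illustration, and it assumes the output contains a "charset=" field, as in "myfile.txt: text/plain; charset=utf-8".

```shell
# charset_from_mime LINE (hypothetical helper)
# Print whatever follows "charset=" in one line of `file --mime` output.
# Returns 1 if the line has no charset field.
charset_from_mime() {
    case $1 in
        *charset=*) printf '%s\n' "${1##*charset=}" ;;
        *)          return 1 ;;
    esac
}

# file_charset FILE (hypothetical helper)
# Run `file --mime` on FILE and print only the reported charset.
file_charset() {
    charset_from_mime "$(file --mime "$1")"
}

# Possible use in a script:
#   case $(file_charset myfile.txt) in
#       utf-8)      echo "UTF-8" ;;
#       iso-8859-1) echo "ISO-8859-1" ;;
#       *)          echo "something else" ;;
#   esac
```

The parsing is kept separate from the call to file so that it can be checked against a fixed string without depending on which version of file is installed.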
ChristopheD
this works...thanks
rizidoro
`file -bi` works for me on Linux, but not on OS-X. On OS-X it says "regular file" regardless of the encoding. `file --mime` or just `file` with no flags works on both OS-X and Linux.
Laurence Gonsalves
@Laurence Gonsalves: thanks for letting me know, I've updated the answer accordingly.
ChristopheD
+2  A: 

You can use the file command: file --mime myfile.text

Jochen Hilgers
+2  A: 

There's no way to be 100% certain (unless you're dealing with a file format that internally states its encoding).

Most tools that attempt to make this distinction try to decode the file as UTF-8 first (as that's the stricter encoding) and fall back to ISO-8859-1 if that fails. You can do this "by hand" with iconv, or you can use file:

$ file utf8.txt
utf8.txt: UTF-8 Unicode text
$ file latin1.txt
latin1.txt: ISO-8859 text

Note that ASCII files are both UTF-8 and ISO-8859-1 compatible.

$ file ascii.txt
ascii.txt: ASCII text
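
The "by hand" approach with iconv could look roughly like the following. It is a sketch under assumptions: guess_encoding is a hypothetical name, iconv is assumed to exit with a non-zero status when it hits invalid input, and anything that is not valid UTF-8 is simply assumed to be ISO-8859-1, which is a fallback guess rather than a real detection.

```shell
# guess_encoding FILE (hypothetical helper)
# Prints us-ascii, utf-8 or iso-8859-1.
guess_encoding() {
    # Delete every byte in the ASCII range; if nothing is left,
    # the file is plain ASCII (valid as both UTF-8 and ISO-8859-1).
    if [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]; then
        echo us-ascii
    # Try to decode as UTF-8; iconv is assumed to fail on invalid sequences.
    elif iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo utf-8
    # Assumption: not valid UTF-8, so treat it as ISO-8859-1.
    else
        echo iso-8859-1
    fi
}
```

Called as guess_encoding latin1.txt, it prints one of the three names, which a script can then compare against in a case statement.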

Finally: there's no real way to distinguish between ISO-8859-1 and ISO-8859-2, for example, unless you're going to assume it's natural language and use statistical methods. This is probably why file says "ISO-8859".

Laurence Gonsalves