How do I convert a file to UTF-8 using Perl? And how do I check whether the converted file is really in UTF-8?
For the conversion, take a look at Text::Iconv:
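use Text::Iconv;
# "fromcode" and "tocode" are placeholders for encoding names, e.g. "windows-1252" and "utf-8"
my $converter = Text::Iconv->new("fromcode", "tocode");
my $converted = $converter->convert("Text to convert");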
Installing bindings to the iconv library, such as Text::Iconv, is not necessary, because Perl already comes with its own character encoding library: Encode. Part of it is piconv, an iconv(1) workalike. Use it to batch convert files to UTF-8. "ANSI" is just a misleading name for the group of windows-125? encodings. You most likely have files encoded in windows-1252. Example:
piconv -f windows-1252 -t UTF-8 < input-file > output-file
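If you would rather stay inside a Perl script than shell out to piconv, the same conversion can be sketched with PerlIO encoding layers, which are built on Encode. This is only a minimal sketch: the helper name convert_file_to_utf8 and its arguments are made up for illustration, and the source encoding is whatever you pass in (windows-1252 is an assumption about your files).

```perl
use strict;
use warnings;

# Hypothetical helper: re-encode $in_path (assumed to be in $from) as UTF-8 in $out_path.
sub convert_file_to_utf8 {
    my ($in_path, $out_path, $from) = @_;

    # The :encoding() layers decode on read and encode on write.
    open my $in,  "<:encoding($from)", $in_path  or die "Cannot read $in_path: $!";
    open my $out, ">:encoding(UTF-8)", $out_path or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

It would be called as convert_file_to_utf8("input-file", "output-file", "windows-1252"), mirroring the piconv command above.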
If metadata is missing, heuristics have to be used to determine the encoding of a file's content. I have been recommending Encode::Detect.
Hey,
That depends on the string you've got. If it's a file that has been uploaded, I think this code will help. But if it's text from the web, or text that has already been converted to UTF-8 (because you're working in UTF-8), then you'll have a problem figuring it out.
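I usually use:
use Encode::Guess;
my $guess = guess_encoding($string);   # an encoding object, or an error message string on failure
die "Cannot guess encoding: $guess" unless ref $guess;
my $enc = $guess->name;
And then, with the code above, I do:
use Text::Iconv;
my $converter = Text::Iconv->new($enc, "utf-8");
my $converted = $converter->convert($string);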
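Two things are worth knowing about Encode::Guess. First, guess_encoding returns an encoding object on success and a plain error string on failure. Second, by default it only suspects ASCII, UTF-8 and BOM-marked UTF-16/32, so anything else has to be passed as an extra suspect. Here is a hedged sketch that handles both points and converts with the core Encode module instead of Text::Iconv. The helper name to_utf8_guessing is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;   # exports guess_encoding

# Hypothetical helper: guess the encoding of $octets and return them as UTF-8 octets.
# @suspects are optional extra encoding names to try, e.g. 'euc-jp'.
sub to_utf8_guessing {
    my ($octets, @suspects) = @_;

    my $guess = guess_encoding($octets, @suspects);
    # On failure the return value is an error message, not an object.
    die "Cannot guess encoding: $guess\n" unless ref $guess;

    my $text = $guess->decode($octets);   # octets -> Perl characters
    return encode('UTF-8', $text);        # characters -> UTF-8 octets
}
```

Single-byte encodings such as windows-1252 accept almost any byte sequence, so listing one as a suspect next to UTF-8 tends to give an ambiguous guess rather than a useful one.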
FYI, a list of characters with their UTF-8 encodings can be found here:
http://www.fileformat.info/info/charset/UTF-8/list.htm?start=1024
Using the Encode module, you can easily encode a string into different encodings, e.g.:
use Encode;
my $str = "A string in Perl internal format ....";
my $octets = encode("utf-8", $str, Encode::FB_CROAK);   # dies if $str cannot be encoded
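To check whether octets are valid UTF-8, decode them strictly and see whether that fails. Note that Encode::is_utf8 only reports whether Perl's internal UTF8 flag is set on a string (its second argument is a boolean, not a fallback constant), so it does not tell you what encoding a file is in:
my $is_valid = eval { decode("UTF-8", $octets, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 };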
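For the second half of the question (checking a converted file), one possible approach is to read the file as raw bytes and attempt a strict decode. This is a sketch under that assumption, using only the core Encode module. The helper name file_is_valid_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the raw bytes of $path decode as strict UTF-8.
sub file_is_valid_utf8 {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $octets = do { local $/; <$fh> };   # slurp the whole file as bytes
    close $fh;
    $octets = '' unless defined $octets;

    # FB_CROAK makes decode die on the first malformed sequence;
    # LEAVE_SRC keeps decode from modifying $octets in place.
    my $ok = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}
```

Keep in mind that a pure-ASCII file passes this check too, and that passing only shows the bytes are well-formed UTF-8, not that the text was converted from the right source encoding.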