Can you point me to a tool to convert Japanese characters to Unicode?

A:

Searching CPAN turns up "Unicode::Japanese". I hope this helps as a starting point. You can also look at the article on Character Encodings in Perl and the Perl documentation on Unicode (perlunicode) for more information.

Space
s/artical/article/
Brad Gilbert
Brad, you have editing powers. :)
brian d foy
Why the downvote? Was it only for a spelling mistake? :)
Space
A:

See http://p3rl.org/UNI.

use Encode qw(decode encode);
my $bytes_in_sjis_encoding = "\x88\xea\x93\xf1\x8e\x4f";
my $unicode_string = decode('Shift_JIS', $bytes_in_sjis_encoding); # returns 一二三
my $bytes_in_utf8_encoding = encode('UTF-8', $unicode_string); # returns "\xe4\xb8\x80\xe4\xba\x8c\xe4\xb8\x89"
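To make the "convert to Unicode" step concrete: after decode, each character in the Perl string is a Unicode code point, which ord exposes. A small sketch reusing the Shift_JIS bytes from the example; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Shift_JIS bytes for 一二三 (same sample bytes as in the answer)
my $bytes = "\x88\xea\x93\xf1\x8e\x4f";

# decode() turns encoded bytes into a Perl character string
my $chars = decode('Shift_JIS', $bytes);

# ord() gives each character's Unicode code point
my @code_points = map { sprintf 'U+%04X', ord($_) } split //, $chars;
print join(' ', @code_points), "\n";    # prints "U+4E00 U+4E8C U+4E09"
```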

For batch conversion from the command line, use piconv:

piconv -f Shift_JIS -t UTF-8 < infile > outfile
daxim
A: 

First, you need to find out the encoding of the source text if you don't know it already.

The most common encodings for Japanese are:

  1. EUC-JP: often used on Unix systems and some web pages; it covers more kanji than Shift_JIS.
  2. Shift_JIS: Microsoft's extension of it, called CP932, is often used by non-Unicode Windows programs.
  3. ISO-2022-JP: a distant third.
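For that first step, Perl's Encode distribution includes Encode::Guess, which tries a set of candidate encodings and keeps the one that decodes the data cleanly. A minimal sketch, assuming the input is in one of the encodings listed above; guessing is heuristic, so short or ambiguous input can fail or come back as "too ambiguous".

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Sample input of "unknown" encoding: 一二三 as Shift_JIS bytes
my $bytes = "\x88\xea\x93\xf1\x8e\x4f";

# Candidates beyond the default suspects (ascii, utf8); "7bit-jis" is
# Encode's escape-sequence JIS encoding, roughly the ISO-2022-JP family
my $decoder = guess_encoding($bytes, qw(euc-jp shiftjis 7bit-jis));

# On failure, guess_encoding returns an error string instead of an object
ref $decoder or die "Can't guess encoding: $decoder";

my $text = $decoder->decode($bytes);
print 'Guessed encoding: ', $decoder->name, "\n";    # prints "Guessed encoding: shiftjis"
```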

A common encoding-conversion library, with bindings for many languages, is iconv (see http://en.wikipedia.org/wiki/Iconv and http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm); it supports many other encodings besides Japanese.

David Morrissey
A: 

This question seems a bit vague to me; I'm not sure what you're asking. Usually you would use something like this:

open my $file, "<:encoding(cp932)", "JapaneseFile.txt"
    or die "Cannot open JapaneseFile.txt: $!";

to open a file of Japanese text in CP932 (the Windows variant of Shift_JIS). Perl then decodes the text as it is read, so each line you read from $file is already a string of Unicode characters.
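The same :encoding layer approach can do a whole conversion in pure Perl, similar in spirit to piconv: decode on input, encode on output. A minimal sketch; the temporary files and the sample CP932 contents are made up for the demonstration, so substitute real paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: a small CP932 file containing 一二三 and a newline
my ($tmp, $in_path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\x88\xea\x93\xf1\x8e\x4f\n";
close $tmp or die "Cannot close $in_path: $!";

# Hypothetical output file, removed automatically at program exit
my (undef, $out_path) = tempfile(UNLINK => 1);

# :encoding(cp932) decodes while reading; :encoding(UTF-8) encodes while writing
open my $in,  '<:encoding(cp932)', $in_path  or die "Cannot open $in_path: $!";
open my $out, '>:encoding(UTF-8)', $out_path or die "Cannot open $out_path: $!";
while (my $line = <$in>) {
    print {$out} $line;    # $line is a decoded character string here
}
close $in;
close $out or die "Cannot close $out_path: $!";
```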

Kinopiko