The y operator in Perl does character-by-character transliteration. For example, applying y/abc/dfg/ to the string "foobar" gives "foofdr". But what if I want to transliterate "ā" to "ei", "ä" to "a:", "ō" to "әu", and "o" to "ɒ"?

I tried the following line of code, but had no luck :(

y/āäōo/(ei)(a:)(әu)ɒ/

Is there a workaround for this problem? Or do I have to use the s operator repeatedly and do a lot of cumbersome substitutions?

Thanks in advance for any guidance :)

+14  A: 

In this case, create a hash that maps each character to its replacement string, then substitute every matching character with its hash value.

use warnings;
use strict;
use utf8;
binmode STDOUT, ":utf8";
my $string = "āäōo";
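# Map each source character to its replacement string.
my %trans = qw/ā ei ä a: ō әu o ɒ/;
# Build a character class from the single-character keys.
my $keys = join '', keys %trans;
$string =~ s/([$keys])/$trans{$1}/g;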
print "$string\n";

If your keys are more than one character long, you need to alter this: sort the keys in order of decreasing length and join them with alternation, ( | | ), instead of a character class, [ ].
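A minimal sketch of that multi-character variant, not a definitive implementation. The key "ōr" and its replacement "ɔ:" are hypothetical entries added only to show why the longest key has to be tried first, and `transliterate` is an illustrative helper name.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ":utf8";

# Hypothetical table: "ōr" is an invented multi-character key for illustration.
my %trans = (
    'ā'  => 'ei',
    'ä'  => 'a:',
    'ō'  => 'әu',
    'ōr' => 'ɔ:',
    'o'  => 'ɒ',
);

# Longest keys first, so "ōr" is tried before "ō".
# quotemeta guards against keys that contain regex metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %trans;

sub transliterate {
    my ($text) = @_;
    $text =~ s/($alternation)/$trans{$1}/g;
    return $text;
}

print transliterate("āäōoōr"), "\n";
```

Without the sort, "ō" could match first and leave a stray "r" behind instead of replacing "ōr" as a unit.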

Kinopiko
@Kinopiko, this sounds like a good idea. At least I don't have to do a lot of mechanical substitutions :)
Mike
@Kinopiko, this answer has been upvoted six times so far. I'm convinced that the y operator cannot do the job I expected; we simply cannot change it the way we change Perl's default input separator. Thanks :) If I were to use the hash to solve this problem, my code would be this: open my $in,'<',"./test.txt";my %trans = qw/ā ei ä a: ō əu o ɒ/;my @keys = keys %trans;for my $key (@keys){while(<$in>){s/($key)/$trans{$1}/;print;}} Please correct me if you see anything suspicious. Thanks again :)
Mike
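One thing that stands out in the loop above: the inner while reads the file to the end on the first key, so the remaining keys never see any input, and without /g only the first match on each line is replaced. A minimal sketch with the line loop outermost, assuming single-character keys as in the answer. The in-memory string is only a stand-in for ./test.txt so the example runs on its own, and @out exists only to make the result easy to inspect.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
binmode STDOUT, ":utf8";

my %trans = qw/ā ei ä a: ō əu o ɒ/;
my $keys  = join '', keys %trans;

# Stand-in for ./test.txt: UTF-8 bytes read through an in-memory filehandle.
my $bytes = encode('UTF-8', "fōo\nbär\n");
open my $in, '<:encoding(UTF-8)', \$bytes
    or die "Cannot open in-memory input: $!";

my @out;
while (my $line = <$in>) {
    # Replace every listed character on the line in a single pass.
    $line =~ s/([$keys])/$trans{$1}/g;
    push @out, $line;
    print $line;
}
close $in;
```

Reading each line once and applying all replacements to it also avoids re-substituting text that an earlier key has already produced.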
@Kinopiko and @ysth, thanks for the explanation. I didn't know [ ] and ( | | ) could be used the way the illustrated code uses them.
Mike
Sorry, I probably should have made it clear in my answer that the `y` operator doesn't do that. It can only convert single characters.
Kinopiko
A: 

It sounds like you're trying to do something similar to Text::Unaccent::PurePerl.

brian d foy
@brian, thanks for introducing this interesting module to me :) But I guess it's only marginally relevant to what I was trying to do. I wanted to transliterate one system of pronunciation symbols to another, which is somewhat different from Text::Unaccent::PurePerl's word-to-word conversion.
Mike
Well, it's doing the same job: you're replacing a character with possibly multiple characters. Topologically it's the same.
brian d foy