Given a source text like

nin2 hao3 ma

(a common way to write Pinyin in plain ASCII, with tone numbers instead of properly accented characters) and given a UTF-8 conversion table like

a1;ā
e1;ē
i1;ī
o1;ō
u1;ū
ü1;ǖ
A1;Ā
E1;Ē
...

how would I convert the source text into the following?

nín hǎo ma

For what it's worth, I'm using PHP. Is a regex the right tool for this?
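If the conversion table is stored as text in the `a1;ā` format shown above, one way to load it into a PHP lookup array is to split each line on the semicolon. This is a minimal sketch, assuming valid UTF-8 with one pair per line; the function name `loadToneTable` is hypothetical:

```php
<?php
// Hypothetical helper: turns lines such as "a1;ā" into ['a1' => 'ā'].
// Assumes UTF-8 text with one "key;value" pair per line. Blank lines
// and lines without a semicolon (like the "..." placeholder) are skipped.
function loadToneTable(string $text): array
{
    $table = array();
    // Split on \r\n, \n or \r; a byte-level split is safe for UTF-8
    // because those bytes never occur inside a multibyte character.
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ';') === false) {
            continue;
        }
        list($key, $value) = explode(';', $line, 2);
        $table[$key] = $value;
    }
    return $table;
}

$table = loadToneTable("a1;ā\ne1;ē\ni2;í\n...");
echo $table['i2'], "\n"; // í
```

Because each key joins a vowel to a tone number, a lookup such as `$table[$vowel . $tone]` returns the accented form once you know which vowel should carry the mark.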

A:
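<?php
$in = 'nin2 hao3 ma';
$out = 'nín hǎo ma';

// Called by preg_replace_callback() for every "word + tone digit" match:
// $match[1] is the syllable, $match[2] is the tone number.
function replacer($match) {
  // Tone number => (plain vowel => accented vowel). Tones 2 and 3 only
  // contain the entries this example needs.
  static $trTable = array(
    1 => array(
      'a' => 'ā',
      'e' => 'ē',
      'i' => 'ī',
      'o' => 'ō',
      'u' => 'ū',
      'ü' => 'ǖ',
      'A' => 'Ā',
      'E' => 'Ē'),
    2 => array('i' => 'í'),
    3 => array('a' => 'ǎ')
  );
  list(, $word, $i) = $match;
  // No table for this tone: leave the match (digit included) unchanged.
  if (!isset($trTable[$i])) {
    return $match[0];
  }
  // Note: this replaces every listed vowel in the syllable, not just one.
  return str_replace(
    array_keys($trTable[$i]),
    array_values($trTable[$i]),
    $word);
}

// Outputs: bool(true)
var_dump(preg_replace_callback('~(\w+)(\d+)~', 'replacer', $in) === $out);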
Ollie Saunders
Impressive. Thanks!
Philipp Lenssen
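The `str_replace()` call in the answer swaps every vowel listed for a tone. That works here only because the tone 2 and tone 3 tables each hold a single vowel: with a full tone 3 table, `hao3` would come out as `hǎǒ`. A sketch of one way around this marks exactly one vowel per syllable using the usual Pinyin placement rule (`a` or `e` if present, else the `o` of `ou`, else the last vowel). It assumes lowercase input with `ü` already written as such and treats tone 5 as neutral; the names `TONE_MARKS`, `markSyllable` and `numberedToMarked` are hypothetical, and the question's own table could replace `TONE_MARKS`:

```php
<?php
// Hypothetical lookup: plain vowel => [tone => accented vowel].
const TONE_MARKS = array(
    'a' => array(1 => 'ā', 2 => 'á', 3 => 'ǎ', 4 => 'à'),
    'e' => array(1 => 'ē', 2 => 'é', 3 => 'ě', 4 => 'è'),
    'i' => array(1 => 'ī', 2 => 'í', 3 => 'ǐ', 4 => 'ì'),
    'o' => array(1 => 'ō', 2 => 'ó', 3 => 'ǒ', 4 => 'ò'),
    'u' => array(1 => 'ū', 2 => 'ú', 3 => 'ǔ', 4 => 'ù'),
    'ü' => array(1 => 'ǖ', 2 => 'ǘ', 3 => 'ǚ', 4 => 'ǜ'),
);

// Callback for preg_replace_callback(): $m[1] = syllable, $m[2] = tone digit.
function markSyllable(array $m): string
{
    list(, $syllable, $digit) = $m;
    $tone = (int) $digit;
    if ($tone < 1 || $tone > 4) {
        return $syllable; // tone 0 or 5: neutral, just drop the digit
    }

    // Choose the vowel that carries the mark and its byte offset.
    if (($pos = strpos($syllable, 'a')) !== false) {
        $vowel = 'a';
    } elseif (($pos = strpos($syllable, 'e')) !== false) {
        $vowel = 'e';
    } elseif (($pos = strpos($syllable, 'ou')) !== false) {
        $vowel = 'o';
    } elseif (preg_match_all('/[iouü]/u', $syllable, $found, PREG_OFFSET_CAPTURE)) {
        // Otherwise the last vowel wins; offsets are byte positions.
        list($vowel, $pos) = end($found[0]);
    } else {
        return $syllable; // no vowel to mark
    }

    // strlen() is in bytes, so the two-byte 'ü' is replaced whole.
    return substr_replace($syllable, TONE_MARKS[$vowel][$tone], $pos, strlen($vowel));
}

function numberedToMarked(string $text): string
{
    return preg_replace_callback('/([a-zü]+)([0-5])/u', 'markSyllable', $text);
}

echo numberedToMarked('nin2 hao3 ma'), "\n"; // nín hǎo ma
```

Syllables without a digit, such as `ma`, do not match the pattern and pass through unchanged.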