I use SCIM on Linux for Chinese and Japanese language input. Unfortunately, when I try to capture input from Perl's STDIN, the input comes through garbled. As roman characters are typed, SCIM tries to guess the correct final characters.

SCIM displays each guess on the command line and sends ^H (backspace) codes to delete the previously suggested characters when the guess changes. However, these backspace characters are shown literally as ^H instead of being interpreted.

Example one-liner:

perl -e 'print "Chinese: "; my $s = <STDIN>; print $s'

When I enable SCIM Chinese or Japanese language input, as I type, e.g., nihao => 你好, here is the result:

你^H你^H你^H你^H你^H你好^H^H你好^H^H你好^H^H你哈^H^H你哈^H^H你哈^H^H你好^H^H你好^H^H你好^H^H你好

At the very end of this string, you can see "你好" (nihao/hello). At a normal bash prompt, if I type nihao (with Chinese enabled), the result is perfect.

This has something to do with how backspace characters (or control characters) are interpreted when Perl reads from STDIN. The same thing happens with the 'read' command in Bash.

Witness: read -p 'Chinese: ' s && echo $s

A:

The problem is that you need something to interpret the backspace characters. The normal bash prompt does that. If you turned SCIM off and typed ca<BACKSPACE>ot<ENTER>, it would look like you typed cot, but Perl would see it as ca^Hot.
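To check what Perl actually received, it can help to make the control characters visible before printing. The following is only a minimal sketch; show_controls is a hypothetical helper name, not part of any module, and it uses the usual caret notation (^H for backspace).

```perl
use strict;
use warnings;

# Hypothetical helper: render control characters in caret notation
# so that a raw backspace (0x08) shows up as the two characters "^H".
sub show_controls {
    my ($s) = @_;
    # 0x00-0x1F map to ^@ .. ^_ (code point + 64 gives the letter).
    $s =~ s/([\x00-\x1F])/'^' . chr(ord($1) + 64)/ge;
    # DEL (0x7F) is conventionally written as ^?.
    $s =~ s/\x7F/^?/g;
    return $s;
}

print show_controls("ca\cHot"), "\n";   # prints ca^Hot
```

Running the captured line through a helper like this shows whether the backspaces are really in the string or were only echoed by the terminal.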

You can use a full-fledged readline package (like Term::ReadLine and a suitable back-end), or you could just fix up the strings after reading them. There's a Clean::Backspace module that does that, but it doesn't appear to be Unicode-safe, which pretty much makes it unusable for this application.

Here's a quick stab at some code that ought to work:

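binmode STDIN,  ':encoding(UTF-8)';   # Assumes a UTF-8 terminal; decode so \X matches characters, not bytes
binmode STDOUT, ':encoding(UTF-8)';
my $s = <STDIN>;
1 while $s =~ s/(?!\cH)\X\cH//g;   # Delete character-backspace pair
print $s;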

You'd probably want to make a subroutine to handle this.
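One way such a subroutine might look is sketched below. The name apply_backspaces is made up for illustration, and the sketch assumes the string has already been decoded into characters (for example by reading through a UTF-8 layer), since \X matches whole characters only on decoded text.

```perl
use strict;
use warnings;
use utf8;   # this sketch contains literal Chinese characters

# Hypothetical helper: apply each backspace (^H) to the character before it.
# Expects a decoded character string, not raw UTF-8 bytes.
sub apply_backspaces {
    my ($s) = @_;
    # Repeat until no character-backspace pair is left, so runs such as
    # "ab\cH\cH" are collapsed from the inside out.
    1 while $s =~ s/(?!\cH)\X\cH//g;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print apply_backspaces("你\cH你好\cH\cH你好"), "\n";   # prints 你好
```

A backspace with nothing in front of it (for example at the very start of the line) is left in place by this sketch; strip those separately if they matter.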

cjm