I am trying to remove the second-to-last character of a string in Tibetan script, as shown below (the characters in the example are English stand-ins):

$char = "ti.be.tan.|";

So I want to remove the second-to-last character, ".". I tried the following, with my limited knowledge of regular expressions:

$char =~ s/.|$/|/g;
$char =~ s/[.|]$/|/g;
$char = tr/.|//d;       # and later add |.

What am I doing wrong?

+8  A: 

Before I tell you what you need to do right, let's look at what you're doing wrong:

$char =~ s/.|$/|/g;

The problem here is that both . and | are metacharacters in regular expressions. The | means "or", so you're saying "match . or $". You correctly know that $ means the end of the string, but . means "any one character" (other than a newline). So it matches each character in turn and changes it to | (metacharacters don't apply in the second half of the s/// expression), and then it matches the empty string at the end and adds one more | there. With the /g flag you end up with a string made entirely of | characters. Basically, not what you want to happen.
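To see this concretely, here is a minimal sketch that runs the first attempt on the sample string. The sub name first_attempt is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the first attempt to a copy of the input.
sub first_attempt {
    my ($text) = @_;
    # Pattern reads "any one character OR end of string"; /g repeats it.
    $text =~ s/.|$/|/g;
    return $text;
}

# All 11 characters become "|", and the empty match at the end adds one more.
print first_attempt("ti.be.tan.|"), "\n";   # ||||||||||||
```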

$char =~ s/[.|]$/|/g;

Well, inside []s, . and | stop being metacharacters, but [] means "one of these," so this regular expression looks for the character before the end of the string, and if it's either | or ., it changes it to |. Again, not what you want to happen.
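A small sketch of what that means in practice (second_attempt is a hypothetical name): the sample string ends in |, so the last character is replaced by itself and the string comes back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the second attempt to a copy of the input.
sub second_attempt {
    my ($text) = @_;
    # [.|] matches a literal "." or "|", but only as the LAST character.
    $text =~ s/[.|]$/|/g;
    return $text;
}

print second_attempt("ti.be.tan.|"), "\n";   # ti.be.tan.|  (unchanged)
print second_attempt("ti.be.tan."), "\n";    # ti.be.tan|   (final "." became "|")
```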

$char = tr/.|//d;       # and later add |.

tr is the wrong tool for this job. This would delete all . and | characters in your string, except that you're using the = assignment operator rather than the =~ binding operator, so tr operates on $_ and $char is overwritten with the number of characters it deleted. Definitely not what you want to happen.
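The difference between = and =~ is easy to miss, so here is a short sketch. The sub name third_attempt and the contents given to $_ are assumptions chosen only to make the effect visible.

```perl
use strict;
use warnings;

# Hypothetical helper: reproduces "$char = tr/.|//d" with a known $_.
sub third_attempt {
    my ($char) = @_;
    local $_ = "a.b|c";    # made-up content for $_, just for the demo
    $char = tr/.|//d;      # "=" not "=~": tr edits $_ and returns a count
    return ($char, $_);
}

my ($char, $topic) = third_attempt("ti.be.tan.|");
print "char=$char topic=$topic\n";   # char=2 topic=abc
```

The original string is gone: $char now holds the number 2, and it was $_ that lost its . and | characters.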

What you want is this:

$char =~ s/\.\|$/|/;

We've escaped both the . and the | with a \ so Perl knows "the character after the \ is a literal character with no special meaning". The pattern therefore matches a literal .| at the end of your string and replaces it with just |.
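Here is a minimal sketch of that fix, plus a variant that uses \Q...\E to do the escaping for you. Both sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: hand-escaped version of the fix.
sub drop_dot_before_bar {
    my ($text) = @_;
    $text =~ s/\.\|$/|/;          # literal "." then literal "|" at the end
    return $text;
}

# Hypothetical helper: same idea, but \Q...\E quotes the metacharacters.
sub drop_dot_before_bar_quoted {
    my ($text) = @_;
    my $tail = ".|";              # the literal ending we are looking for
    $text =~ s/\Q$tail\E$/|/;
    return $text;
}

print drop_dot_before_bar("ti.be.tan.|"), "\n";          # ti.be.tan|
print drop_dot_before_bar_quoted("ti.be.tan.|"), "\n";   # ti.be.tan|
```

If the string does not end in .| then neither version changes it.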

That said, it sounds like you're kind of new to regular expressions. I'm a big fan of perldoc perlretut, which I think is one of the best (if not the best) introductions to regular expressions in Perl. You should really read it - regexes are a powerful tool in the hands of those who know them, and a powerful headache to those who don't.

Chris Lutz
Thanks... "." and "|" are not the actual characters; I used them as stand-ins for the Tibetan Unicode characters 0F0B (.) and 0F0D (|). Thanks for pointing me to Perl's documentation.
Cthar
You'll probably need to use Unicode escapes to get the same effect. I don't know Tibetan, but I bet the `perldoc` regular expression pages will discuss this issue at some point.
Chris Lutz
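For what it's worth, a sketch of the Unicode-escape version might look like the following. It assumes the two characters really are U+0F0B and U+0F0D as stated in the comment above; the sub name and the sample letters are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes U+0F0B when it sits right before a final U+0F0D.
sub drop_tsheg_before_shad {
    my ($text) = @_;
    $text =~ s/\x{0F0B}\x{0F0D}$/\x{0F0D}/;
    return $text;
}

# Arbitrary sample: three Tibetan code points, then U+0F0B and U+0F0D.
my $sample = "\x{0F56}\x{0F7C}\x{0F51}\x{0F0B}\x{0F0D}";
my $fixed  = drop_tsheg_before_shad($sample);

binmode(STDOUT, ':encoding(UTF-8)');   # avoid "Wide character" warnings
print length($sample), " -> ", length($fixed), "\n";   # 5 -> 4
print $fixed, "\n";
```

This only works on character strings, so text read from a file would first need to be decoded (for example with an :encoding(UTF-8) layer on the filehandle).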
The O'Reilly book on regular expressions is quite good too.
Chris Huang-Leaver
+3  A: 

Chris Lutz has already provided an excellent answer, so I just want to add one in case you want to remove the second-to-last character of some other kind of string.

Here it is:

$char =~ s/(.)(.)$/\2/g;

Basically, Perl (actually the regex engine) captures everything between '(' and ')' as a group, which you can refer to later. In this code the groups are:

$char =~ s/(.)(.)$/\2/g;
#          ^-^^-^  ^^
#  Capture G1 G2   ++-- Then replace it with only group 2

So in this case, Perl scans from the first character; wherever the pattern does not match, it leaves the text alone (no replacement). When it finds a match, it replaces the match with what you specified (in this case group 2).
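As a runnable sketch of the same idea, here it is wrapped in a sub (the name remove_second_last is made up). It writes the group reference as $2 in the replacement, which is the spelling Perl prefers there, and drops /g since the pattern can only match once at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the last character, drops the one before it.
sub remove_second_last {
    my ($text) = @_;
    $text =~ s/(.)(.)$/$2/;   # group 1 is discarded, group 2 is kept
    return $text;
}

print remove_second_last("ti.be.tan.|"), "\n";   # ti.be.tan|
print remove_second_last("ab"), "\n";            # b
print remove_second_last("a"), "\n";             # a  (too short, no match)
```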

Hope this helps.

NawaMan
`\2` escapes are deprecated, and will generate a warning if you `use warnings;` (which I'm sure you always do :P). You should use `$2` instead.
Chris Lutz
to clarify, they are deprecated on the right side of s///, not the left.
ysth
Oh really?!! :-o I always use it because I first use it in sed. :p
NawaMan
Why two sets of brackets? Matches don't need to be captured to be replaced. $char =~ s/.(.)$/$1/g; would be just as effective.
EmFi
That's right. :p
NawaMan
+1  A: 

You could also use substr as an lvalue in this situation:

$char = "ti.be.tan.|";
substr($char,-2,1) = "";
print $char;               # ===>  ti.be.tan|
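A slightly more defensive sketch of the same approach, using the four-argument form of substr and a length check. The sub name is made up, and the check is an addition of mine: it guards against strings shorter than two characters, where offset -2 would fall outside the string.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the second-to-last character, if there is one.
sub remove_second_last_substr {
    my ($text) = @_;
    # substr(EXPR, OFFSET, LENGTH, REPLACEMENT) edits $text in place.
    substr($text, -2, 1, "") if length($text) >= 2;
    return $text;
}

print remove_second_last_substr("ti.be.tan.|"), "\n";   # ti.be.tan|
print remove_second_last_substr("|"), "\n";             # |  (left alone)
```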
mobrule
A: 

There's also the method using a positive lookahead assertion to remove the second-to-last character.

$char =~ s/.(?=.$)//;

Which essentially reads: substitute "" for any character which is immediately followed by a single character and the end of the string.

If the second-to-last character is always a specific character, you can replace the first . with that character. Remember to escape RE metacharacters such as ()[]/.*? with a backslash.
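A positive lookahead is written (?=...); it checks what follows without consuming it, so only the character in front of it is removed. Here is a minimal sketch of both the general and the specific-character form (both sub names are made up for illustration).

```perl
use strict;
use warnings;

# Hypothetical helper: removes whatever character is second-to-last.
sub remove_second_last_lookahead {
    my ($text) = @_;
    $text =~ s/.(?=.$)//;     # one character, if exactly one more follows
    return $text;
}

# Hypothetical helper: removes it only when it is a literal ".".
sub remove_dot_if_second_last {
    my ($text) = @_;
    $text =~ s/\.(?=.$)//;    # the first "." is escaped to mean a real dot
    return $text;
}

print remove_second_last_lookahead("ti.be.tan.|"), "\n";   # ti.be.tan|
print remove_dot_if_second_last("ti.be.tan.|"), "\n";      # ti.be.tan|
print remove_dot_if_second_last("abc"), "\n";              # abc  (no dot there)
```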

EmFi