ansaurus

Question

What is a Perl regex for finding the first non-consecutively-repeating character in a string?

Answer 1

+1 A:

use 5.010;
$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};
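
One caveat worth checking before relying on this pattern: because the run quantifiers are allowed to backtrack, it appears to hand back a character even when every character sits inside a run (`b` for `aabb`, `a` for `aaa`). The sketch below is one possible variant, not the answerer's code. It makes the quantifiers possessive (`++` and `*+`, also new in 5.10) so a run is never split; the sub name `first_single` is made up for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: returns the first letter that is not part of a
# consecutive run, or undef if there is none.
sub first_single {
    my ($str) = @_;
    # (?:([a-z])\g{-1}++)*+ skips leading runs such as "aa" or "bbb";
    # the possessive quantifiers stop the engine from splitting a run.
    return $str =~ /^(?:([a-z])\g{-1}++)*+(?<c>[a-z])/i ? $+{c} : undef;
}

for my $s (qw(aabbc abc aabb aaa)) {
    my $c = first_single($s);
    say "$s: ", defined $c ? $c : '(none)';
}
```

Like the original, this only looks at letters, so a string starting with a digit yields no result.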

Anon 2010-03-30 21:27:33

Wrong. `[a-z]{2,}` will match `abc`.

KennyTM 2010-03-30 21:29:14

o crap good point

Anon 2010-03-30 21:35:37

try now I fixed it

Anon 2010-03-30 21:37:49

Error: Sequence (?<c...) not recognized in regex;

DVK 2010-03-30 21:44:07

@DVK: that's the 5.10 labeled capture feature, as is the \g relative back reference.

brian d foy 2010-03-30 23:33:40
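
For anyone on a Perl older than 5.10, as in the error quoted above, the same pattern can be sketched with plain numbered groups. This is an assumed-equivalent rewrite, not something posted in the thread: `\2` stands in for `\g{-1}` and `$3` for `$+{c}`, and the sub name `first_single_numbered` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper using only pre-5.10 regex syntax.
# Group 1 is one whole run, group 2 is the run's letter, and group 3 is
# the letter found after the leading runs (the role of the named capture).
sub first_single_numbered {
    my ($str) = @_;
    return $str =~ /^(([a-z])\2+)*([a-z])/i ? $3 : undef;
}

print first_single_numbered('aabbc'), "\n";    # c
print first_single_numbered('abc'),   "\n";    # a
```

It is meant to match exactly what the 5.10 version in the answer matches, so any limitation of that pattern carries over.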

@brian - illuminating as usual. I paid some attention to 5.10 out of curiosity, but at work I'm still stuck in the late Jurassic with 5.8 and smatterings of 5.005 *gag* :(

DVK 2010-03-31 00:53:31

Answer 2

+2 A:

(?:(.)\1+)*(.?)

Get the 2nd capture. (Will return an empty string if every character is consecutively duplicated.)

Test cases:

~:2434$ perl -e "\"abc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2435$ perl -e "\"aabbcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"

~:2436$ perl -e "\"aabbc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2437$ perl -e "\"aabcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2438$ perl -e "\"aabcbbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2439$ perl -e "\"aabbvbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
v
~:2440$ perl -e "\"aabbcdecc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2441$ perl -e "\"aabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2442$ perl -e "\"faabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2443$ perl -e "\"faabbccddeefax\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2444$ perl -e "\"xfaabbccddeefx\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2445$ perl -e "\"xabcdefghai\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2446$ perl -e "\"cccdddeeea12345\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2447$ perl -e "\"1234a5678a23\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
1
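
The one-liners above can also be collected into a single script, which avoids the shell escaping. This is only a sketch: the input/expected pairs are copied from the runs above, while the `%expected` hash and the ok/FAIL output format are my own choices.

```perl
use strict;
use warnings;

# Input => expected second capture, taken from the one-liner runs above.
my %expected = (
    abc             => 'a',
    aabbcc          => '',
    aabbc           => 'c',
    aabcc           => 'b',
    aabcbbbcccccc   => 'b',
    aabbvbbcccccc   => 'v',
    aabbcdecc       => 'c',
    aabbccddeef     => 'f',
    faabbccddeef    => 'f',
    faabbccddeefax  => 'f',
    xfaabbccddeefx  => 'x',
    xabcdefghai     => 'x',
    cccdddeeea12345 => 'a',
    '1234a5678a23'  => '1',
);

my $failures = 0;
for my $input (sort keys %expected) {
    $input =~ m/(?:(.)\1+)*(.?)/;    # always matches; $2 may be empty
    my $got = $2;
    my $ok  = $got eq $expected{$input};
    $failures++ unless $ok;
    printf "%-4s %-16s => '%s'\n", $ok ? 'ok' : 'FAIL', $input, $got;
}
print "$failures failure(s)\n";
```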

Or (will not match if every character is consecutively duplicated.)

(?:^|(.)(?!\1))(.)(?!\2)
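
The practical difference between the two patterns is how "nothing found" is reported: the first always matches and leaves the 2nd capture empty, while this one fails to match at all. A small sketch of wrapping it, assuming a made-up helper name `first_single_lookahead`:

```perl
use strict;
use warnings;

# Hypothetical helper around the lookahead pattern: returns the first
# character whose neighbours on both sides differ from it, or undef.
sub first_single_lookahead {
    my ($str) = @_;
    # Either we are at the start of the string, or the previous character
    # ($1) differs from the one we capture; the captured character ($2)
    # must also differ from whatever follows it.
    return $str =~ /(?:^|(.)(?!\1))(.)(?!\2)/ ? $2 : undef;
}

for my $s (qw(aabbc faabb aabbcc)) {
    my $c = first_single_lookahead($s);
    print "$s: ", defined $c ? $c : '(no match)', "\n";
}
```

Returning `undef` on a failed match keeps the caller from reading a stale `$2` left over from an earlier successful match.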

KennyTM 2010-03-30 21:31:03

Does not work ;(. the captures were "a" and "b". I will leave it to you to figure out why :)

DVK 2010-03-30 21:42:32

Whoever up-voted this, please take it back - it does not work!

DVK 2010-03-30 21:44:59

@Kenny - second try's much better. You caught the error quick. I got this far on my own, but I'm also stuck on "what if no duplicates upfront".

DVK 2010-03-30 21:49:30

@DVK: `perl -e "\"aabbcdecc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"` gives me `c`; @spong `perl -e "\"abcdefg\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"` gives me `a`. I don't know why it's wrong with you guys.

KennyTM 2010-03-30 21:52:40

@kenny - sorry, I was testing your original expression (.)

DVK 2010-03-30 21:56:28

@DVK: Ah I see.

KennyTM 2010-03-30 21:57:38

After the edit, the first expression works!

DVK 2010-03-30 21:57:54

Curiously, your first regex matches any string, because the * (zero or more) and ? (zero or one) quantifiers you use can always match zero occurrences of anything.

brian d foy 2010-03-30 23:21:22

@brian: Justify your claim. They clearly work. `perl -e "\"aabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"` prints `f`, `perl -e "\"cccdddeeea12345\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"` prints `a`.

KennyTM 2010-03-31 05:53:12

@brian: And the second regex being able to match 123 is *by design*. The OP wants to match **not consecutively duplicated** characters, not **duplicated** characters.

KennyTM 2010-03-31 05:54:35

I misunderstood the problem. My apologies.

brian d foy 2010-03-31 06:32:24

Answer 3

A:

I wish Perl had a regex negate flag! i.e., return all the characters that do NOT match /regex/

What you are looking for is really the regex capture complement of:

m/(.)(\1)+/
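
One way to approximate that "complement" without a negate flag is to walk the string run by run and keep only the runs of length one. A minimal sketch, assuming a hypothetical helper `singletons`; it returns every non-repeated character, so the first element is the one the question asks for:

```perl
use strict;
use warnings;

# Hypothetical helper: split the string into runs of identical characters
# and return the characters whose run has length 1.
sub singletons {
    my ($str) = @_;
    my @singles;
    # $1 is the whole run, $2 is the character the run is made of.
    while ($str =~ /((.)\2*)/g) {
        push @singles, $2 if length($1) == 1;
    }
    return @singles;
}

my @found = singletons('aabbcdecc');
print "all: @found\n";         # all: c d e
print "first: $found[0]\n";    # first: c
```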

I tried all the suggestions on this page against Brian's data list (the one in his program listing). None work completely.

The regex of:

(?:^|(.)(?!\1))(.)(?!\2)

fails to match the beginning 'f' in lines 2 and 3. Brian's does not match the 'f' at the beginning of lines 2 and 3 or any of the singletons at the end of line 5.

The regex of:

$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};

does work.

The only single regex that I found is a simple one:

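#!/usr/bin/perl
use strict;
use warnings;

# Delete every run of two or more identical characters, then report the
# first character that is left.
while ( <DATA> ) {
    chomp;
    print "BEFORE: $_\n";
    s/(.)(\1)+//g;
    print "AFTER: $_\n";
    print "character: " . substr($_, 0, 1) . "\n\n";
}
__END__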
aabbccddeef
faabbccddeef
faabbccddeefax
xfaabbccddeefx
xabcdefghai
cccdddeeea12345
1234a5678a23
aabbcdecc
abcdefg
aabbccddeef
cccdddeeea12345

This works in the simple case of 'give the first character.' (Edit: on rereading, sorry, I now see that the obvious "delete the doubles" approach was not what you were looking for.)

Love to hear if there is a better solution.

drewk 2010-03-31 05:31:41

This one seems to work: m/(?:(.)\1+)*(.?)/

DVK 2010-03-31 11:44:11

This: m/(?:(.)\1+)*(.?)/ indeed works! Even the singletons at the beginning of lines and in groups are in \2. Nice puzzle...

drewk 2010-03-31 16:31:01
