ansaurus

Question

How can I find repeated letters with a Perl regex?

Answer 1

+5 A:

Use \N (that is, \1, \2, and so on) to refer back to earlier capture groups:

/(\w)\1+/g
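
As a rough sketch of how that pattern might be used with /g, the loop below walks over every run and reports the character, the run length and the offset. The helper name repeated_runs and the sample word are made up for illustration; they are not from the answer.

```perl
use strict;
use warnings;

# Return [char, run length, offset] for every run of a repeated word character.
# (repeated_runs is a hypothetical helper name, used only for this sketch.)
sub repeated_runs {
    my ($str) = @_;
    my @runs;
    while ($str =~ /(\w)\1+/g) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        push @runs, [$1, $+[0] - $-[0], $-[0]];
    }
    return @runs;
}

for my $run (repeated_runs("bookkeeper")) {
    printf "%s x%d at offset %d\n", @$run;
}
# o x2 at offset 1
# k x2 at offset 3
# e x2 at offset 5
```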

Jonathan Lonowski 2008-10-07 14:58:30

Answer 2

A:

How about:

(\w)\1+

The first part makes an unnamed group around a character, then the back-reference looks for that same character.

Joseph Pecoraro 2008-10-07 14:58:52

This only matches the first two repeating chars, not the whole repeating substring.

Michael Carman 2008-10-07 15:31:15

HAHAHA! Nice edit! Gave you a +1 just cuz you're funny. And I too fight the neg-bombs. ;0)

Keng 2008-11-21 13:55:41

Answer 3

+8 A:

I think using a backreference would work:

(\w)\1+

\w is basically [a-zA-Z_0-9] so if you only want to match letters between A and Z (case insensitively), use [a-zA-Z] instead.

(EDIT: or, like Tanktalus mentioned in his comment (and as others have answered as well), [[:alpha:]], which is locale-sensitive)
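
To make the difference between the three character classes concrete, here is a small comparison. The helper first_run and the sample string are assumptions for the sake of the example.

```perl
use strict;
use warnings;

# Return the first repeated run found by the given pattern, or undef.
# (first_run is a hypothetical helper name.)
sub first_run {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : undef;
}

my $sample = "v1__22 aab";
print first_run($sample, qr/(\w)\1+/), "\n";           # __  (underscore counts as \w)
print first_run($sample, qr/([a-zA-Z])\1+/), "\n";     # aa
print first_run($sample, qr/([[:alpha:]])\1+/), "\n";  # aa
```

With \w the first hit is the doubled underscore, which is probably not what someone looking for repeated letters wants.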

hasseg 2008-10-07 14:58:58

instead of [a-zA-Z], just use [[:alpha:]] which is locale-sensitive ;-)

Tanktalus 2008-10-08 20:58:26

Answer 4

+32 A:

You can find any letter, then use \1 to find that same letter a second time (or more). If you only need to know the letter, then $1 will contain it. Otherwise you can concatenate the second match onto the first.

my $str = "Foooooobar";

$str =~ /(\w)(\1+)/;

print $1;
# prints 'o'
print $1 . $2;
# prints 'oooooo'

Adam Bellaire 2008-10-07 15:00:06

For just letters swap out \w for [a-zA-Z].

TomC 2008-10-07 15:21:08

@TomC: That's not unicode safe!

Leon Timmermans 2008-10-07 15:41:49

Now I can replace doubled letters with just one: Regex.Replace(str, @"(\w)\1+", "$1"); thank you Adam.
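
The Regex.Replace call in that comment is .NET. A Perl sketch of the same idea, using a substitution with /g, might look like this (squeeze is a made-up helper name):

```perl
use strict;
use warnings;

# Collapse every run of a repeated word character to a single copy.
# (squeeze is a hypothetical helper name.)
sub squeeze {
    my ($str) = @_;
    (my $out = $str) =~ s/(\w)\1+/$1/g;
    return $out;
}

print squeeze("Foooooobar baaaz"), "\n";   # Fobar baz
```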

Junior Mayhé 2009-10-02 03:09:20

Answer 5

+11 A:

I think you actually want this rather than the "\w" as that includes numbers and the underscore.

([a-zA-Z])\1+

Ok, ok, I can take a hint Leon. Use this for the unicode-world or for posix stuff.

([[:alpha:]])\1+

Keng 2008-10-07 15:03:02

We live in a unicode world. [a-zA-Z] will not cover most languages. [[:alpha:]] would be more correct.
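
A small sketch of the point, assuming a reasonably modern Perl. It uses the Unicode property \p{L} rather than [[:alpha:]]; that is a substitution on my part, chosen because \p{L} forces Unicode matching rules regardless of locale, whereas what [[:alpha:]] matches depends on the locale or Unicode rules in effect.

```perl
use strict;
use warnings;

# "cr\x{e9}\x{e9}" is "créé" written with escapes so the source file stays ASCII.
my $word = "cr\x{e9}\x{e9}";

# ASCII-only class: é is not in a-z, so no repeat is found.
my $ascii = $word =~ /([a-zA-Z])\1+/ ? 1 : 0;

# \p{L} matches any Unicode letter, so the doubled é is found.
my $unicode = $word =~ /(\p{L})\1+/ ? 1 : 0;

print "ascii class: $ascii, unicode property: $unicode\n";
# ascii class: 0, unicode property: 1
```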

Leon Timmermans 2008-10-07 15:40:59

Oh, you crazy foreigners! ;o) Yeah, POSIX would be a better syntax for the non-American-English chars.

Keng 2008-10-07 15:53:46

Answer 6

+3 A:

You might want to take care as to what is considered to be a letter, and this depends on your locale. Using ISO Latin-1 will allow accented Western language characters to be matched as letters. In the following program, the default locale doesn't recognise é, and thus créé fails to match. Uncomment the locale setting code, and then it begins to match.

Also note that \w includes digits and the underscore character along with all the letters. To get just the letters, use a negated character class that excludes non-word characters (\W), digits and the underscore. Whatever is left can only be a letter.

That might be easier to understand by framing it as the question "What regular expression matches any digit except 3?", and the answer is /[^\D3]/.

#! /usr/local/bin/perl

use strict;
use warnings;

# uncomment the following three lines:
# use locale;
# use POSIX;
# setlocale(LC_CTYPE, 'fr_FR.ISO8859-1');

while (<DATA>) {
    chomp;
    if (/([^\W_0-9])\1+/) {
        print "$_: dup [$1]\n";
    }
    else {
        print "$_: nope\n";
    }
}

__DATA__
100
food
créé
a::b
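
The two double-negative classes can also be checked in isolation. This is only a quick sanity check on ASCII input, not part of the program above:

```perl
use strict;
use warnings;

# [^\D3]: "not a non-digit and not 3", i.e. any digit except 3.
my @digits = grep { /^[^\D3]$/ } 0 .. 9;
print "@digits\n";    # 0 1 2 4 5 6 7 8 9

# [^\W_0-9]: "not a non-word character, not an underscore, not a digit",
# i.e. a letter.
my @letters = grep { /^[^\W_0-9]$/ } qw(a Z 7 _ : -);
print "@letters\n";   # a Z
```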

dland 2008-10-07 15:55:28

Answer 7

+2 A:

The following code returns every character that is repeated two or more times in a row.

my $str = "SSSannnkaaarsss";

print $str =~ /(\w)\1+/g;
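
In list context a /g match returns the captured groups, so the line above prints one character per run (Snas). If the whole runs are wanted instead, one possible variation is to wrap the pattern in an outer group and keep every other element of the result; the variable names here are made up for the sketch.

```perl
use strict;
use warnings;

my $input = "SSSannnkaaarsss";

# One capture group: list context with /g yields one character per run.
my @chars = $input =~ /(\w)\1+/g;
print "@chars\n";    # S n a s

# Two groups: the list alternates (whole run, repeated character),
# so the even-indexed elements are the runs themselves.
my @pairs = $input =~ /((\w)\2+)/g;
my @runs  = @pairs[ grep { $_ % 2 == 0 } 0 .. $#pairs ];
print "@runs\n";     # SSS nnn aaa sss
```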

2008-10-07 18:08:55

Answer 8

+1 A:

FYI, aside from RegExBuddy, a real handy free site for testing regular expressions is RegExr at gskinner.com. Handles ([[:alpha:]])(\1+) nicely.

bill weaver 2008-10-07 19:28:48

Answer 9

+1 A:

Just for kicks, a completely different approach:

if ( ($str ^ substr($str,1) ) =~ /\0+/ ) {
    print "found ", substr($str, $-[0], $+[0]-$-[0]+1), " at offset ", $-[0];
}

ysth 2008-10-08 04:01:45

Yes, it will find non-letters too. But can you find the subtle bug?

ysth 2008-10-08 04:04:08
