ansaurus

Question

What regex can match sequences of the same character?

Answer 1

A:

Answering my own question, but got it:

m|(\w)\1+|

Bill 2009-03-13 21:42:53

\W is the opposite of what you want, isn't it?

Telemachus 2009-03-13 23:35:00

Telemachus is right, this will not match the examples you gave in the question.

gpojd 2009-03-15 03:46:28

Also, it is better not to use pipes (or any other non-default delimiters) for the regular expression unless you have a reason to.

Pat 2009-03-15 22:34:30

Answer 2

+16 A:

Sure thing! Grouping and references are your friends:

(.)\1+

Will match 2 or more occurrences of the same character. For word-constituent characters only, use \w instead of ., i.e.:

(\w)\1+
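To make the difference between the two patterns concrete, here is a small sketch. The helper names has_run_any and has_run_word and the sample strings are made up for illustration and are not from the question. Note that \w also covers digits and the underscore, not just letters.

```perl
use strict;
use warnings;

# Hypothetical helpers: 1 if $text contains a run of 2+ identical characters.
sub has_run_any  { my ($text) = @_; return $text =~ /(.)\1+/  ? 1 : 0 }  # any character except newline
sub has_run_word { my ($text) = @_; return $text =~ /(\w)\1+/ ? 1 : 0 }  # word characters only

# Illustrative inputs only.
for my $text ('aaa', 'abc', '###', '__') {
    printf "%-6s any char: %d  word char: %d\n",
        "'$text'", has_run_any($text), has_run_word($text);
}
```

With these inputs, '###' should be matched only by the . version, while '__' is matched by both because the underscore counts as a word character.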

David Hanak 2009-03-13 21:43:03

This will only match some chars, and miss ones like '###'. The examples he gave were alphabetic chars, but the question doesn't really ask for only alphabetic ones. I'd replace '\w' with '.'.

gpojd 2009-03-15 03:48:01

Well, based on the non-operational examples the questioner gave, I assumed s/he wanted to match alphabetic characters only. I should have expressed this in the explanation though.

David Hanak 2009-03-15 12:52:23

Answer 3

+1 A:

This is what backreferences are for.

m/(\w)\1\1/

will do the trick.
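Keep in mind that \1\1 asks for exactly two repeats after the captured character, so the pattern needs three identical characters in a row and matches only the first three of a longer run. A rough sketch of the difference from \1+ follows; matched_text is a hypothetical helper and the inputs are made up.

```perl
use 5.010;    # for the // (defined-or) operator
use strict;
use warnings;

# Hypothetical helper: the text matched by $pattern in $text, or undef.
sub matched_text {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $& : undef;
}

# Illustrative inputs only.
for my $text ('aa', 'aaa', 'aaaa') {
    my $three = matched_text(qr/(\w)\1\1/, $text) // 'no match';
    my $two   = matched_text(qr/(\w)\1+/,  $text) // 'no match';
    print "$text => (\\w)\\1\\1: $three, (\\w)\\1+: $two\n";
}
```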

friedo 2009-03-13 21:44:24

This would not match 'aa'.

gpojd 2009-03-15 03:42:43

Answer 4

+2 A:

This will match more than \w would, like @@@:

/(.)\1+/
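If you want every run in a string rather than a yes/no answer, something like the following sketch should work. The sub name runs and the sample string are my own. Because the whole run is wrapped in an extra capture group, the single character becomes group 2 and the backreference is \2 instead of \1.

```perl
use strict;
use warnings;

# Hypothetical helper: every run of 2+ identical characters found in $text.
sub runs {
    my ($text) = @_;
    my @found;
    # Group 1 is the whole run, group 2 the repeated character.
    while ($text =~ /((.)\2+)/g) {
        push @found, $1;
    }
    return @found;
}

# Illustrative input only; single quotes keep @ from being interpolated.
print join(', ', runs('aa@@@b cc--')), "\n";   # prints: aa, @@@, cc, --
```

Without the /s modifier, . does not match a newline, so runs of blank lines are not reported.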

gpojd 2009-03-14 00:42:49

This is the right one, for "a sequence of the same character", and not just the "aaa", "bbb" examples. +1

Axeman 2009-03-14 19:09:22

Answer 5

+9 A:

Note that in Perl 5.10 we have alternative notations for backreferences as well.

use 5.010;  # needed for say; the \g{} and \k<> notations also require Perl 5.10+

foreach (qw(aaa bbb abc)) {
  say;
  say ' original' if /(\w)\1+/;
  say ' new way'  if /(\w)\g{1}+/;
  say ' relative' if /(\w)\g{-1}+/;
  say ' named'    if /(?'char'\w)\g{char}+/;
  say ' named'    if /(?<char>\w)\k<char>+/;
}

oylenshpeegul 2009-03-14 00:51:19

http://perldoc.perl.org/perlre.html or http://perldoc.perl.org/search.html?q=perlre

Brad Gilbert 2009-03-14 03:53:06

Answer 6

A:

This is also possible using pure regular expressions (i.e. those that describe regular languages -- not Perl regexps). Unfortunately, it means a regexp whose length is proportional to the size of the alphabet, e.g.:

(a* + b* + ... + z*)

Where a...z are the symbols in the finite alphabet.

So Perl regexps, although a superset of pure regular expressions, definitely have their advantages even when you just want to use them for pure regular expressions!
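As a rough illustration, here is how such an alternation could be generated in Perl and compared with the backreference version. This is only a sketch under a few assumptions of mine: the alphabet is lowercase a-z, | stands in for the + of the formal notation, and each branch is written as aa+ (shorthand for aaa*) so that it requires at least two repetitions like the other answers.

```perl
use strict;
use warnings;

# Assumed alphabet for illustration: lowercase ASCII letters only.
my @alphabet = ('a' .. 'z');

# One branch per symbol, no backreference: aa+|bb+|...|zz+
my $alternation = join '|', map { $_ . $_ . '+' } @alphabet;
my $pure = qr/(?:$alternation)/;

# Hypothetical helpers comparing the two approaches.
sub has_run_pure    { my ($text) = @_; return $text =~ $pure         ? 1 : 0 }
sub has_run_backref { my ($text) = @_; return $text =~ /([a-z])\1+/ ? 1 : 0 }

for my $text (qw(aaa abc hello)) {
    printf "%-6s pure: %d  backref: %d\n",
        $text, has_run_pure($text), has_run_backref($text);
}

# The pattern grows with the alphabet: 26 branches of 3 characters plus 25 bars.
print "alternation length: ", length($alternation), "\n";
```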

Edmund 2009-03-20 04:07:39
