I'm looking for a simple regular expression to match the same character repeated more than 10 or so times. For example, if I have a document littered with horizontal lines:

=================================================

It should match the line of = characters because the character is repeated more than 10 times.

Note that I'd like this to work for any character.

A: 
={10,}

matches = that is repeated 10 or more times.
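
For a character other than =, the same quantifier applies once the character is escaped. Here is a minimal sketch in Python, assuming a hypothetical helper named run_pattern and made-up sample strings (neither comes from the answer itself):

```python
import re

def run_pattern(char, minimum=10):
    """Build a regex matching `char` repeated at least `minimum` times.

    Hypothetical helper for illustration only. re.escape protects
    characters such as '*' or '.' that are special in regex syntax.
    """
    return re.compile(re.escape(char) + "{%d,}" % minimum)

equals_run = run_pattern("=")
print(bool(equals_run.search("=" * 40)))   # a long run of '='
print(bool(equals_run.search("==")))       # a run that is too short
print(bool(equals_run.search("-" * 40)))   # a long run of a different character
```

Note that this approach needs the character to be known up front; it does not detect runs of arbitrary characters.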

SilentGhost
Are you sure that this does not match 10 or more arbitrary characters?
Etan
`perl -e 'print "NO" if "abcdefghijklmno" =~ /.{10,}/;'`
Kinopiko
This got two upvotes? It's wrong.
Kinopiko
it was wrong, but it has been edited (to match my answer which got some downvotes, good)
dalloliogm
And it's still wrong.
Kinopiko
it's still wrong since it works only for =.
Etan
*Gee, didn't know I had to say explicitly that you can replace the character with anything you want.*
SilentGhost
the title of the question was misleading..
dalloliogm
A: 

Use the {10,} quantifier:

$: cat > testre
============================
==
==============

$: grep -E '={10,}' testre
============================
==============
dalloliogm
+8  A: 

The regex you need is /(.)\1{9,}/.

Test:

#!perl
use warnings;
use strict;
my $regex = qr/(.)\1{9,}/;
print "NO" if "abcdefghijklmno" =~ $regex;
print "YES" if "------------------------" =~ $regex;
print "YES" if "========================" =~ $regex;

Here the \1 is called a backreference. It refers to whatever was captured by the dot . inside the parentheses (.), and the {9,} then asks for nine or more further copies of that same character. Thus the whole pattern matches ten or more of any single character.
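
To make the difference between a plain quantified dot and the backreference concrete, here is a small sketch in Python. The variable names and sample strings are illustrative assumptions; only the two patterns come from this thread.

```python
import re

# Ten or more characters of any kind, not necessarily equal to each other.
any_ten = re.compile(r".{10,}")
# One captured character followed by nine or more copies of that same character.
same_ten = re.compile(r"(.)\1{9,}")

samples = ["abcdefghijklmno", "-" * 24, "=" * 9, "=" * 10]
for text in samples:
    # Print the sample and whether each pattern finds a match in it.
    print(repr(text), bool(any_ten.search(text)), bool(same_ten.search(text)))
```

The first sample is the interesting one: it satisfies the plain dot pattern but not the backreference pattern, because its characters all differ.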

Although the above test script is in Perl, this is very standard regex syntax and should work in most languages whose regex engine supports backreferences. In some variants you might need to use more backslashes, e.g. Emacs would make you write \(.\)\1\{9,\} here.

Kinopiko
Thanks - this works a treat.
Kragen
+2  A: 

. matches any character. Capture it in a group and use a backreference in conjunction with the curly braces already mentioned:

$: cat > test
========
============================
oo
ooooooooooooooooooooooo


$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo
jeekl
+2  A: 

In Python you can use (.)\1{9,}

  • (.) captures a single character (any character) as group 1
  • \1{9,} matches nine or more further repetitions of the character captured by group 1

example:

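import re

txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
# (.) captures one character; \1{9,} requires nine or more repeats of it
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
    rxx = rx.search(line)  # match object, or None if the line has no run of 10+
    if rxx:
        print(line)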

Output:

1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee
Michał Niklas
if rx.search(line): print(line) (the assignment to the rxx variable is not necessary)
dalloliogm
You are right in this simple context. Using variable rxx I can do something like rxx.group(1), rxx.start(1) etc.
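
As a rough sketch of what keeping the match object allows (the sample line and the variable names other than rx are assumptions made up for this illustration):

```python
import re

rx = re.compile(r"(.)\1{9,}")
line = "title " + "=" * 25 + " end"   # hypothetical sample line

match = rx.search(line)
if match:
    repeated_char = match.group(1)              # the character that repeats
    run_start = match.start()                   # index where the run begins
    run_length = match.end() - match.start()    # total length of the run
    print(repeated_char, run_start, run_length)
```

With only a truth test on rx.search(line), none of these details would be available.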
Michał Niklas