I'm looking for a regex that finds all words in a list that have no two identical characters next to each other. (This is an exercise.)

So abcdef is printed, but aabcdef is not.

I tried both

egrep "^((.)[^\1])*$" words

and

egrep "^((.)[^\2])*$" words

but, apart from my not being sure which group number would be right, neither of them works.

I know I can use egrep -v "(.)\1", but I want to use the regex in an OR structure with some other ones, so that's not possible.

For those interested, the full exercise is to find all words that have exactly two character pairs, so aacbb and aabbd are matched, but abcd and aabbcc are not.

Thanks,

A: 

Is egrep a requirement? Or can we switch to something more powerful, like Perl or Python?

I think a negative lookahead assertion would work here:

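#!/usr/bin/env python

import re

test1 = "abcdef"   # no adjacent duplicates: should match
test2 = "aabcdef"  # "aa" at the start: should not match
test3 = "abbcdef"  # "bb" in the middle: should not match

# Every character must not be followed by a copy of itself.
r = re.compile(r"^(?:(.)(?!\1))*$")

assert r.match(test1) is not None
assert r.match(test2) is None
assert r.match(test3) is None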

I guess the two-pair version can be built by combining three of these "no pair" expressions with two expressions that do match a pair.
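That combination might look roughly like the sketch below. It is only one possible reading of the exercise: it assumes a "pair" means exactly two identical adjacent characters (so "aaa" and "aaaa" are rejected), and the name EXACTLY_TWO_PAIRS is made up for illustration. It also relies on lookahead, which POSIX extended regular expressions do not provide, so it will not carry over to grep -E as written.

```python
import re

# Assumption: a "pair" is a run of exactly two identical characters.
# Layout: no-pair stretch, pair, no-pair stretch, pair, no-pair stretch.
EXACTLY_TWO_PAIRS = re.compile(r"""
    ^
    (?:(.)(?!\1))*      # stretch in which no character is followed by itself
    (.)\2(?!\2)         # first pair, not followed by a third copy
    (?:(.)(?!\3))*      # another stretch without adjacent repeats
    (.)\4(?!\4)         # second pair, not followed by a third copy
    (?:(.)(?!\5))*      # final stretch without adjacent repeats
    $
""", re.VERBOSE)

words = ["aacbb", "aabbd", "abcd", "aabbcc", "aaaa"]
matches = [w for w in words if EXACTLY_TWO_PAIRS.match(w)]
print(matches)  # ['aacbb', 'aabbd']
```

The lookahead after each pair is what stops a run of three or more identical characters from being counted as a pair plus a leftover character.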

Douglas Leeder
Who says *nix tools are not powerful? :)
ghostdog74
This is for schoolwork, so I'm afraid only grep is allowed (however stupid that might be).
thepandaatemyface
A: 

egrep is deprecated; use grep -E instead.

For example:

echo "aacbb" | grep -E "(\w)\1"
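For comparison, here is a small Python sketch of the same pattern. The helper name has_pair and the sample words are made up for illustration. In Python, \w covers letters, digits and the underscore (and, for str patterns, non-ASCII letters as well), so it is broader than [A-Za-z]; GNU grep documents \w as a synonym for [_[:alnum:]].

```python
import re

# Same pattern as the grep -E example: one word character followed by itself.
PAIR = re.compile(r"(\w)\1")

def has_pair(word: str) -> bool:
    """Return True if the word contains two identical adjacent word characters."""
    return PAIR.search(word) is not None

words = ["aacbb", "abcdef", "a--b", "x11y"]
with_pairs = [w for w in words if has_pair(w)]
print(with_pairs)  # ['aacbb', 'x11y']
```

"a--b" is not reported because "-" is not a word character, while "x11y" is, because digits count as word characters.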
ghostdog74
Could you explain what \w does? Is it just a replacement for [A-Za-z]?
thepandaatemyface