I'm trying to find a certain pattern in the text of several .txt files: a string joined to a four-digit number, for example Watson1990. I tested the regex with an online tester and it appeared to work, but the expression (and variations of it) produced no output on my files.

My regular expression is as follows:

egrep '\w*\d{4}' *.txt

However, it does not produce any output. Can you tell me what is wrong with this? I'm using OS X (Snow Leopard).

Thanks.

+3  A: 

The reason your regular expression doesn't work is that \d is not a digit class in extended regular expression syntax: grep treats the token \d as the letter d, so your pattern only matches lines containing dddd. Use the character class [0-9] instead.
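A quick way to see this is to feed grep two made-up sample lines. The comments describe GNU grep, which treats \d as a literal d in extended mode (recent versions also print a "stray \" warning); other greps may behave differently, which is why the first command falls back to printing "(no match)".

```shell
#!/bin/sh
# Made-up sample input: a name+year token and a run of literal d's.
sample='Watson1990
adddd'

# \d is not a digit class in extended regexes. GNU grep reads it as a
# literal "d", so this matches "adddd" rather than "Watson1990".
printf '%s\n' "$sample" | grep -E '\d{4}' || echo '(no match)'

# A bracket expression matches real digits: this prints "Watson1990".
printf '%s\n' "$sample" | grep -E '[0-9]{4}'
```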

Also, \w matches digits (and underscores) as well as letters, so you probably don't want to use it here. Use the character class [A-Za-z] to match letters in A-Z or a-z.
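To illustrate the difference, here is a sketch with three made-up lines, assuming GNU grep (where \w is supported as an extension in -E patterns). Because \w also covers digits and the underscore, a bare number or an underscore-joined identifier slips through, while [A-Za-z] keeps only the name+year token.

```shell
#!/bin/sh
# Made-up sample: a name+year token, a bare 8-digit number,
# and an underscore-joined identifier ending in four digits.
sample='Watson1990
12345678
var_2024'

# \w also matches digits and "_", so with GNU grep all three lines match.
printf '%s\n' "$sample" | grep -E '\w+[0-9]{4}'

# Restricting the prefix to letters leaves only "Watson1990".
printf '%s\n' "$sample" | grep -E '[A-Za-z]+[0-9]{4}'
```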

I changed the * to a + because presumably you want at least one letter before the number. The + means "one or more", whereas * means "zero or more".
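The effect of * versus + shows up with a bare year, sketched here on made-up input: with *, the letters are optional, so "1990" on its own is accepted.

```shell
#!/bin/sh
# Made-up sample: a name+year token and a bare year.
sample='Watson1990
1990'

# With *, zero letters are allowed, so both lines match.
printf '%s\n' "$sample" | grep -E '[A-Za-z]*[0-9]{4}'

# With +, at least one letter must precede the digits: only "Watson1990".
printf '%s\n' "$sample" | grep -E '[A-Za-z]+[0-9]{4}'
```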

Finally, you may wish to consider what should happen if you encounter a five-digit number. Your regular expression currently accepts it, because a five-digit number starts with a four-digit number.
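If five-digit numbers should be rejected, two possible approaches are sketched below on made-up input. The -w flag is not in POSIX but is provided by both GNU and BSD grep; the second option stays within plain POSIX ERE syntax, though with -o it would also print the trailing non-digit character, which -w avoids.

```shell
#!/bin/sh
# Made-up sample: a four-digit year and a five-digit number.
sample='Watson1990
Watson19901'

# The pattern alone accepts both lines: "Watson1990" is a prefix of "Watson19901".
printf '%s\n' "$sample" | grep -E '[A-Za-z]+[0-9]{4}'

# Option 1: -w requires the match to be a whole word, so a fifth digit
# right after the year rules the line out.
printf '%s\n' "$sample" | grep -Ew '[A-Za-z]+[0-9]{4}'

# Option 2 (plain ERE): require a non-digit or end of line after the digits.
printf '%s\n' "$sample" | grep -E '[A-Za-z]+[0-9]{4}([^0-9]|$)'
```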

In conclusion, try this:

egrep '[a-zA-Z]+[0-9]{4}' *.txt
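To list just the matched tokens rather than whole lines, the same pattern can be combined with -o (print only the matching part) and -h (suppress the file-name prefix that grep adds when searching several files); both flags exist in GNU and BSD grep. This sketch writes two made-up files into a temporary directory so it does not touch real data, and uses grep -E, which is equivalent to egrep.

```shell
#!/bin/sh
# Throwaway directory with two made-up files.
dir=$(mktemp -d)
printf 'As shown by Watson1990 and Crick1991.\n' > "$dir/a.txt"
printf 'No citation here.\nSee also Franklin1952.\n' > "$dir/b.txt"

# -o prints each match on its own line; -h drops the "file:" prefix.
grep -Eoh '[a-zA-Z]+[0-9]{4}' "$dir"/*.txt
# Watson1990
# Crick1991
# Franklin1952

rm -r "$dir"
```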
Mark Byers
@Mark Bryers +1 if you explain to the OP why his command is wrong... I know he was really just looking for a command that works, but explaining why his doesn't would make for a better answer.
Stephen
@Stephen: OK... and I'll upvote your comment if you spell my name correctly. ;)
Mark Byers
Thanks, that looks like a more rigorous method, and I have a better understanding of how it works, too.
celenius
@Mrak Beyrs: Dammit. Sorry. I'll try harder next time. ;)
Stephen
+1  A: 

Your regular expression uses Perl-style regex components such as \d, which extended regular expressions do not support. Try

grep -P '\w\d{4}' *.txt

if your version of grep has that option. I'm using GNU grep 2.5.1 and the -P option is listed as "highly experimental".
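Since -P is not available everywhere, one option is to probe for it and fall back to an equivalent extended pattern. This is a sketch only: find_years is a hypothetical helper name, the probe assumes an unsupported -P makes grep fail (exit status 2), and [[:alnum:]_] only approximates \w, since its meaning can depend on the locale.

```shell
#!/bin/sh
# Hypothetical helper: read stdin and print lines containing a word
# character followed by four digits.
find_years() {
    # Probe: does this grep accept -P at all?
    if echo 1990 | grep -qP '\d' 2>/dev/null; then
        grep -P '\w\d{4}'
    else
        # [[:alnum:]_] approximates \w; [0-9] replaces \d.
        grep -E '[[:alnum:]_][0-9]{4}'
    fi
}

printf '%s\n' Watson1990 'no year here' | find_years
```

With either branch, the call above should print only Watson1990.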

Joshua Ulrich
+1  A: 

GNU grep

grep -Po "(\w+\d{4})" file
ghostdog74
Thanks - that worked perfectly.
celenius
@ghostdog74 +1 if you explain to the OP why his command is wrong... I know he was really just looking for a command that works, but explaining why his doesn't would make for a better answer.
Stephen
I was looking for something that worked, but I also wanted to learn why mine was wrong. It was my first grep/regex. Incidentally, I also used the -h flag to remove filenames from the result.
celenius
@celenius My apologies if you think I was trying to put words in your mouth there. I wasn't; I was merely... well, rambling mostly ;). I did notice that you had said "Can you tell me what is wrong with this?". :)
Stephen