ansaurus

Question

Problem with regular expression using grep

Answer 1

A:

Use word boundaries. I'm not sure whether this works in grep, but in other regex implementations I'd surround the number with whitespace or word-boundary patterns:

'\s+04\s+' or '\b04\b'

Something like that
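For what it's worth, GNU grep does accept \b, and both GNU and BSD grep offer a -w option with the same whole-word effect. A minimal sketch, assuming a three-column file (name, region code, phone number); the sample rows are invented for illustration:

```shell
# Hypothetical sample data: name, region code, phone number.
sample=$(mktemp)
printf '%s\n' 'alice 04 5551234' 'bob 104 5550412' 'carol 040 5559876' > "$sample"

# -w keeps only matches that form a whole word, much like '\b04\b'.
whole_word=$(grep -w '04' "$sample")

# Without -w, 04 also matches inside 104, 040 and 5550412.
substring_count=$(grep -c '04' "$sample")

echo "$whole_word"
echo "$substring_count"
rm -f "$sample"
```

Keep in mind that -w would also match a 04 standing alone in another column, so it narrows the match to whole words, not to a specific field.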

Rich 2009-10-06 19:08:56

Answer 2

+6 A:

I'm sure you are about to get buried in clever regular expressions, but I think in this case all you need to do is include one of the spaces on each side of your region code in the grep.

grep ' 04 ' numbers.txt

DigitalRoss 2009-10-06 19:12:00

even "grep '04 ' numbers.txt"

Flavius Stef 2009-10-06 19:14:32

Thanks. *Bounces head against the nearest wall.*

WebDevHobo 2009-10-06 19:14:53

Answer 3

+2 A:

I'd do:

awk '$2 == "04"' < numbers.txt

and with grep:

grep -e '^[^ ]*[ ][ ]*04[ ][ ]*[^ ]*$' numbers.txt
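To see why the field comparison in awk can be more forgiving than a literal-space grep, here is a rough sketch. It assumes a whitespace-separated, three-column layout; the rows (including the tab-separated one) are made up:

```shell
sample=$(mktemp)
# Hypothetical rows: the last one is separated by tabs instead of spaces.
printf 'bob   04   5550412\ncarol 104 5559876\ndave\t04\t5550000\n' > "$sample"

# awk splits on runs of blanks or tabs, so $2 is the region code either way.
awk_count=$(awk '$2 == "04" {n++} END {print n+0}' "$sample")

# A literal-space pattern only sees the space-separated row.
grep_count=$(grep -c ' 04 ' "$sample")

echo "awk: $awk_count, grep: $grep_count"
rm -f "$sample"
```

Because "04" is a string constant, awk compares the field as a string, so 104 and 4 are not treated as equal to it.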

yogsototh 2009-10-06 19:12:53

Answer 4

A:

If you want region codes alone, you should use:

grep "[[:space:]]04[[:space:]]"

This way it only matches numbers in the middle column, because the pattern requires whitespace on both sides. A word-boundary pattern, by contrast, also treats the start and end of a line as word breaks.

You can even do:

function search_region_codes {
   grep "[[:space:]]${1}[[:space:]]" FILE
}

replacing FILE with the name of your file,

and use

search_region_codes 04

or even

function search_region_codes {
   grep "[[:space:]]${1}[[:space:]]" "$2"
}

and using

search_region_codes NUMBER FILE
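A quick way to try the two-argument version, written as a sketch rather than a finished tool. The function name follows the one above; the temporary file and its rows are hypothetical:

```shell
# Same idea as above, using the portable name() form and a quoted file argument.
search_region_codes() {
    # $1 is the region code, $2 the file; quoting "$2" keeps odd file names intact.
    grep "[[:space:]]${1}[[:space:]]" "$2"
}

sample=$(mktemp)
# Hypothetical rows; the last one uses tabs, which [[:space:]] also matches.
printf 'alice 04 5551234\nbob 104 5550412\ndave\t04\t5550000\n' > "$sample"

found=$(search_region_codes 04 "$sample")
echo "$found"
rm -f "$sample"
```

Since the pattern needs whitespace on both sides, a code sitting at the very start or end of a line would not be found this way.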

2009-10-06 19:41:52

Answer 5

A:

Are you searching for an entire region code, or a region code that contains the subpattern?

If you want the whole region code, and there is at least one space on either side of it, put a single space on each side of the code in the grep pattern. Regular expressions offer other ways to mark word boundaries as well.

grep ' 04 ' numbers.txt

If there can be spaces in the name or phone number fields, then that solution might not work. Also, if the pattern can be a sub-part of the region code, then awk is a better tool. The example below assumes that the 'name' field contains no spaces. The '==' operator requires that the pattern exactly match the field, which can be tricky when there is whitespace on either side of the field.

awk '$2 == "04" {print $0}' < numbers.txt

If the file has a delimiter, it can be set with awk's '-F' argument, which sets the field separator character. In this example, a comma is used as the field separator. In addition, the matching operator in this example is '~', which allows the pattern to match any part of the region code (if that is applicable). The "\y" matches word boundaries at the beginning and end of the expression; it is a GNU awk (gawk) extension.

awk -F , '$2 ~ /\y04\y/ {print $0}' < numbers.txt

In both examples, the {print $0} is optional if you just want the full line printed, since that is awk's default action. However, if you want to do any formatting on the output, that can be done inside that block.
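If gawk is not available, or the fields carry stray padding, an anchored match on the field is one possible alternative. This is only a sketch: the comma-delimited rows are invented, and printing just the name is an arbitrary choice to show the action block:

```shell
sample=$(mktemp)
# Hypothetical comma-delimited rows; the second has spaces around the code.
printf '%s\n' 'alice,04,5551234' 'bob, 04 ,5550412' 'carol,104,5559876' > "$sample"

# The field must be exactly 04, with optional spaces on either side.
padded_ok=$(awk -F, '$2 ~ /^ *04 *$/ {print $1}' "$sample")

echo "$padded_ok"
rm -f "$sample"
```

The anchors ^ and $ apply to the field rather than the line here, so 104 is rejected while the padded " 04 " is accepted.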

semiuseless 2009-10-06 19:48:24
