I've got some text files that hold names, phone numbers and region codes, one combination per line.

The format is always "Name Region_code number", with any number of spaces between the three fields.

What I want to do is search for specific region codes, like 23 or 493, for example. The problem is that these digits might also appear inside the longer phone numbers, so the search could return lines that shouldn't match.

I was thinking of this sort of command:
grep '04' numbers.txt

But if I do that, a line that contains 04 in the phone number but not as the region code will show up as a result too, which is not correct.

A: 

Use word boundaries. In other regex implementations I'd surround the code with whitespace or word-boundary patterns, and GNU grep supports both (the + quantifier needs extended syntax, grep -E):

grep -E '\s+04\s+' numbers.txt   or   grep '\b04\b' numbers.txt

Something like that
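A small sketch for trying both patterns, assuming GNU grep (where \s and \b are supported as extensions). The function names match_ws and match_wb and the sample lines are hypothetical, made up for illustration; the region code is pasted into the regex unescaped, so it should be plain digits.

```shell
#!/bin/sh
# Hypothetical sample lines in the question's "Name Region_code number" format.
sample='Alice   04   5550412
Bob 23 5550400
Carol    493  5551234'

# Whitespace on both sides; -E so that "+" means "one or more".
match_ws() { grep -E "\s+$1\s+"; }

# Word boundaries: the code may not touch letters, digits or underscores.
match_wb() { grep "\b$1\b"; }

printf '%s\n' "$sample" | match_ws 04
printf '%s\n' "$sample" | match_wb 04
```

Both calls print only Alice's line; a plain grep '04' would also print Bob's, because 5550400 contains 04.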

Rich
+6  A: 

I'm sure you are about to get buried in clever regular expressions, but I think in this case all you need to do is include one of the spaces on each side of your region code in the grep.

grep ' 04 ' numbers.txt
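A quick sketch of the difference, on hypothetical sample lines and assuming names contain no spaces. Note that a literal space does not match a tab, so a tab-separated file would need [[:space:]] instead.

```shell
#!/bin/sh
# Hypothetical sample data; only spaces separate the fields here.
sample='Alice   04   5550412
Bob 23 5550400'

# Plain substring search: also matches the "04" inside Bob's phone number.
printf '%s\n' "$sample" | grep '04'

# A literal space on each side: only the middle field can match, because
# the first field starts the line and the last field ends it.
printf '%s\n' "$sample" | grep ' 04 '
```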

DigitalRoss
even "grep '04 ' numbers.txt"
Flavius Stef
Thanks. *Bounces head against the nearest wall.*
WebDevHobo
+2  A: 

I'd do:

awk '$2 == "04"' < numbers.txt

and with grep:

grep -E '^[^ ]+ +04 +[^ ]+$' numbers.txt
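A sketch comparing both on hypothetical sample lines. Quoting "04" in the awk test matters: comparing against a string constant makes awk compare as strings, whereas $2 == 4 would compare numerically and also accept 4 or 004. The anchored grep here uses -E and + so that at least one space is required on each side of the code.

```shell
#!/bin/sh
# Hypothetical sample data in "Name Region_code number" format.
sample='Alice   04   5550412
Bob 23 5550400
Dave 0404 5559876'

# awk splits on runs of blanks by default, so $2 is exactly the region code.
printf '%s\n' "$sample" | awk '$2 == "04"'

# Anchored grep: one field, spaces, 04, spaces, one field, end of line.
printf '%s\n' "$sample" | grep -E '^[^ ]+ +04 +[^ ]+$'
```

Both commands print only Alice's line: Bob's code is 23, and Dave's 0404 is not an exact match.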
yogsototh
A: 

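If you want region codes alone, you should use:

grep "[[:space:]]04[[:space:]]" numbers.txt

This way it only matches the middle column: [[:space:]] requires an actual whitespace character (space or tab) on each side, and the first and last fields sit at the start and end of the line, where there is none.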

You can even do:

function search_region_codes {
   grep "[[:space:]]${1}[[:space:]]" FILE
}

Replace FILE with the name of your file, and call it as:

search_region_codes 04

or even pass the file name as a second argument (quoted, so names with spaces work):

function search_region_codes {
   grep "[[:space:]]${1}[[:space:]]" "$2"
}

and call it as:

search_region_codes NUMBER FILE
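A POSIX sh sketch of the two-argument version (using the portable name() { ... } form instead of the function keyword). The temporary sample file is hypothetical; Bob's line uses tabs to show that [[:space:]] matches those as well. As in the answer, the code is inserted into the pattern unescaped, so pass plain digits.

```shell
#!/bin/sh
# Two-argument version: $1 is the region code, $2 the file to search.
search_region_codes() {
    grep "[[:space:]]$1[[:space:]]" "$2"
}

# Hypothetical sample file; Bob's line is tab-separated.
numbers=$(mktemp)
trap 'rm -f "$numbers"' EXIT
printf 'Alice 04 5550412\nBob\t04\t5550400\nCarol 23 5550499\n' > "$numbers"

search_region_codes 04 "$numbers"   # prints Alice's and Bob's lines
search_region_codes 23 "$numbers"   # prints Carol's line only
```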
A: 

Are you searching for an entire region code, or a region code that contains the subpattern?

If you want the whole region code, and there is at least one space on either side, then you can write the grep pattern with a single space on either side of the specific region code. There are other ways to indicate word boundaries using regular expressions.

grep ' 04 ' numbers.txt

If there can be spaces in the name or phone number fields, then that solution might not work. Also, if the pattern can be a sub-part of the region code, then awk is a better tool. The following assumes that the name field contains no spaces. The matching operator '==' requires that the pattern exactly match the field. This can be tricky when there is whitespace on either side of the field, which happens with a custom field separator; awk's default separator already strips leading and trailing blanks from each field.

awk '$2 == "04" {print $0}' < numbers.txt

If the file has a delimiter, that can be set in awk with the '-F' option, which sets the field separator character. In this example, a comma is used as the field separator. In addition, the matching operator in this example is '~', which lets the pattern match any part of the region code field (if that is applicable). The "\y" is GNU awk's (gawk's) way to match word boundaries, here at the beginning and end of the expression, so 04 must still be a whole word within the field; other awk implementations may not support it.

awk -F , '$2 ~ /\y04\y/ {print $0}' < numbers.txt

In both examples, the {print $0} is optional if you only want the full line printed, since printing the line is awk's default action. However, if you want to do any formatting on the output, that can be done inside that block.
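Because \y is a gawk extension, here is a portable sketch for awk implementations without it, assuming comma-separated data with optional blanks around each field. It swaps in a different technique: trim the field, then require an exact string match. The function name find_04 and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical comma-separated sample with stray spaces around the fields.
sample='Alice, 04 ,5550412
Bob,23, 5550400
Carol , 0404,5551234'

# Prints lines whose second comma-separated field is exactly 04 once trimmed.
find_04() {
    awk -F , '
    {
        code = $2                          # second comma-separated field
        gsub(/^[ \t]+|[ \t]+$/, "", code)  # trim surrounding blanks
    }
    code == "04"                           # string compare: 004 or 4 will not match
    '
}

printf '%s\n' "$sample" | find_04
```

Only Alice's line is printed; Carol's trimmed field is 0404, which is not an exact match.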

semiuseless