I have data like this:

# data_display  

ab as we hj kl  
12 34 45 83 21  
45 56 98 45 09

I need only the first column, and only for the rows that start with a number.

I now use:

# data_display | awk '{ print $1 }' | grep "^[0-9]"

Is there any way to optimise it more, like using the regex in awk itself?

I am very new to awk.

Thanks.

KK

A: 

Sure you can:

pax> echo 'ab as we hj kl  
12 34 45 83 21  
45 56 98 45 09' | awk '/^[0-9]/ {print $1}'

gives you:

12
45

An awk program is made up of rules, each consisting of a pattern to match and an action (in curly braces) to run on the lines that match. If there's no pattern, the action runs for every line.
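As a minimal sketch of the pattern/action rule described above, the snippet below runs three variants against the sample data from the question. The variable names (`data`, `both`, `action_only`, `pattern_only`) are only illustrative and are not part of the original answer.

```shell
# Sample input from the question, kept in a variable for reuse.
data='ab as we hj kl
12 34 45 83 21
45 56 98 45 09'

# Pattern and action: print field 1 only on lines that start with a digit.
both=$(printf '%s\n' "$data" | awk '/^[0-9]/ { print $1 }')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern only: with no action, awk prints the whole matching line.
pattern_only=$(printf '%s\n' "$data" | awk '/^[0-9]/')

printf '%s\n' "$both"
```

The first variant is the one the question asks for; the other two are there only to show what each half of the rule contributes on its own.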

paxdiablo
+5  A: 

In awk, the regular expression goes before the action, that is, before the curly braces that contain the print statement. So in your case, the awk call would be:

awk '/^[0-9]/ {print $1}'
LiraNuna
+2  A: 

You can place the grep regexp in the awk command directly:

data_display | awk '/^[0-9]/{ print $1 }'
rsp
+1  A: 

You could use cut instead of awk:

$ data_display | grep '^[0-9]' | cut -f 1 -d ' '
Svante
Why use two commands when one already does the job? That only adds overhead.
levislevis85, you should always test your assumptions. Awk is a complete programming language; cut and grep are simpler tools. Have you tested which overhead is greater? I'm just pointing out options.
Svante
Anyway, I think that `cut` comes in when awk cannot use the `$1` mechanism. The funny thing is, moving the regex from an external grep into the awk script makes almost no difference -- so the overhead of "additional commands" seems to be insignificant.
Svante
I voted for this one because it first discards all the lines that it does NOT need to process, then uses the simpler cut rather than the more complicated awk to do the job. Whether invoking two simple programs is faster than invoking one complex one is an open question; I'm guessing it depends on the dataset itself.
Marcin
+1  A: 

For more accuracy, check for actual numbers, in case you have data like 1a, which is not a number but will match with the solutions given so far.

$ awk '$1+0==$1 { print $1 }' file

or

$ awk '$1 ~ /^[0-9]+$/ { print $1 }' file
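To illustrate the difference, here is a small sketch that compares the start-anchored pattern with the whole-field match. The sample rows, including `1a` and `x9`, are made up for the illustration and are not from the question's data.

```shell
# Hypothetical sample: two numeric first fields, two that are not numbers.
sample='12 34
1a 56
45 78
x9 00'

# Anchored only at the start of the line: "1a" slips through
# because it begins with a digit.
loose=$(printf '%s\n' "$sample" | awk '/^[0-9]/ { print $1 }')

# The whole first field must consist of digits, so "1a" is rejected.
strict=$(printf '%s\n' "$sample" | awk '$1 ~ /^[0-9]+$/ { print $1 }')

echo "loose:";  printf '%s\n' "$loose"
echo "strict:"; printf '%s\n' "$strict"
```

Note that the strict pattern as written accepts only unsigned integers; values such as `-3` or `1.5` would need a wider regex.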
Good for showing the regex match operator "~".
glenn jackman
+1  A: 

cut -d' ' -f1 filename | grep '^[0-9]'

This should be the fastest, since awk has to parse the whole file into records and fields.

Here we minimize the amount of data that grep needs to process by cutting out the first field before grep sees it.
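Before comparing speed, it helps to confirm that the candidate pipelines really produce the same output. The sketch below does that on a temporary copy of the question's sample data; the temporary file and the variable names are assumptions made for the illustration, since the real `data_display` output is not available here.

```shell
# Write the sample rows from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'ab as we hj kl' '12 34 45 83 21' '45 56 98 45 09' > "$tmp"

# The three approaches suggested in this thread.
a=$(awk '/^[0-9]/ { print $1 }' "$tmp")
b=$(grep '^[0-9]' "$tmp" | cut -d ' ' -f 1)
c=$(cut -d ' ' -f 1 "$tmp" | grep '^[0-9]')

# Report whether all three agree.
if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
    result=same
else
    result=different
fi
echo "$result"

rm -f "$tmp"
```

Once the outputs agree, each pipeline can be timed separately on a realistically large input (for example with the shell's `time` facility) rather than relying on a guess about which is fastest. Note also that `cut -d ' '` and awk's `$1` can disagree on lines with leading spaces or tab separators, which the sample data does not contain.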

Venkataramesh Kommoju