I have data that looks like this:

-1033  
-  
222
100
-30
-
10

What I want to do is capture all the numbers, excluding the "dash only" entries.

Why does my awk command below fail?

 awk '$4 != "-" {print $4}'
+1  A: 

Your awk script says:

If the fourth field is not a dash, print it out.

However, you want to print the line if the whole line is not a dash:

awk '$0 != "-"'

The default action is to print the line, so no action block is needed.
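
As a rough illustration of the difference, the sketch below runs both commands on a copy of the sample data. It assumes the input is a single column with no trailing spaces; the variable names (data, blank_lines, numbers) are made up for the example.

```shell
# Sample single-column input, assumed to have no trailing spaces.
data=$(printf '%s\n' -1033 - 222 100 -30 - 10)

# With only one column, $4 is always the empty string, so the test is
# always true and awk prints one empty line per input line.
blank_lines=$(printf '%s\n' "$data" | awk '$4 != "-" {print $4}' | grep -c '^$')
echo "empty lines printed by the \$4 version: $blank_lines"

# Comparing the whole line drops the dash-only entries instead.
numbers=$(printf '%s\n' "$data" | awk '$0 != "-"')
printf '%s\n' "$numbers"
```

If the real file has spaces after the dash, the whole-line comparison would no longer match, so this only holds for input shaped like the assumed sample.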

If you want to print groups of numbers, you can use a GNU awk extension (available if you use gawk). It allows splitting records using regular expressions:

gawk 'BEGIN { RS="(^|\n)-($|\n)" } { print "Numbers:\n" $0 }'

Now, instead of lines, each record is a group of numbers, with groups separated by a line containing only -. Setting the field separator (FS) to a newline allows you to iterate over the numbers within such a group:

gawk 'BEGIN { FS="\n"; RS="(^|\n)-($|\n)" } 
      { print "Numbers:"; for(i=1;i<=NF;i++) print " *: " $i }'
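
If gawk is not available, a similar grouping can probably be done in plain awk by tracking state line by line rather than using a regular-expression RS. This is a different technique from the one above, sketched under the assumption that separator lines are exactly "-" with no surrounding spaces; in_group and grouped are names invented for the example.

```shell
# Sample single-column input, assumed to have no trailing spaces.
data=$(printf '%s\n' -1033 - 222 100 -30 - 10)

grouped=$(printf '%s\n' "$data" | awk '
  $0 == "-" { in_group = 0; next }               # a dash-only line closes the current group
  !in_group { print "Numbers:"; in_group = 1 }   # the first number after a separator opens a group
  { print " *: " $0 }                            # every remaining line is a number in the group
')
printf '%s\n' "$grouped"
```

Unlike the RS approach, this never holds a whole group in memory, but it also does not give you NF as the size of the group.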

However, I agree with the other answers. If you just want to filter out lines matching some text, grep is the better tool for that.

Johannes Schaub - litb
+1  A: 

Assuming that your data file is actually multi-column, and that the values are in column 4, the following will work:

awk '$4 != "-" {print $4} {}'

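It prints the value only where it isn't "-". The trailing {} is an empty action that runs for every line and does nothing, so it is harmless but optional: awk applies the default print action only to a pattern that has no action block, and your version already has one. If your version prints blank lines, the more likely cause is that column 4 does not exist in the input, which makes $4 an empty string that never equals "-".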

If the data is actually as shown (one column only), you should be using $1 rather than $4 - I wouldn't use $0 since that's the whole line and it appears you have spaces at the end of your first two lines which would cause $0 to be "-1033 " and "- ".
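
To make the $1 versus $0 point concrete, here is a small sketch that assumes the first two lines really do end in two spaces, as they appear to in the question. The variable names are hypothetical.

```shell
# Sample input with trailing spaces on the first two lines (an assumption
# based on how the question is formatted).
data=$(printf '%s\n' '-1033  ' '-  ' 222 100 -30 - 10)

# $0 keeps the trailing spaces, so the line "-  " is not equal to "-"
# and slips through the filter.
whole_line_count=$(printf '%s\n' "$data" | awk '$0 != "-"' | wc -l)
echo "lines kept when comparing \$0: $whole_line_count"

# $1 is the first whitespace-delimited field, so the padding is ignored.
first_field=$(printf '%s\n' "$data" | awk '$1 != "-" {print $1}')
printf '%s\n' "$first_field"
```

Printing $1 rather than the whole line also strips the padding from the values that are kept.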

But, if it were a single column, I wouldn't use awk at all but rather:

grep -v '^-$'
grep -v '^ *- *$'

The second allows for spaces on either side of the "-" character.
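
A quick way to compare the two patterns is to run them on the same input. The sketch below again assumes trailing spaces on the first two lines; strict and lenient are names chosen for the example.

```shell
# Sample input with trailing spaces on the first two lines (an assumption).
data=$(printf '%s\n' '-1033  ' '-  ' 222 100 -30 - 10)

# The strict pattern removes only lines that are exactly "-".
strict=$(printf '%s\n' "$data" | grep -v '^-$')

# The lenient pattern also removes dashes padded with spaces.
lenient=$(printf '%s\n' "$data" | grep -v '^ *- *$')

echo "strict keeps $(printf '%s\n' "$strict" | wc -l) lines"
echo "lenient keeps $(printf '%s\n' "$lenient" | wc -l) lines"
```

Note that grep passes the surviving lines through unchanged, so the padding after -1033 is still present in the output.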

paxdiablo
+1  A: 

Why are you checking $4? It appears you should check $1 or $0 as litb said.

But awk is a heavyweight tool for this job. Try

grep -v '^-$'

To remove lines containing only a dash or

grep -v '^ *- *$'

To remove lines containing only a dash and possibly some space characters.

Norman Ramsey
The file may actually be a multi-column file with the relevant values in column 4. That was my reading. For example, a share transaction file containing date, stock, dollar-value and quantity on each line, and you're only interested in real quantities. (cont...)
paxdiablo
... Things like return of capital would involve changes in purchase price but not quantity.
paxdiablo
Could be, but in that case why does it fail? Beats me.
Norman Ramsey