I have data that looks like this:

foo 78 xxx
bar    yyy
qux 99 zzz
xuq    xyz

The columns are tab-delimited. How can I extract the lines where column 2 is empty, yielding:

bar    yyy
xuq    xyz

I tried this, but it doesn't seem to work:

awk '$2==""' myfile.txt 
+4  A: 

You need to specifically set the field separator to a TAB character:

> cat qq.in
  foo     78      xxx
  bar             yyy
  qux     99      zzz
  xuq             xyz
> cat qq.in | awk 'BEGIN {FS="\t"} $2=="" {print}'
  bar             yyy
  xuq             xyz

awk treats an FS of a single SPACE (the default) as a special case. From the man page:

In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines. (my italics)
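
To see the special case in action, here is a small sketch that prints the field count awk reports for one of the sample lines under each separator. The sample line is built with printf, and the variable names are only for illustration.

```shell
# Build a sample line with an empty second column: bar<TAB><TAB>yyy
line=$(printf 'bar\t\tyyy')

# Default FS: the two tabs collapse into one separator, so awk sees 2 fields
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS set to a tab: every tab separates fields, so the empty column is kept
tab_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF }')

echo "default FS: $default_nf fields, tab FS: $tab_nf fields"
```

With the default separator, `$2` on that line is `yyy`, which is why the comparison against the empty string never matches.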

paxdiablo
You can specify the tab char as an awk option and not in the BEGIN clause. This is bash syntax for a tab char: `awk -F $'\t' '$2 == ""' file ...`
glenn jackman
+2  A: 
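grep -e $'^.*\t\t.*$' myfile.txt

This matches every line of the form characters-tab-tab-characters (nothing between the tabs). The `$'...'` quoting is bash syntax that turns `\t` into a literal tab, since grep's basic regular expressions do not define `\t`.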
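
A more portable variant, as a sketch: it builds a literal tab with printf, so it does not rely on bash's `$'...'` quoting or on how a particular grep treats `\t`. The `tab` variable and the temporary sample file are assumptions made for illustration.

```shell
# Hold a literal tab character; printf expands \t in its format string
tab=$(printf '\t')

# Recreate the sample data in a temporary file
sample=$(mktemp)
printf 'foo\t78\txxx\nbar\t\tyyy\nqux\t99\tzzz\nxuq\t\txyz\n' > "$sample"

# Keep lines that contain two consecutive tabs (an empty inner column)
matches=$(grep "$tab$tab" "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The `^.*` and `.*$` parts of the pattern are not needed, because grep already matches anywhere in the line.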

Konerak
@daotoad: he said "extract lines where column 2 is empty". If he wants ONLY column2 to be empty, he has to specify as such. In that case, replace the .* with the range of valid characters (or an inversion of range of not-allowed characters). Maybe [^\t] will do. Depends on his specs!
Konerak
Actually, @daotoad, in a 3-column file, I think the only way to get two consecutive tabs is if column two is empty regardless of the state of columns one or three. So this answer still seems valid to me.
paxdiablo
Yep, scratch my earlier comment. The two tabs can only occur on an "inner blank". In other words on an N column table (where N >=3) they can occur on columns 2 through N-1. Since we have a three column table, that means 2 only. The delimiter on the end columns is a newline or the start/end of file. The mistake proves that it's past my bedtime. Sorry for the error.
daotoad
+4  A: 
perl -F/\t/ -lane 'print unless $F[1] eq q//' myfile.txt

Command Switches

  • -F tells Perl what delimiter to autosplit on (tabs in this case)
  • -a enables autosplit mode, splitting each line on the specified delimiter to populate an array @F
  • -l strips the trailing newline from each input line and appends a newline "\n" to each printed line
  • -n processes the file line-by-line
  • -e treats the first quoted argument as code and not a filename
Zaid
Great job with autosplit! I always forget about it and end up writing little scripts when a one-liner would do. Shouldn't that be `print if $F[1] eq ''`? Otherwise, a column like `aaa 0 bbb` will be printed. Also, for my shell/perl, the `-F` option needs to be `-F'\t'` or `-F"\t"`; it doesn't like `-F/\t/` despite what perlrun says. I can get around the interpolation with `-F=/\\t/` too, but it's pretty ugly.
daotoad