I have a data file that looks like the following example. I've added '%' in lieu of \t, the tab control character.

1234:56%  Alice Worthington
alicew%   Jan 1, 2010 10:20:30 AM%  Closed%   Development
Digg:
Reddit:
Update%%  file-one.txt%   1.1%      c:/foo/bar/quux
Add%%     file-two.txt%   2.5.2%    c:/foo/bar/quux
Remove%%  file-three.txt% 3.4%      c:/bar/quux
Update%%  file-four.txt%  4.6.5.3%  c:/zzz

... many more records of the above form

The records I'm interested in are the lines beginning with "Update", "Add", "Remove", and so on. I won't know what the lines begin with ahead of time, or how many lines precede them. I do know that they always begin with a string of letters followed by two tabs. So I wrote this regex:

generate-report-for 1234:56 | egrep "^[[:alpha:]]+\t\t.+"

But this matches zero lines. Where did I go wrong?

Edit: I get the same results whether I use '...' or "..." for the egrep expression, so I'm not sure it's a shell thing.

A: 

It looks like the shell is parsing "\t\t" before it is sent to egrep. Try "\\t\\t" or '\t\t' instead. That is two backslashes in double quotes and one in single quotes.

drawnonward
I get the same (blank) results either way.
Kevin Stargel
Perhaps \\t\\t?
Chris T
Some shells can pass a literal tab with $'\t', but inside that quoting you have to double up all the other backslashes. If \t is not recognized by your grep, that may help.
drawnonward
+3  A: 

Apparently \t isn't a special character for egrep. You can either use grep -P to enable the Perl-compatible regex engine, or insert literal tabs with Ctrl-V Ctrl-I.

Even better, you could use the excellent ack
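
As a rough sketch of the literal-tab option, the pattern can also be built from a variable that holds a real tab, which avoids typing Ctrl-V Ctrl-I. The sample data and the variable names (sample, tab, matches) are made up for illustration, and the grep -P line is left commented out because that flag is not available in every grep.

```shell
# Hypothetical sample standing in for the real report output;
# printf turns each \t into an actual tab character.
sample=$(printf '1234:56\tAlice Worthington\nDigg:\nUpdate\t\tfile-one.txt\t1.1\tc:/foo/bar/quux\nAdd\t\tfile-two.txt\t2.5.2\tc:/foo/bar/quux\n')

# Hold one literal tab in a variable (in bash, tab=$'\t' does the same).
tab=$(printf '\t')

# The regex now contains real tab characters, so grep -E needs no \t escape.
matches=$(printf '%s\n' "$sample" | grep -E "^[[:alpha:]]+${tab}${tab}.+")

# Alternative where grep supports it: let the PCRE engine interpret \t.
# matches=$(printf '%s\n' "$sample" | grep -P '^[[:alpha:]]+\t\t.+')

printf '%s\n' "$matches"
```

With this sample, only the Update and Add lines should be printed.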

kemp
Running the output through `cat -T` is also a nice way to display tabs without having to replace them manually, and it would point you toward this solution.
dsolimano
A: 

The file might not be exactly what you see. Maybe there are control characters hidden. It happens, sometimes. My suggestion is that you debug this. First, reduce to the minimum regex pattern that matches, and then keep adding stuff one by one, until you find the problem:

egrep "[[:alpha:]]" 
egrep "[[:alpha:]]+" 
egrep "[[:alpha:]]+\t" 
egrep "[[:alpha:]]+\t\t" 
egrep "[[:alpha:]]+\t\t.+" 
egrep "^[[:alpha:]]+\t\t.+" 

There are variations on that sequence, depending on what you find out at each step. Also, the first step can really be skipped, but this is just for the sake of showing the technique.
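
The sequence above can be run in a loop that prints how many lines each pattern matches, so the step where the count changes unexpectedly stands out. This is only a sketch: the sample data is invented, and what an unescaped \t does inside grep -E differs between grep implementations, so no expected counts are shown.

```shell
# Hypothetical sample standing in for the real report output.
sample=$(printf 'alicew\tJan 1, 2010\nUpdate\t\tfile-one.txt\t1.1\tc:/foo/bar/quux\n')

# Try the patterns from shortest to longest, exactly as written in the answer.
for pattern in \
    '[[:alpha:]]' \
    '[[:alpha:]]+' \
    '[[:alpha:]]+\t' \
    '[[:alpha:]]+\t\t' \
    '[[:alpha:]]+\t\t.+' \
    '^[[:alpha:]]+\t\t.+'
do
    # grep -c prints the number of matching lines and exits non-zero when
    # that number is 0, hence the "|| true" to keep the loop going.
    count=$(printf '%s\n' "$sample" | grep -Ec "$pattern") || true
    printf '%s line(s) matched: %s\n' "$count" "$pattern"
done
```

The first pattern whose count does not fit the data is the one to look at more closely.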

Daniel
just FYI, egrep is deprecated. `grep -E` is preferred
ghostdog74
@ghostdog74 Tell that to Kevin, I have nothing to do with it. Didn't see any deprecation notice on FreeBSD's grep manpage, however.
Daniel
A: 

You can use awk:

awk '/^[[:alpha:]]+\t\t/' file
ghostdog74