Dear all,

I would like to extract some lines from a text file. I have recently started experimenting with sed.

I have a file with the following structure:

88 3 3 0 0 1 101 111 4 3
89 3 3 0 0 1 3 4 112 102
90 3 3 0 0 1 102 112 113 103
91 3 3 0 0 2 103 113 114 104

What I would like to do is extract lines according to the value in the second column. In my bash script I use something like this (argument 2 is the input file):

sed  -n '/^[0-9]* [23456789]/ p' < $2 > out

However, the second column also has entries outside the range [23456789], for instance 10. Since 10 is composed of 1 and 0, I guess both of these characters would have to be in the range to match it. But there are also entries with '1' in the second column that I do not want to keep. So how can I match the '10's but not the '1's?

Best, Umut

A: 

sed -rn '/^[0-9]* (2|3|4|5|6|7|8|9|10)/p' < $2 > out

mhitza
You either need `-r` or to escape all those pipe characters and parentheses.
Dennis Williamson
A: 
sed  -rn '/^[0-9]* ([23456789]|10)/ p'  < $2 > out

You need extended regexp support (-r) to use the | (or) operator.

Another interesting way is:

sed  -rn '/^[0-9]* ([23456789]|[0-9]{2,})/ p'  < $2 > out

This matches [23456789] or two or more repetitions of a digit, so it also keeps values such as 11 or 100.
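As an illustration of how the two patterns differ, here is a small sketch on made-up rows. The `sample` data and the variable names are hypothetical, not taken from the original file. It also adds a space after the group, which is an assumption about the intent: without it, a second column such as 20 would match simply because it begins with 2. `-E` is used here as the equivalent of GNU sed's `-r`.

```shell
# Hypothetical sample rows; the second column holds 1, 3, 10 and 12.
sample='88 1 3 0
89 3 3 0
90 10 3 0
91 12 3 0'

# A single digit 2-9 or exactly 10; the trailing space ends the column,
# so values like 12 or 100 are not accepted by accident.
strict=$(printf '%s\n' "$sample" | sed -En '/^[0-9]* ([2-9]|10) /p')

# A single digit 2-9 or any number with two or more digits.
loose=$(printf '%s\n' "$sample" | sed -En '/^[0-9]* ([2-9]|[0-9]{2,}) /p')

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
```

With this data the first pattern keeps the rows for 3 and 10, while the second also keeps the row for 12.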

Enrico Carlesso
yep this is the trick, thx Enrico
Umut Tabak
You can do it without `-r`, you just have to escape the parentheses and the pipe character. `\([23456789]\|10\)`
Dennis Williamson
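To make the escaping point concrete, the following sketch runs the escaped basic-regex form next to the `-r` form on a few made-up rows (the `sample` data and variable names are hypothetical). Note that `\|` in a basic regex is a GNU sed extension, so this assumes GNU sed.

```shell
# Hypothetical sample rows; the second column holds 1, 3 and 10.
sample='88 1 3 0
89 3 3 0
90 10 3 0'

# Basic regex: grouping is \( \) and alternation is \| (GNU extension).
bre=$(printf '%s\n' "$sample" | sed -n '/^[0-9]* \([23456789]\|10\)/p')

# Extended regex: the same pattern with no backslashes.
ere=$(printf '%s\n' "$sample" | sed -rn '/^[0-9]* ([23456789]|10)/p')

printf 'basic:\n%s\nextended:\n%s\n' "$bre" "$ere"
```

Both commands should select the same rows; the only difference is which characters need a backslash.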
A: 

The instant you see variable-sized columns in your data, you should start thinking about awk:

awk '$2 > 1 && $2 < 11 {print}'

will do the trick assuming your file format is correct.
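A quick way to check the numeric comparison is to feed awk a few made-up rows. The `sample` data and the `kept` variable are hypothetical; the sketch relies on awk's default action of printing the line when a pattern has no action block.

```shell
# Hypothetical sample rows; the second column holds 1, 3, 10 and 12.
sample='88 1 3 0
89 3 3 0
90 10 3 0
91 12 3 0'

# Field 2 is compared as a number, so 10 is kept while 1 and 12 are dropped.
kept=$(printf '%s\n' "$sample" | awk '$2 >= 2 && $2 <= 10')

printf '%s\n' "$kept"
```

Because the comparison is numeric rather than character-based, the width of the column does not matter.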

paxdiablo