Hi,

I have a huge text file with lots of lines like:

a 23232 23232 545 3434 DATA4545454_1 454 4646466 3434 3567
a 23232 23267632 545 3436764 DATA454545567564__1 454 464675466 3434 3
a 232676732 232676732 545 3434 DATA4545454_1 454 46457566466 3457534 35675

In all of them I would like to remove everything after the DATA* field, so that I get:

a 23232 23232 545 3434 DATA4545454_1 
a 23232 23267632 545 3436764 DATA454545567564__1 
a 232676732 232676732 545 3434 DATA4545454_1 

I know it can be done with sed, and I have tried different combinations, but I don't get the right result. Does anyone know how?

Thanks

A:
sed 's/\(DATA[^ ]*\).*/\1/'
Ignacio Vazquez-Abrams
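A minimal sketch of that command run on the sample lines from the question, assuming they are saved in a file; the name `data.txt` is hypothetical. `[^ ]*` stops at the first space after DATA, so the captured group is exactly the DATA field, and the trailing `.*` discards the rest of the line.

```shell
#!/bin/sh
# Write the sample lines from the question to a hypothetical file, data.txt
cat > data.txt <<'EOF'
a 23232 23232 545 3434 DATA4545454_1 454 4646466 3434 3567
a 23232 23267632 545 3436764 DATA454545567564__1 454 464675466 3434 3
a 232676732 232676732 545 3434 DATA4545454_1 454 46457566466 3457534 35675
EOF

# Keep everything up to and including the DATA field, drop the rest of the line
sed 's/\(DATA[^ ]*\).*/\1/' data.txt
```

This prints the three truncated lines shown in the question.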
A: 

The regex which matches is

^(.+ DATA[0-9_]+).*$

which should be replaced with $1.

Update

That should be replaced with \1, not $1.

Loadmaster
Note: this is a Perl answer, not a sed answer. (Sed can use extended regexes, which would make this pattern match, but it uses `\1`, not `$1`, for the captured group.)
Jefromi
A: 

All your "DATA" fields appear in column 6. If that holds throughout the file, then simply

$ cut -d" " -f1-6 file
a 23232 23232 545 3434 DATA4545454_1
a 23232 23267632 545 3436764 DATA454545567564__1
a 232676732 232676732 545 3434 DATA4545454_1

or with grep:

$ grep -Eo ".*DATA.[^ ]* " file
a 23232 23232 545 3434 DATA4545454_1
a 23232 23267632 545 3436764 DATA454545567564__1
a 232676732 232676732 545 3434 DATA4545454_1
ghostdog74
For some reason grep does not work with the -o option: `grep -Eo ".*DATA.[^ ]* " test_oq` gives `grep: illegal option -- o`
Vijay Sarathi
Do you have GNU grep?
ghostdog74
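The `cut` version avoids `grep -o`, which POSIX does not require, so some greps (like the one in the comment above) reject it. It does rely on every column being separated by exactly one space, because `cut -d' '` treats each single space as a delimiter. A minimal sketch with a made-up line containing a doubled space (`spaced.txt` is a hypothetical file name), plus one workaround: squeeze runs of spaces with `tr -s` before cutting.

```shell
#!/bin/sh
# Hypothetical input: the first sample line, but with a doubled space after column 2
printf 'a 23232  23232 545 3434 DATA4545454_1 454 4646466 3434 3567\n' > spaced.txt

# cut counts every single space as a delimiter, so the doubled space creates an
# empty field and -f1-6 stops one column before the DATA field
cut -d' ' -f1-6 spaced.txt

# Squeezing runs of spaces first restores the expected column numbering
tr -s ' ' < spaced.txt | cut -d' ' -f1-6
```

The first command prints `a 23232  23232 545 3434`; the second prints `a 23232 23232 545 3434 DATA4545454_1`.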
A: 
sed -r 's/(.*_1)(.*)/\1/'  file 
muruga
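`-r` is a GNU sed option; GNU and BSD sed both accept `-E` for the same extended syntax. A minimal sketch of this answer written with `-E`, using a made-up second line whose later column also contains `_1` (`data.txt` and its contents are hypothetical). Because `.*` is greedy, `(.*_1)` captures up to the *last* `_1` on the line, so the pattern only works when no column after the DATA field contains `_1`.

```shell
#!/bin/sh
# Hypothetical input: one sample line, plus a made-up line with "_1" in a later column
printf '%s\n' \
  'a 23232 23232 545 3434 DATA4545454_1 454 4646466 3434 3567' \
  'a 23232 23232 545 3434 DATA4545454_1 454 X_1 3434 3567' > data.txt

# Same substitution as this answer, written with -E instead of GNU-only -r
sed -E 's/(.*_1)(.*)/\1/' data.txt
```

The first line is truncated after the DATA field as intended; the second keeps `454 X_1`, because the greedy match ends at the last `_1`.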
A: 
sed 's/\(.* DATA.*_1\)\(.*\)/\1/'  file