Hi,

I have a huge text file with lots of lines like:

asdasdasdaasdasd_DATA_3424223423423423
gsgsdgsgs_DATA_6846343636

.....

For each line, I would like to remove everything that follows DATA_ up to the end of the line, so that I get:

asdasdasdaasdasd_DATA_
gsgsdgsgs_DATA_

.....

I know that you can do something similar with:

sed -e "s/^DATA_*$/DATA_/g" filename.txt

but it does not work.

Do you know how?

Thanks

A: 

In regular expressions, * means zero or more repetitions of the previous character. To match any single character, use . (a dot).

So what you really want is .* which means any character, any number of times, like this:

sed 's/DATA_.*/DATA_/' filename.txt

Also, I removed the ^, which means start of line, since you want to match "DATA_" even when it is not at the beginning of a line.
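
To see the difference between the patterns discussed here, a small sketch using the two sample lines from the question (the variable name sample is only for illustration):

```shell
# Two sample lines taken from the question.
sample='asdasdasdaasdasd_DATA_3424223423423423
gsgsdgsgs_DATA_6846343636'

# Anchored version from the question: ^ and $ require the whole line to be
# "DATA" followed only by underscores, so nothing matches and nothing changes.
printf '%s\n' "$sample" | sed 's/^DATA_*$/DATA_/g'

# DATA_* matches "DATA" plus zero or more underscores, so only "DATA_"
# itself is replaced by "DATA_" and the digits survive.
printf '%s\n' "$sample" | sed 's/DATA_*/DATA_/'

# DATA_.* matches "DATA_" plus everything after it on the line.
printf '%s\n' "$sample" | sed 's/DATA_.*/DATA_/'
```

Only the last command prints the truncated lines; the first two print the input unchanged.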

Martin
This actually won't work with the sample data - `DATA_` is not at the beginning of the line.
Jefromi
@Jefromi: saw that just after posting - not enough coffee yet! :)
Martin
Great, this works now! Just a last question: how can one do the opposite, i.e. instead of removing from DATA_ to the end, remove everything before DATA_?
Werner
This still has the unnecessary `g` as I explain in my answer, and the `$` is not necessary since `.*` will always match until the end of line anyway.
Jefromi
@Werner: if you understand how `s/DATA_.*/DATA_/` works, you should be able to answer your next question too. Can't always come ask SO to write every regex for you.
Jefromi
+2  A: 

You have two problems: you're unnecessarily matching beginning and end of line with ^ and $, and you're looking for _* (zero or more underscores) instead of .* (zero or more of any character). Here's what you want:

sed -e 's/_DATA_.*/_DATA_/'

The g on the end (global) won't do anything, because you're already going to remove everything from the first instance of "DATA" onward - there can't be another match.

P.S. The -e isn't strictly necessary if you only have one expression, but if you think you might tack more on, it's a convenient habit.
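
A quick sketch of both points. The input line with two "_DATA_" occurrences is made up for the demonstration, and the second -e expression (stripping leading whitespace) is just a hypothetical extra step, not something the question asks for:

```shell
# Made-up line containing "_DATA_" twice.
line='gsgsdgsgs_DATA_6846343636_DATA_99'

# .* is greedy and runs to the end of the line, so the first match already
# swallows the second "_DATA_"; the g flag has nothing left to act on.
without_g=$(printf '%s\n' "$line" | sed -e 's/_DATA_.*/_DATA_/')
with_g=$(printf '%s\n' "$line" | sed -e 's/_DATA_.*/_DATA_/g')
printf '%s\n%s\n' "$without_g" "$with_g"

# Several -e expressions are applied in order to each line.
chained=$(printf '%s\n' "   $line" | sed -e 's/_DATA_.*/_DATA_/' -e 's/^[[:space:]]*//')
printf '%s\n' "$chained"
```

All three results are gsgsdgsgs_DATA_.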

Jefromi
A: 

Using awk: set the field separator to "_DATA_", then print field 1 ($1) followed by the separator. No substitution regex needed.

$ awk -F"_DATA_" '{print $1"_DATA_"}' file
asdasdasdaasdasd_DATA_
gsgsdgsgs_DATA_
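
One thing to watch with this approach: a line that does not contain the separator still gets "_DATA_" appended. A possible guard, sketched here with a made-up second input line, is to rewrite only lines that split into more than one field:

```shell
# One line from the question plus a made-up line without the separator.
input='gsgsdgsgs_DATA_6846343636
no marker here'

# Unconditional version: the second line becomes "no marker here_DATA_".
plain=$(printf '%s\n' "$input" | awk -F'_DATA_' '{print $1"_DATA_"}')
printf '%s\n' "$plain"

# Guarded version: NF > 1 means the separator occurred at least once;
# FS holds the separator string, and other lines are printed unchanged.
guarded=$(printf '%s\n' "$input" | awk -F'_DATA_' 'NF > 1 {print $1 FS; next} {print}')
printf '%s\n' "$guarded"
```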
ghostdog74