ansaurus

Question

sed beginner question: capturing groups in sed

Answer 1

+1 A:

sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:

echo "ko05414     ko:ITGA4" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'

\d -> [0-9] since GNU sed doesn't recognize \d
{} -> \{\} since GNU sed by default uses basic regular expressions.
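
To make the two syntaxes concrete, here is a minimal sketch that runs the same substitution once with basic and once with extended regular expressions. It assumes a sed that accepts -E (recent GNU and BSD versions do). Matching the separator with [[:space:]] and putting a single space in the replacement are choices made for this illustration, not part of the original command.

```shell
# Sample line with a literal tab between the two fields (printf expands \t itself)
line="$(printf 'ko05414\tko:ITGA4')"

# Basic regular expressions (sed's default): groups and intervals need backslashes
bre="$(printf '%s\n' "$line" | sed 's/^\(ko[0-9]\{5\}\)[[:space:]]*ko:\(.*\)$/\1 \2/')"

# Extended regular expressions (-E): the same pattern without the backslashes
ere="$(printf '%s\n' "$line" | sed -E 's/^(ko[0-9]{5})[[:space:]]*ko:(.*)$/\1 \2/')"

# Both commands should print the same line
printf '%s\n' "$bre" "$ere"
```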

ninjalj 2010-07-21 18:37:13

this still gives me the same error. I'm in OSX - not sure how to find out if I'm using GNU sed...

Mike Dewar 2010-07-21 18:40:21

@Mike Dewar -- ooh, that's important information... i think OS X uses a BSD-like sed, whereas it's a common assumption here that folks use GNU sed

Dan LaRocque 2010-07-21 18:46:34

@Mike Dewar -- so, i've only got 10.5.8 here, but the problem was `\t`... GNU sed understands this as tab, but I had to insert a literal tab using the shell `Ctrl-V` `Tab`
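
If typing a literal tab is awkward, one alternative is to let printf produce the tab and splice it into the sed script. This is only a sketch: the variable names tab and result are made up for the example, and the single space in the replacement is an assumption about the desired output.

```shell
# Store a real tab character; printf interprets \t even where sed does not
tab="$(printf '\t')"

# Double quotes let the shell expand ${tab}; the backslashed groups reach sed unchanged
result="$(printf 'ko05414\tko:ITGA4\n' | sed "s/^\(ko[0-9]\{5\}\)${tab}ko:\(.*\)$/\1 \2/")"

printf '%s\n' "$result"
```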

Dan LaRocque 2010-07-21 18:55:40

that's important to know! Thanks so much!

Mike Dewar 2010-07-21 19:42:33

Answer 2

+1 A:

This should do it. You can also skip the last group and simply use \1 instead, but since you're learning sed and regex this is good practice. I wanted to use a non-capturing group in the middle, (?: ), but I could not get that to work with sed; perhaps it's not supported.

sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result

And of course you can use

sed --posix 's/ko://'

Anders 2010-07-21 19:01:33

Thanks so much for this! I've upvoted your answer because you've totally nailed this, and the 's/ko://' is great (though what's that backtick doing?). I'm giving the tick to ninjalj cos his answer + comments has explained what I was doing wrong. But I'm definitely sticking with 's/ko://' or maybe even the string replace by getekha! I'll see which is faster...

Mike Dewar 2010-07-21 19:47:52

My bad, leftover from a variable. Yeah, I would give it to him also, he actually bothered explaining.

Anders 2010-07-21 19:57:46

Answer 3

+2 A:

You don't need sed for this

Here is how you can do it with bash:

var="ko05414 ko:ITGA4"
echo ${var//"ko:"}

${var//"ko:"} replaces every occurrence of "ko:" in var with the empty string.

See Manipulating Strings for more info
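
For comparison, bash also has a single-slash form that replaces only the first match. A small sketch, assuming bash (this is not POSIX sh syntax); the second "ko:ITGB1" entry is invented here just to show the difference:

```shell
# Hypothetical input with two "ko:" prefixes (the second one is made up)
var="ko05414 ko:ITGA4 ko:ITGB1"

# One slash: remove only the first "ko:"
first_only="${var/ko:/}"

# Two slashes: remove every "ko:"
all="${var//ko:/}"

printf '%s\n' "$first_only" "$all"
```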

getekha 2010-07-21 19:03:11

Read the comments, he said he is learning sed.

Anders 2010-07-21 19:05:44

while I /am/ learning sed, this approach strikes me as brilliant and simple. I had no idea about this syntax. All this command line fu is awesome.

Mike Dewar 2010-07-21 19:41:28

My mistake, I apologize to getekha.

Anders 2010-07-21 19:45:25

Answer 4

A:

@OP, if you just want to get rid of "ko:", then

$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: cannot be deleted.
ko04080 ko:GZMA

$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: cannot be deleted.
ko04080 GZMA
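
The same "only touch the second field" idea can be sketched in sed by anchoring the pattern to the ko-number at the start of the line. This assumes every line to be changed starts with "ko" plus five digits, as in the sample above; the variable names are made up for the example.

```shell
# A few lines taken from the sample file above
data='ko04062 ko:CXCR3
some text with a legit ko: this ko: cannot be deleted.
ko04080 ko:GZMA'

# Unanchored: removes the first "ko:" on every line, including the legit one
unanchored="$(printf '%s\n' "$data" | sed 's/ko://')"

# Anchored: only removes "ko:" when it directly follows the leading ko-number
anchored="$(printf '%s\n' "$data" | sed 's/^\(ko[0-9]\{5\}[[:space:]]\{1,\}\)ko:/\1/')"

printf '%s\n' "$anchored"
```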

Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big file, using bash's while read loop is still slower than using sed or awk.

ghostdog74 2010-07-22 01:24:28
