Is there any way to tell sed to output only captured groups? For example, given the input:

This is a sample 123 text and some 987 numbers

and the pattern

/([\d]+)/

could I get only 123 and 987 as output, perhaps formatted using back-references?

+1  A: 

sed can remember up to nine sub-patterns (referenced as \1 through \9), but in its default (basic) regex syntax you need to use escaped parentheses, \( and \), to mark the portions of the regular expression to remember.

See here for examples and more detail
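
As a minimal sketch of the escaped-parentheses syntax: sed prints every line by default, so a back-reference alone does not restrict the output. Combining -n with the p flag prints only the lines where the substitution succeeded. The helper name extract_version and the sample input are made up for illustration. The pattern uses .* because an unescaped + is not a repetition operator in basic regular expressions.

```shell
# Hypothetical helper: print only the text captured after "version=".
# -n suppresses sed's default output; the trailing p prints a line only
# when the substitution succeeded on it.
extract_version() {
  sed -n 's/.*version=\(.*\)/\1/p'
}

printf '%s\n' 'name=demo' 'version=1.2.3' | extract_version
```

Here only the second input line matches, so only its captured part is printed.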

Peter McGrattan
`sed -e 's/version=\(.+\)/\1/' input.txt` still outputs the whole of input.txt.
Michael
+1  A: 

Try

sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

I got this under Cygwin:

$ (echo "asdf"; \
   echo "1234"; \
   echo "asdf1234adsf1234asdf"; \
   echo "1m2m3m4m5m6m7m8m9m0m1m2m3m4m5m6m7m8m9") | \
  sed -n -e "/[0-9]/s/^[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*$/\1 \2 \3 \4 \5 \6 \7 \8 \9/p"

1234
1234 1234
1 2 3 4 5 6 7 8 9
$
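
A possible variation, offered as a sketch rather than a replacement for the command above: if every number on the line is wanted, a single group with the g flag avoids spelling out nine groups. Each match swallows the non-digits around one number and replaces them with that number plus a space. The function name numbers_only is hypothetical, and note that each output line ends with a trailing space.

```shell
# Keep every run of digits on a line, separated by spaces.
# Each match consumes the non-digits around one number and is replaced by
# that number plus a space; lines without digits are not printed.
numbers_only() {
  sed -n 's/[^0-9]*\([0-9][0-9]*\)[^0-9]*/\1 /gp'
}

printf '%s\n' 'asdf' 'asdf1234adsf1234asdf' \
  'This is a sample 123 text and some 987 numbers' | numbers_only
```

The trade-off is that the groups can no longer be rearranged individually, since there is only \1.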
Bert F
+1  A: 

You can use grep:

grep -Eow "[0-9]+" file
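
To illustrate what the options do, here is a small sketch using the sample line from the question (the variable name line is just for this example). -o prints each match on its own line, and -w additionally requires the match to be a whole word.

```shell
line='This is a sample 123 text and some 987 numbers'

# -o prints each match on its own line; -E enables extended regex syntax.
printf '%s\n' "$line" | grep -Eo '[0-9]+'

# With -w added, a match must be a whole word, so digits embedded in
# other word characters are skipped (grep then exits non-zero).
printf '%s\n' 'abc123def' | grep -Eow '[0-9]+' || echo 'no whole-word match'
```

Dropping -w is worth considering if numbers glued to letters should also be reported.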
ghostdog74
@ghostdog74: Absolutely agree with you. How can I get grep to output only captured groups?
Michael
@Michael - that's why the `o` option is there - http://unixhelp.ed.ac.uk/CGI/man-cgi?grep : -o, --only-matching Show only the part of a matching line that matches PATTERN
Bert F
@Bert F: I understand the matching part, but that's not a capturing group. What I want is something like ([0-9]+).+([abc]{2,3}), which has 2 capturing groups. I want to output ONLY the capturing groups, by backreferences or some other way.
Michael
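
As far as I know, grep -o only ever prints the whole match, not individual groups. One workaround, sketched here under that assumption, is to let grep -o isolate the match and then have sed reduce it to its groups. The function name two_groups and the sample input are hypothetical.

```shell
# Hypothetical helper: grep -o isolates the overall match, then sed
# rewrites that match to just its two capture groups.
two_groups() {
  grep -oE '[0-9]+.+[abc]{2,3}' |
    sed -E 's/^([0-9]+).+([abc]{2,3})$/\1 \2/'
}

printf '%s\n' 'id 42 xyz ab end' | two_groups
```

Keep in mind that .+ is greedy, so on lines with several candidates for the second group the last one wins.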
+1  A: 

The key to getting this to work is to tell sed to exclude what you don't want to be output as well as specifying what you do want.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

  • don't default to printing each line (-n)
  • exclude zero or more non-digits
  • include one or more digits
  • exclude one or more non-digits
  • include one or more digits
  • exclude zero or more non-digits
  • print the substitution (p)
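
The breakdown above can also be written with the pattern split into named pieces, which may make the include/exclude structure easier to see. This is only a sketch: the variable names skip and keep are made up, and it assumes a sed that accepts -E for extended regular expressions (both GNU and BSD sed document it).

```shell
string='This is a sample 123 text and some 987 numbers'

# Named pieces of the pattern (variable names are illustrative only).
skip='[^[:digit:]]'        # text to exclude
keep='([[:digit:]]+)'      # text to capture

# Exclude, capture, exclude, capture, exclude; print only on success.
printf '%s\n' "$string" |
  sed -En "s/${skip}*${keep}${skip}+${keep}${skip}*/\1 \2/p"
```

Because the pattern demands two numbers, a line containing only one number does not match and is not printed.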
Dennis Williamson
+1  A: 

I believe the pattern given in the question was by way of example only, and the goal was to match any pattern.

If you have a sed with the GNU extension allowing insertion of a newline in the pattern space, one suggestion is:

> set string = "This is a sample 123 text and some 987 numbers"
>
> set pattern = "[0-9][0-9]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
123
987
> set pattern = "[a-z][a-z]*"
> echo $string | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
his
is
a
sample
text
and
some
numbers

These examples are with tcsh (yes, I know it's the wrong shell) with CYGWIN. (Edit: For bash, remove set, and the spaces around =.)
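
For reference, a sketch of the first example in bash syntax, following the edit note above. It still assumes a sed (such as GNU sed) that treats \n in the replacement as a newline.

```shell
string='This is a sample 123 text and some 987 numbers'
pattern='[0-9][0-9]*'

# The first sed puts each match on its own line (\n in the replacement is
# a GNU sed extension); the second sed keeps only lines containing a match.
printf '%s\n' "$string" | sed "s/$pattern/\n&\n/g" | sed -n "/$pattern/p"
```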

Joseph Quinsey
@Joseph: thanks, however, based on my task I feel like grep is more natural, like ghostdog74 suggested. Just need to figure out how to make grep output the capture groups only, not the whole match.
Michael