I want to extract a substring matching a pattern and save it to a file. An example string:

Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk

I want to extract the part between the brackets, in this case [sdf].

I tried something like grep -e '[$subtext]' to save the text in the brackets to a variable. Of course it doesn't work, but I am looking for a similar approach. It would be very elegant to include a variable in a regex like this. What is the best way to do it?

Thanks!

+1  A: 

There's probably a better way using bash only, but:

echo 'Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk' \
| sed 's/.*\[\(.*\)\].*/\1/'

As Jurgen points out, this also prints non-matching lines. If you don't want them in the output, use '-n' so sed doesn't print the pattern space automatically, and add the 'p' flag to the substitution so a line is printed only when the pattern matched.

| sed -n 's/.*\[\(.*\)\].*/\1/p'
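
To make the difference concrete, here is a small sketch that runs both variants over two lines, only one of which contains brackets. The second log line and the variable names (input, without_n, with_n) are made up for illustration.

```shell
# Two sample lines: only the first contains a bracketed token.
# The second line is invented for this illustration.
input='Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk
Apr 12 19:24:18 PC_NMG kernel: usb 1-1: new high-speed USB device'

# Without -n, sed prints every line; the unmatched line passes through unchanged.
without_n=$(printf '%s\n' "$input" | sed 's/.*\[\(.*\)\].*/\1/')

# With -n and the p flag, only lines where the substitution succeeded are printed.
with_n=$(printf '%s\n' "$input" | sed -n 's/.*\[\(.*\)\].*/\1/p')

printf 'without -n:\n%s\n\nwith -n and p:\n%s\n' "$without_n" "$with_n"
```

Here with_n ends up holding just sdf, while without_n also carries the untouched second line.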
Stephen
This also prints non-matching lines
Jürgen Hötzel
@Jurgen Hotzel: Thanks, edited a fix.
Stephen
+1  A: 

Match against regex, replace using grouping and only print if regex matched:

sed -n "s/.*\[\(.*\)\].*/\1/p"
Jürgen Hötzel
A: 

The .* in the sed answers is greedy, so they will miss some of the data if a line contains more than one [] pair. Use the grep+tr solution, or use awk:

$ cat file
[sss]Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk [tag] blah blah

$ awk -F"[" '{for(i=2;i<=NF;i++){if($i~/\]/){sub(/\].*/,"",$i);print $i}}}' file
sss
sdf
tag
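
For comparison, a sketch of what the greedy sed expression does with the same line, next to a grep -o plus tr pipeline. The pipeline is only a guess at what a grep+tr solution could look like, and the variable names are chosen here for illustration.

```shell
line='[sss]Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk [tag] blah blah'

# The leading .* is greedy: it consumes everything up to the last "[",
# so only the final bracket pair is captured.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*\[\(.*\)\].*/\1/p')

# grep -o prints each match on its own line; tr then deletes the brackets.
# (Assumed form of a grep+tr pipeline, not taken from another answer.)
all=$(printf '%s\n' "$line" | grep -o '\[[^]]*]' | tr -d '[]')

printf 'sed (greedy): %s\n' "$greedy"
printf 'grep -o | tr:\n%s\n' "$all"
```

The sed result is only tag, whereas the grep -o pipeline yields sss, sdf and tag on separate lines.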
ghostdog74
+1  A: 

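BASH_REMATCH is an array holding the results of the shell's last [[ ... =~ ... ]] match: element 0 is the whole match, and elements 1 and up are the parenthesized groups.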

$ line='Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk'
$ [[ $line =~ \[([^]]+)\] ]]; echo "${BASH_REMATCH[1]}"
sdf
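
Since the question also asks about saving the result to a file, here is a minimal sketch of that. It keeps the pattern in a variable, which is a common way to avoid quoting and escaping surprises inside [[ ]] across bash versions. The names re, whole, subtext and outfile, and the use of mktemp for the output path, are assumptions for illustration.

```shell
#!/usr/bin/env bash
line='Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk'

# Pattern stored in a variable; it must be expanded unquoted on the
# right-hand side of =~ to be treated as a regex.
re='\[([^]]+)\]'

# Hypothetical output location; replace with whatever file you need.
outfile=$(mktemp)

if [[ $line =~ $re ]]; then
  whole=${BASH_REMATCH[0]}    # entire match, brackets included
  subtext=${BASH_REMATCH[1]}  # first capture group, brackets excluded
  printf '%s\n' "$subtext" > "$outfile"
fi

printf 'whole=%s subtext=%s file=%s\n' "$whole" "$subtext" "$outfile"
```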

If you want to do this in a loop, here's an example:

while read -r line; do
  if [[ $line =~ \[([^]]+)\] ]] ; then
    drive="${BASH_REMATCH[1]}"
    do_something_with "$drive"
  fi
done < <(dmesg | egrep '\[([hsv]d[^]]+)\]')

This approach makes no external calls inside the loop, so the shell doesn't need to fork and exec external programs such as sed or grep for every line. As such, it is arguably cleaner than the other approaches offered here.
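
The same no-fork idea can probably be stretched to lines that contain several bracket pairs, by matching repeatedly and discarding the part of the string already consumed. This is only a sketch. The names re, rest and matches are made up here.

```shell
#!/usr/bin/env bash
line='[sss]Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk [tag] blah blah'

re='\[([^]]+)\]'
matches=()
rest=$line

# =~ finds the leftmost match; after recording it, drop everything up to
# and including that match and search the remainder again.
while [[ $rest =~ $re ]]; do
  matches+=("${BASH_REMATCH[1]}")
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

For the sample line this collects sss, sdf and tag without starting any external process.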

BTW, your initial approach (using grep) was not that far off; using grep -o will output only the matching substring:

$ subtext=$(egrep -o "\[[^]]*\]" <<<"$line")

...though this includes the brackets inside the capture, and thus is not 100% correct.
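
One way to deal with the leftover brackets, sketched under the assumption that the line contains a single bracket pair, is to trim them with parameter expansion after grep -o has run:

```shell
#!/usr/bin/env bash
line='Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk'

# grep -o returns the match with its brackets: [sdf]
subtext=$(grep -Eo '\[[^]]*\]' <<<"$line")

# Strip one leading "[" and one trailing "]" without calling another program.
subtext=${subtext#\[}
subtext=${subtext%\]}

printf '%s\n' "$subtext"
```

With more than one match, grep -o returns several lines and this trimming would only touch the outermost brackets, so it is not a general solution.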

Charles Duffy
But bash's while read loop is significantly slower than awk (etc.) for iterating over a big file. By the way, I get no output for your first version without the while loop. The `]` in your bracket expressions should not be escaped.
ghostdog74
@ghostdog - updated, thanks. I *do* get output even as-is, but that's bash 4. I agree that the read loop is slow -- filtering once on the input side is much, much better than filtering inside your loop, and you *have* to have a loop if you're going to match more than one line.
Charles Duffy