I have a regex and replacement pattern that have both been tested in Notepad++ on my input data and work correctly. When I put them into a sed expression, however, nothing gets matched.

Here is the sed command:

 # SEARCH = ([a-zA-Z0-9.]+) [0-9] (.*)
 # REPLACE = \2 (\1)

 sed -e 's/\([a-zA-Z0-9.]+\) [0-9] \(.*\)/\2 \(\1\)/g'

Here is a sampling of the data:

jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon

and the desired output:

Doe, John  (jdoe)
Doe, Jane  (jad)
Smith, Jon (smith)

I have tried removing and adding escapes to different characters in the sed expression, but I either get nothing matched or an error along the lines of:

sed: -e expression #1, char 42: invalid reference \2 on `s' command's RHS

How can I get this escaped correctly?

+3  A: 

I usually find it easier to use the -r switch as this means that escaping is similar to that of most other languages:

sed -r 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/g' file1.txt
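
As a quick check against the sample data from the question, here is a minimal sketch. The function name swap_user_name is made up for illustration, and -r assumes GNU sed:

```shell
# Hypothetical helper: rewrite "user N Last, First" as "Last, First (user)".
# Assumes GNU sed, where -r enables extended regular expressions.
swap_user_name() {
  sed -r 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/'
}

# Feed the three sample lines from the question through the helper.
printf '%s\n' 'jdoe 1 Doe, John' 'jad 1 Doe, Jane' 'smith 2 Smith, Jon' | swap_user_name
```

With -r, unescaped parentheses in the pattern form groups, while the parentheses in the replacement are plain text.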
Mark Byers
That worked beautifully. Thanks.
Chris Lieb
+2  A: 

The plus sign needs to be escaped (\+) when not using the -r switch; unescaped, it is matched as a literal + character.
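
A sketch of the escaped form, assuming GNU sed, where \+ is available as an extension to basic regular expressions. The function name swap_escaped_plus is only illustrative:

```shell
# Hypothetical helper using basic regular expressions (no -r).
# Groups are written \( \) and the quantifier is written \+ (GNU extension).
# The parentheses in the replacement are literal and need no backslash.
swap_escaped_plus() {
  sed -e 's/\([a-zA-Z0-9.]\+\) [0-9] \(.*\)/\2 (\1)/'
}

printf '%s\n' 'jdoe 1 Doe, John' | swap_escaped_plus
```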

Dennis Williamson
+2  A: 

A few warnings and additions to what everyone else has already said:

  1. The -r option is a GNU extension to enable extended regular expressions. BSD-derived seds use -E instead (recent GNU sed accepts -E as well).
  2. sed and grep use Basic Regular Expressions by default.
  3. awk uses Extended Regular Expressions.
  4. You should become comfortable with the POSIX specifications such as IEEE Std 1003.1 if you want to write portable scripts, makefiles, etc.
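
To make the difference between the two dialects concrete, here is a small sketch. The input strings are made up, and -E is assumed to be accepted by the sed in use:

```shell
# Basic regular expression: + is an ordinary character,
# so only the line containing a literal "a+b" is rewritten.
printf '%s\n' 'a+b' 'aab' | sed 's/a+b/X/'

# Extended regular expression (-E): + means "one or more",
# so "aab" is rewritten and the literal "a+b" is left alone.
printf '%s\n' 'a+b' 'aab' | sed -E 's/a+b/X/'
```

This is the same effect that keeps the original expression from matching: without -r or -E, the + after the bracket expression is looked for as a literal plus sign.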

I would recommend rewriting the expression as

's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/g'

which should do exactly what you want in any POSIX-compliant sed. If you do indeed care about such things, consider defining the POSIXLY_CORRECT environment variable.
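
A minimal sketch of that expression applied to the question's sample data. The function name swap_posix is hypothetical:

```shell
# Hypothetical helper using only basic regular expression syntax:
# \{1,\} is the interval form of "one or more".
swap_posix() {
  sed -e 's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/g'
}

printf '%s\n' 'jdoe 1 Doe, John' 'jad 1 Doe, Jane' 'smith 2 Smith, Jon' | swap_posix
```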

D.Shawley
+1  A: 

$ sed -e 's/\([a-zA-Z0-9.].*\) [0-9] \(.*\)/\2 \(\1\)/g' file
Doe, John (jdoe)
Doe, Jane (jad)
Smith, Jon (smith)
ghostdog74
A: 

Using awk is much simpler, at least when every name is exactly two words:

awk '{ print $3 " " $4 " (" $1 ")" }' test.txt

Output:

Doe, John (jdoe)
Doe, Jane (jad)
Smith, Jon (smith)

See man 1 awk.
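
If names can have more than two words, the fixed $3 and $4 fields are not enough. One possible variant, sketched here with a made-up function name (swap_awk) and a made-up second input line, saves the user name and strips the first two fields instead:

```shell
# Hypothetical helper: keep everything after "user N " as the name,
# however many words it contains.
swap_awk() {
  awk '{
    user = $1                     # remember the login name
    sub(/^[^ ]+ [0-9] /, "")      # drop "user N " from the front of the line
    print $0 " (" user ")"
  }'
}

# The second line is invented to show a multi-word surname.
printf '%s\n' 'jdoe 1 Doe, John' 'vberg 3 van den Berg, Al' | swap_awk
```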

fwa