First of all, sorry for my bad English. I'm German.

The code below works fine in PHP:

$string = preg_replace('/href="(.*?)(\.|\,)"/i','href="$1"',$string);

Now I need the same for sed. I thought it should be:

sed 's/href="(.*?)(\.|\,)"/href="{$\1}"/g' test.htm

But that gives me this error:

sed: -e expression #1, char 36: invalid reference \1 on `s' command's RHS

+2  A: 

You need a backslash in front of the parentheses you want to reference, thus

sed 's/href="\(.*?\)(.|\,)"/href="{$\1}"/g' test.htm
Doesn't work :( It should remove a . or , at the end of a URL.
Seblon
You didn't say what you want to do, just that the regexp failed :)
A: 

You have to escape the grouping parentheses ( and ) as follows.

sed 's/href="\(.*?\)\(.|\,\)"/href="{$\1}"/g' test.htm
Didier Trosset
+2  A: 

sed does not support non-greedy (lazy) regex matching.
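
A common workaround is to replace the lazy `.*?` with a negated character class that cannot cross the closing quote. The following is only a minimal sketch of that idea; the sample line and URL are made up for illustration and are not from the question.

```shell
# Hypothetical sample input; the URL and title are assumptions for illustration.
input='<a href="http://example.com/page." title="x.">link</a>'

# Greedy .* runs to the LAST '."' on the line, so the match spills past
# the href attribute and the period in the title is removed instead.
greedy=$(printf '%s\n' "$input" | sed 's/href="\(.*\)[.,]"/href="\1"/')

# [^"]* cannot cross a double quote, so the match stays inside the href value.
bounded=$(printf '%s\n' "$input" | sed 's/href="\([^"]*\)[.,]"/href="\1"/')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed still has the period inside href, while the second has it removed from href only.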

Dyno Fu
Please elaborate on this matter.
Adam Matan
\(.*?\) <--- this is a greedy match (with the question mark "?").
ghostdog74
So if sed does not support non-greedy match, it should support greedy match - What am I missing?
Adam Matan
@Adam: The OP is relying on a non-greedy match for the RE to work. With a greedy match, the RE will most likely end up consuming characters past the end of the href attribute.
outis
See the "Quantifiers" subsection of http://perldoc.perl.org/perlre.html#Regular-Expressions
Dyno Fu
A: 

If you want to match a literal ".", you need to escape it or put it in a character class. As an alternative to backslash-escaping the capturing parentheses (which you need to do with basic REs), you can use the -E option to tell sed to use extended REs. Lastly, the REs used by sed use \N to refer to subpatterns, where N is a digit.

sed -E "s/href=([\"'])([^\"']*)[.,]\1/href=\1\2\1/i"

This has its own issue that will prevent matches of href attributes that use both types of quotes.

man sed and man re_format will give more information on REs as used in sed.
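
To make the basic vs. extended distinction concrete, here is a small side-by-side sketch. The sample line is invented, and it assumes a sed that accepts -E (current GNU and BSD versions do).

```shell
# Hypothetical sample input, not taken from the question.
line='<a href="http://example.com/a,">x</a>'

# Basic RE (the default): grouping parentheses are written \( and \).
bre=$(printf '%s\n' "$line" | sed 's/href="\([^"]*\)[.,]"/href="\1"/g')

# Extended RE (-E): plain ( and ) group; the back-reference is still \1.
ere=$(printf '%s\n' "$line" | sed -E 's/href="([^"]*)[.,]"/href="\1"/g')

printf '%s\n' "$bre" "$ere"
```

Both commands should print the same line with the trailing comma removed from the href value. Inside the bracket expression `[.,]` the dot is literal, so no backslash is needed there.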

outis
+1  A: 
sed -e 's|href=\"\(.[^"][^>]*\)\([.,]\)\">|href="\1">|g' file
ghostdog74
That's it. Thank you.
Seblon
A: 

Here is a solution. It is not perfect: it only handles the case of one extra "," or ".".


sed -r -e 's/href="([^"]*)([.,]+)"/href="\1"/g' test.htm
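
The single-character limitation comes from the greedy `[^"]*`, which gives back only one character to `[.,]+`. One possible variant, shown here only as a sketch on an invented input line, forces the captured part to end in a character that is neither "." nor ",". It uses -E, which GNU sed treats the same as -r.

```shell
# Hypothetical sample input with several trailing marks.
line='<a href="http://example.com/a.,.">x</a>'

# Form from the answer: the greedy [^"]* leaves just one character for [.,]+,
# so only the last trailing mark is removed.
one=$(printf '%s\n' "$line" | sed -E 's/href="([^"]*)([.,]+)"/href="\1"/g')

# Variant: the capture must end in a character other than . , or ",
# so [.,]+ has to absorb the whole trailing run.
all=$(printf '%s\n' "$line" | sed -E 's/href="([^"]*[^".,])[.,]+"/href="\1"/g')

printf '%s\n' "$one" "$all"
```

Note that the variant does not match an href value consisting only of "." and "," characters.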
Dyno Fu