I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.

What I'm trying is this:

sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt

This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?

A: 

So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern just lets you use an already existing (constant) shell variable.

I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).

UPDATE: I was wrong: sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:

sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
leonbloy
It works, try it with $(date) inside the replacement pattern.
rassie
Ha, sed is more powerful than I believed. Then try escaping the backslash with \\1
leonbloy
Indeed, it works, even though I swear I tried it before :)
rassie
I would have expected that to evaluate the `$(...)` expression *first*, before invoking sed; so curl would try to download the URL "`\1`". And some of your other backslashes surely got eaten, since you were using double quotes.
Zack
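The evaluation order described in the comment above can be checked without touching the network. This is only a minimal sketch: `first_line` is a hypothetical stand-in for `curl -s URL | head -n 1`, and instead of fetching anything it reports how long its argument was. Printing the script shows what sed would actually receive.

```shell
# Hypothetical stand-in for `curl -s URL | head -n 1` (assumption: no network needed).
# It reports the length of the argument it was actually given.
first_line() {
  printf 'arg-length=%s' "${#1}"
}

# The shell expands $(...) once, while building the argument string,
# before sed has read a single line of input.
script="s/^URL=\(.*\)/TITLE=$(first_line \\1)/"

# Show the finished sed script.
printf '%s\n' "$script"
# prints: s/^URL=\(.*\)/TITLE=arg-length=2/
```

The helper runs exactly once and sees the two-character string `\1`, not a URL, so sed is handed a fixed replacement. Something like `$(date)` only appears to work because its output does not depend on the match.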
+1  A: 

Try this:

sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s@URL=\($url\)@TITLE=$(curl -s $url | head -n 1)@" file.txt; done

If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The @ as a delimiter depends on the URLs not including that character.

Dennis Williamson
This, unlike the accepted answer, seems like it *ought* to work. Well, except for the backslashes inside the double quotes.
Zack
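For comparison, here is a single-pass variant that leaves sed out and lets the shell do the matching, so each URL is fetched once and duplicates are not multiplied. It is a sketch under assumptions, not a tested drop-in: `fetch_title` and `rewrite_urls` are names made up for illustration, and the input is assumed to have `URL=` at the start of the relevant lines, as in the question.

```shell
# Hypothetical helper: first line of the document at the given URL.
fetch_title() {
  curl -s "$1" | head -n 1
}

# Read stdin line by line; turn URL=... lines into TITLE=... lines
# and pass every other line through unchanged.
rewrite_urls() {
  # The `|| [ -n "$line" ]` part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      URL=*) printf 'TITLE=%s\n' "$(fetch_title "${line#URL=}")" ;;
      *)     printf '%s\n' "$line" ;;
    esac
  done
}
```

It would be called as `rewrite_urls < file.txt`. Since the URL is passed to curl as a quoted argument and never pasted into a sed script, characters such as `@`, `/` or `&` in the URL need no special handling.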