I'm using sed -e "s/\*DIVIDER\*/$DIVIDER/g" to replace *DIVIDER* with a user-specified string, which is stored in $DIVIDER. The problem is that I want users to be able to specify escape sequences as their divider, like \n or \t. When I try this, I just end up with the literal letter n or t in the output.

Does anyone have any ideas on how to do this? It will be greatly appreciated!

EDIT: Here's the meat of the script, I must be missing something.

curl --silent "$URL" > tweets.txt

if [[ `cat tweets.txt` == *\<error\>* ]]; then
    grep -E '(error>)' tweets.txt | \
    sed -e 's/<error>//' -e 's/<\/error>//' |
    sed -e 's/<[^>]*>//g' |

head $headarg | sed G | fmt

else
    echo $REPLACE | awk '{gsub(".", "\\\\&");print}'
    grep -E '(description>)' tweets.txt | \
    sed -n '2,$p' | \
    sed -e 's/<description>//' -e 's/<\/description>//' |
    sed -e 's/<[^>]*>//g' |
    sed -e 's/\&amp\;/\&/g' |
    sed -e 's/\&lt\;/\</g' |
    sed -e 's/\&gt\;/\>/g' |
    sed -e 's/\&quot\;/\"/g' |
    sed -e 's/\&....\;/\?/g' |
    sed -e 's/\&.....\;/\?/g' |
    sed -e 's/^  *//g' |
    sed -e :a -e '$!N;s/\n/\*DIVIDER\*/;ta' |   # Replace newlines with *divider*.
    sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g" |          # Replace *DIVIDER* with the actual divider.

    head $headarg | sed G
fi

The long list of sed lines replaces characters from an XML source, and the last two are the ones that are supposed to replace the newlines with the specified divider. I know it seems redundant to replace a newline with another newline, but it was the easiest way I could come up with to let users pick their own divider. The divider replacement works great with normal characters.

+1  A: 

You just need to escape the escape char.

\n will match \n

\ will match \

\\ will match \

Daniel Ice
I just tried \\\n, and it did end up as \n, but it printed it literally. How do I make sed interpret it as an escape rather than a normal string?
SphereCat1
+3  A: 

You can use bash to escape the backslash like this:

sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g"

The syntax is ${name/pattern/string}. If pattern begins with /, every occurrence of pattern in name is replaced by string. Otherwise only the first occurrence is replaced.
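As a rough illustration of the difference between the single-slash and double-slash forms, here is a minimal bash sketch. The sample divider and the variable names first_only, every_one and joined are made up for this example and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: a divider containing two backslash sequences.
DIVIDER='\t|\n'

first_only=${DIVIDER/\\/\\\\}    # one slash: only the first backslash is doubled
every_one=${DIVIDER//\\/\\\\}    # two slashes: every backslash is doubled

printf '%s\n' "$first_only"      # prints \\t|\n
printf '%s\n' "$every_one"       # prints \\t|\\n

# With every backslash doubled, sed inserts the divider text literally.
joined=$(echo 'a*DIVIDER*b' | sed -e "s/\*DIVIDER\*/$every_one/g")
printf '%s\n' "$joined"          # prints a\t|\nb
```

Note that this only protects the backslashes from sed. The divider still arrives in the output as the two characters \ and n, not as a newline.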

tangens
With this I just end up with a double backslash and an "n" inserted into the stream. Maybe it's the way I'm piping things. I'll edit the original post to include more of the script.
SphereCat1
tangens' solution deals with the backslashes, but not the \n or the \t. The problem is that sed doesn't recognize \n or \t. You can either put them in DIVIDER explicitly, or pipe the output through another filter to replace \n with a newline, e.g.: sed 's/\\n/\<ret>/g' (where <ret> stands for pressing Return, so that a literal newline follows the backslash).
William Pursell
That's a good idea, I could pipe it back through tr on the way out. Thanks!
SphereCat1
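One possible way to do the extra filter step suggested above, sketched with awk instead of sed so that it does not depend on how a particular sed treats \n in the replacement text. The function name expand_escapes is hypothetical, and only \n and \t are handled here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the two-character sequences \n and \t
# into a real newline and a real tab.
expand_escapes() {
    awk '{ gsub(/\\n/, "\n"); gsub(/\\t/, "\t"); print }'
}

# The input contains a literal backslash-n and backslash-t.
result=$(printf '%s\n' 'one\ntwo\tthree' | expand_escapes)
printf '%s\n' "$result"
```

This would go at the very end of the pipeline, after *DIVIDER* has been replaced by the backslash-escaped divider text.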
A: 

Maybe:

case "$DIVIDER" in
(*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
esac

I played with this script:

for DIVIDER in 'xx\n' 'xxx\\ddd' "xxx"
do
    echo "In:  <<$DIVIDER>>"
    case "$DIVIDER" in
    (*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
    esac
    echo "Out: <<$DIVIDER>>"
done

Run with 'ksh' or 'bash' (but not 'sh') on MacOS X:

In:  <<xx\n>>
Out: <<xx\\n>>
In:  <<xxx\\ddd>>
Out: <<xxx\\\\ddd>>
In:  <<xxx>>
Out: <<xxx>>
Jonathan Leffler
+1  A: 

It seems to be a simple substitution:

$ d='\n'
$ echo "a*DIVIDER*b" | sed "s/\*DIVIDER\*/$d/"
a
b

Maybe I don't understand what you're trying to accomplish.

Then maybe this step could take the place of the last two of yours:

sed -n ":a;$ {s/\n/$DIVIDER/g;p;b};N;ba"

Note the space after the dollar sign. It prevents the shell from interpreting "${s..." as a variable name.

And as ghostdog74 suggested, you have way too many calls to sed. You may be able to change a lot of the pipe characters to backslashes (line continuation) and delete "sed" from all but the first one (leave the "-e" everywhere). (untested)
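A minimal sketch of what the combined call might look like, using a few of the substitutions from the question. The function name decode_entities and the sample input are invented for the example. Decoding &amp; last is a choice made here (so that text such as &amp;lt; is not decoded twice) and differs from the order in the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one sed process, one -e per substitution.
decode_entities() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g' \
        -e 's/^  *//'
}

result=$(printf '%s\n' '  <description>a &lt;b&gt; &amp; &quot;c&quot;</description>' | decode_entities)
printf '%s\n' "$result"    # prints a <b> & "c"
```

The expressions run in order on each line, so the tag-stripping step has to come before the step that turns &lt; and &gt; back into angle brackets.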

Dennis Williamson
Thanks for the info about the unnecessary sed calls! I wrote this quite a while ago and didn't know much about sed or shell scripting in general.
SphereCat1
I've tried combining the calls both with continuation and all on one line, but some of the replacements stop working. It no longer removes the <description> tags or the extra space. Is there some magic order that they need to be in?
SphereCat1
Did you leave this in place? `sed -n '2,$p' | sed ...` You're selecting which lines to act on by doing this. If you take out the pipe here, it might not work. Without seeing data and your revised script I can't say for sure. You should be aware that this way madness lies (using regexes on HTML): http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Dennis Williamson
A: 

Using FreeBSD sed (e.g. on Mac OS X) you have to preprocess the $DIVIDER user input:

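d='\n'                # sample user input: either this...
d='\t'                # ...or this (the second assignment overrides the first)
NL=$'\\\n'            # a backslash followed by a real newline
TAB=$'\\\t'           # a backslash followed by a real tab
d="${d/\\n/${NL}}"    # replace the first literal \n in the input
d="${d/\\t/${TAB}}"   # replace the first literal \t in the input
echo "a*DIVIDER*b" | sed -E -e "s/\*DIVIDER\*/${d}/"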
carlo