I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).

I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.

Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.

#!/bin/bash

# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P@$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!@#$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches

# Now for each line in the file, do some search and replace
while read line
do
        echo "------===[ BEGIN $line ]===------"

        # Escape every character in $line (e.g., ab/c becomes \a\b\/\c).  I got
        # this solution from the accepted answer in the linked SO question.
        ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')

        # Search for the line we read from the file and replace it with
        # the text "replaced"
        sed 's/'"$ES"'/replaced/' < my_searches     # Does not work

        # Search for the text "Jane" and replace it with the line we read.
        sed 's/Jane/'"$ES"'/' < my_searches         # Works

        # Search for the line we read and replace it with itself.
        sed 's/'"$ES"'/'"$ES"'/' < my_searches      # Does not work

        echo "------===[ END ]===------"
        echo
done < my_searches

When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.

------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
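
A likely cause of this error, sketched under the assumption that sed is using POSIX basic regular expressions (its default): there, \{ and \} delimit a repetition count, while bare { and } are ordinary characters. Escaping every character turns the literal }{ of the last sample line into \}\{, which is no longer literal text. The sample strings below are made up for illustration.

```shell
# In a basic regular expression, \{2\} means "exactly two of the previous item".
printf '%s\n' 'aaa' | sed 's/a\{2\}/X/'     # prints: Xa

# Unescaped braces are ordinary characters, so this matches the text "}{".
printf '%s\n' 'a}{b' | sed 's/}{/X/'        # prints: aXb

# Adding backslashes, as the escape-everything approach does, yields an
# incomplete interval expression, which sed is expected to reject.
if ! printf '%s\n' 'a}{b' | sed 's/\}\{/X/' 2>/dev/null; then
    printf '%s\n' 'sed rejected the pattern \}\{'
fi
```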

If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:

------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------

So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?

I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.

A: 

The command echo "$line" | awk '{gsub(".", "\\\\&");print}' escapes every character in $line, which is wrong. If you echo "$ES" afterwards, it contains \/\u\s\r\/\i\n\c\l\u\d\e. When you then pass that to the next sed command:

sed 's/'"$ES"'/replaced/' my_searches

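it will not work, because no line matches the pattern \/\u\s\r\/\i\n\c\l\u\d\e. A backslash in front of an ordinary character can change its meaning (\n, for example, stands for a newline). The correct way is something like: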

$ sed 's|\([@$#^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\@\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\@\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|

Put all the characters you want escaped inside [], and choose a delimiter for sed that is not in your character class; I chose "|", for example. Then use the "g" (global) flag.
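
As a minimal sketch of the same bracket-list idea applied to the search term: the helper below escapes only the characters that are special in a basic regular expression, plus the / delimiter, and leaves braces, parentheses, + and ? alone. The name escape_sed_pattern is hypothetical, and the sketch assumes single-line input, basic (not extended) regex mode, and / as the delimiter of the later s command.

```shell
# Hypothetical helper: escape a string so sed treats it as literal text in
# the search half of s/.../.../ (basic regular expressions assumed).
# Escaped characters: ] [ \ . * ^ $ and the / delimiter.
escape_sed_pattern() {
    printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

line='/usr/include'
ES=$(escape_sed_pattern "$line")
printf '%s\n' "$ES"                                        # prints: \/usr\/include
printf '%s\n' 'see /usr/include' | sed "s/$ES/replaced/"   # prints: see replaced
```

When reading the search strings from a file, read -r keeps backslashes in the input intact; a plain read would strip them before the helper ever sees them.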

Tell us what you are actually trying to do, i.e., the actual problem you are trying to solve.

ghostdog74
This is the actual problem I'm trying to solve. I'm reading a line out of a file that contains a string the user entered, and replacing it with another string, also containing user-entered data. I'm using bash and sed because I have a limited set of utilities (busybox). I'm trying to allow the user to enter any possible character and still have it work in the sed expression.
indiv
A: 

As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it also escapes non-special characters. What you really want is perhaps something like:

awk 'gsub(/[^[:alpha:]]/, "\\\\&")'

This escapes every non-alphabetic character. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to

\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.

It's quite odd because this works perfectly fine

$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced
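
One plausible explanation for the difference, offered as a sketch rather than a confirmed diagnosis: on the command line the pattern is unquoted, so the shell removes the backslashes before sed ever sees them, whereas a pattern stored in a variable keeps them. With a GNU-style regex library, \' is an anchor for the end of the buffer rather than a literal apostrophe, so the escaped form cannot match. The variable names below are made up for illustration.

```shell
# Unquoted on the command line: the shell strips the backslash.
from_cli=$(printf '%s' didn\'t)
# Stored in a variable: the backslash survives and reaches sed.
from_var="didn\\'t"

printf '%s\n' "$from_cli"     # prints: didn't
printf '%s\n' "$from_var"     # prints: didn\'t

# The pattern without the backslash matches the literal text.
printf '%s\n' "didn't" | sed "s/$from_cli/X/"     # prints: X

# Assumption: with GNU sed, \' anchors to the end of the pattern space,
# so this pattern is not expected to match and the line passes through.
printf '%s\n' "didn't" | sed "s/$from_var/X/"
```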
SiegeX
A: 
Norman Ramsey
A: 

This seems to work for FreeBSD sed:

# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches
seddie
What's -E? I don't have it in my sed, nor do I see it in the GNU documentation: http://www.gnu.org/software/sed/manual/sed.html. Lowercase -e fails with the same error, "xregcomp: Invalid content of \{\}". It would seem that \{ and \} are somehow meaningful to sed, but I haven't researched how.
indiv
A: 

The -E option of FreeBSD sed is used to turn on extended regular expressions.

The same is available for GNU sed via the -r or --regexp-extended option. Recent GNU sed versions also accept -E.

For the differences between basic and extended regular expressions see, for example:

http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
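
A short illustration of why the mode matters when escaping: the set of characters that need a backslash differs between basic and extended regular expressions. The sketch assumes a sed that accepts -E (FreeBSD sed and recent GNU sed; older GNU versions use -r), and the sample strings are made up.

```shell
# Basic mode: + is an ordinary character.
printf '%s\n' 'a+b' | sed 's/a+b/X/'          # prints: X

# Extended mode: + is a quantifier, so the text "a+b" no longer matches.
printf '%s\n' 'a+b' | sed -E 's/a+b/X/'       # prints: a+b

# Extended mode with the + escaped matches the literal text again.
printf '%s\n' 'a+b' | sed -E 's/a\+b/X/'      # prints: X

# Parentheses flip the same way: literal in basic mode, grouping in extended.
printf '%s\n' 'f(x)' | sed 's/(x)/Y/'         # prints: fY
printf '%s\n' 'f(x)' | sed -E 's/\(x\)/Y/'    # prints: fY
```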

Maybe you can use FreeBSD-compatible minised instead of GNU sed?

# example using FreeBSD-compatible minised, 
# http://www.exactcode.de/site/open_source/minised/

# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'

# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~  ...  and Jane ...'

# escapes in regular expression
ES="$(printf "%q" "${line}")"        # escape some punctuation characters
ES="${ES//./\\.}"                    # . -> \.
ES="${ES//\\\\(/(}"                  # \( -> (
ES="${ES//\\\\)/)}"                  # \) -> )

# escapes in replacement string
lineEscaped="${line//&/\&}"          # & -> \&   

minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"
seddie
A: 

To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:

backSlash='\\'
ES="${ES//${backSlash}(/(}"    # \( -> (              
ES="${ES//${backSlash})/)}"    # \) -> )

(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)

seddie
A: 

... or to complete the backslash confusion ...

backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}"   # double backslashes
lineEscaped="${lineEscaped//&/\&}"                 # & -> \&
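
The two replacement-side escapes above can also be done in sed itself, which avoids depending on how a particular bash version treats & and backslashes inside ${var//pattern/replacement} (bash 5.2, for instance, added a patsub_replacement option that gives & a special meaning there). This is only a sketch: escape_sed_replacement is a hypothetical name, / is assumed to be the delimiter of the later s command, and newlines in the replacement are not handled.

```shell
# Hypothetical helper: escape a string for the replacement half of s/.../.../.
# Only three characters are special there: backslash, & and the / delimiter.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's|[\\&/]|\\&|g'
}

replacement='a&b/c\d'
escaped=$(escape_sed_replacement "$replacement")
printf '%s\n' "$escaped"                               # prints: a\&b\/c\\d
printf '%s\n' 'Hello Jane' | sed "s/Jane/$escaped/"    # prints: Hello a&b/c\d
```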
userror