While testing an answer to another user's question I found something I don't understand. The problem was to replace all literal \t, \n and \r sequences in a string with a single space.

Now, the first pattern I tried was:

/(?:\\[trn])+/

which surprisingly didn't work. I tried the same pattern in Perl and it worked fine. After some trial and error I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:

/(?:\\\\[trn])+/

or

/(?:\\\[trn])+/

To my surprise, both of these patterns work. Why are the extra backslashes necessary?

+3  A: 

You need 4 backslashes in the PHP source to match 1 literal backslash, because the pattern is unescaped twice:

  • the PHP string parser turns each \\ into \ ("\\\\" -> \\)
  • the regex engine then turns \\ into one literal \ (\\ -> \)
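The two layers described above can be checked directly. This is a minimal sketch; the input string `$subject` is a made-up example, not taken from the original question.

```php
<?php
// The PHP string literal: four backslashes in the source.
$pattern = '/(?:\\\\[trn])+/';

// After PHP's string unescaping the pattern holds two backslashes,
// which the regex engine then reads as one literal backslash.
echo $pattern, PHP_EOL;          // prints /(?:\\[trn])+/

// Hypothetical input containing the literal two-character sequences \t, \n, \r.
$subject = 'a\t\nb\rc';

$result = preg_replace($pattern, ' ', $subject);
echo $result, PHP_EOL;           // prints a b c

// With only two backslashes in the source, the regex engine sees \[ (a literal
// bracket), so the pattern looks for the text "[trn]" and nothing is replaced.
$tooFew = preg_replace('/(?:\\[trn])+/', ' ', $subject);
var_dump($tooFew === $subject);  // bool(true)
```

Note that the two-backslash version is still a valid pattern, so it fails silently rather than raising an error.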

From the PHP doc,

escaping any other character will result in the backslash being printed too

Hence for \\\[,

  • the first two backslashes collapse into one, and the third stays because \[ is not a valid escape sequence ("\\\[" -> \\[)
  • the regex engine then unescapes the remaining pair (\\[ -> \[)

Yes, it works, but it is not good practice.
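One way to see why the three-backslash form is fragile (an illustration, not part of the original answer): it only works when the character after the third backslash is not a recognized escape. The patterns `$fragile` and `$safe` below are hypothetical examples.

```php
<?php
// Before [ the third backslash survives, so three and four give the same pattern.
$three = "/(?:\\\[trn])+/";
$four  = "/(?:\\\\[trn])+/";
var_dump($three === $four);                // bool(true)

// Before t the same shortcut changes meaning: "\\\t" is a backslash plus a tab.
$fragile = "/\\\t/";                       // regex sees an escaped tab character
$safe    = "/\\\\t/";                      // regex sees a literal backslash, then t

$literal = 'a\tb';                         // single quotes: backslash + t, no tab
var_dump(preg_match($safe, $literal));     // int(1)
var_dump(preg_match($fragile, $literal));  // int(0)
```

Always writing four backslashes avoids having to remember which following characters PHP treats as escapes.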

KennyTM
+1  A: 

The regular expression is just /(?:\\[trn])+/. But since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\:

"/(?:\\\\[trn])+/"
'/(?:\\\\[trn])+/'

Just three backslashes also work, because \[ is not an escape sequence that PHP recognizes, so PHP leaves it untouched: \\ becomes \ but \[ stays \[.
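As a small check of why the quoting style makes no difference for this pattern, the sketch below compares the two literals and then shows a case where the styles do differ. The variable names are illustrative only.

```php
<?php
// Both quoting styles unescape \\ to \, so the two pattern strings are equal.
$single = '/(?:\\\\[trn])+/';
$double = "/(?:\\\\[trn])+/";
var_dump($single === $double);   // bool(true)

// They differ for sequences such as \t, which only double quotes interpret.
$tabSingle = '\t';               // backslash followed by t
$tabDouble = "\t";               // one tab character
var_dump(strlen($tabSingle));    // int(2)
var_dump(strlen($tabDouble));    // int(1)
```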

Gumbo
Then why do 3 backslashes work? And why aren't single quotes different from double quotes in this case?
kemp
@kemp: Updated my answer.
Gumbo
@Gumbo: just so I know if I understood correctly -- this case works because `\[` isn't a control character **and** it does not become a literal open square bracket because the pattern is parsed left to right so the backslash gets attached to the one preceding it and previously escaped?
kemp
@kemp: Yes, only the escape sequences listed in the manual are replaced.
Gumbo
+4  A: 

It works in Perl because there you write the pattern directly as a regex literal: /(?:\\[trn])+/

In PHP, however, the pattern is passed as a string, so each backslash needs an extra level of escaping:

"/(?:\\\\[trn])+/"

The regex \\ that matches a single backslash becomes '/\\\\/' as a PHP preg string.

S.Mark
A: 

Use str_replace!

$code = str_replace(array("\t","\n","\r"),'',$code);

That should do the trick.

AntonioCS
This doesn't answer my question, and it is also wrong: str_replace() can't replace a run of the requested characters (however long) with a single space -- you can only remove them all.
kemp
@kemp: Yes, it does. If it doesn't remove them as is, try combinations of \r\n or \n\r.
AntonioCS
No, you can't substitute, say, three (or any arbitrary number) of those with a single space, unless you want to provide **every** possible combination. What your code does is just remove them all.
kemp
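The difference discussed in these comments can be sketched as follows. The input `$code` is a hypothetical example, and the str_replace() call is adapted to insert a space and to target the literal sequences so the two approaches are comparable.

```php
<?php
// Hypothetical input: a run of three literal sequences between two words.
$code = 'foo\t\n\rbar';

// str_replace substitutes each sequence separately, so the run becomes three spaces.
$each = str_replace(array('\t', '\n', '\r'), ' ', $code);
var_dump($each);   // string(9) "foo   bar"

// preg_replace with + treats the whole run as one match and inserts one space.
$run = preg_replace('/(?:\\\\[trn])+/', ' ', $code);
var_dump($run);    // string(7) "foo bar"
```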