Here's a small example (download, rename to .php and execute it in your shell):

test.txt

Why does preg_replace return NULL instead of the original string?

\x{2192} is the same as the HTML entity "&rarr;" ("→").

+1  A: 

From the documentation on preg_replace():

Return Values

preg_replace() returns an array if the subject parameter is an array, or a string otherwise.

If matches are found, the new subject will be returned, otherwise subject will be returned unchanged or NULL if an error occurred.
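
A minimal sketch of how that NULL case can be detected, assuming a hypothetical subject that contains a lone byte 0xA0 (which is not valid UTF-8). preg_last_error() and the PREG_BAD_UTF8_ERROR constant are standard PHP; the subject string and the replacement are only illustrative.

```php
<?php
// Hypothetical subject: the byte "\xA0" on its own is not valid UTF-8.
$subject = "GTA\xA0(184 kW)";

// With the u flag the subject must be valid UTF-8, otherwise NULL comes back.
$result = preg_replace('~\x{2192}~u', '->', $subject);

if ($result === null) {
    // preg_last_error() tells you why the PCRE call failed.
    $code = preg_last_error();
    echo $code === PREG_BAD_UTF8_ERROR
        ? "subject is not valid UTF-8\n"
        : "PCRE error code $code\n";
}
```

Comparing with === matters here, because an empty string result would also be falsy.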

In your pattern, I don't think the u flag is supported. (WRONG: see the comments below.)

Edit: It seems like some kind of encoding issue with the subject. When I erase '147 3.2 V6 - GTA (184 kW)' and manually re-type it everything seems to work.

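Edit 2: In the subject string you provided, there are 3 spaces that seem to be giving issues to the regex engine. When I convert them to decimal, their value is 160 (as opposed to 32 for a normal space). Byte 160 (0xA0) is a non-breaking space in ISO-8859-1, and on its own it is not valid UTF-8, so with the u flag the subject is rejected and NULL is returned. When I replace those spaces with normal ones, it works.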

I've replaced the offending spaces with underscores below:

'147 3.2 V6 - GTA (184 kW)'
'147 3.2_V6 - GTA_(184_kW)'
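
Here is a rough sketch of how the offending bytes can be located and repaired. The subject is a hypothetical reconstruction with "\xA0" in the three underscore positions, and the conversion assumes the bytes are ISO-8859-1 (that is a guess about the source data, not something confirmed in the question). It needs the mbstring extension.

```php
<?php
// Hypothetical reconstruction: "\xA0" marks the three bytes with value 160.
$subject = "147 3.2\xA0V6 - GTA\xA0(184\xA0kW)";

// Collect the offset of every byte whose decimal value is 160.
$offsets = [];
for ($i = 0; $i < strlen($subject); $i++) {
    if (ord($subject[$i]) === 160) {
        $offsets[] = $i;
    }
}
echo implode(', ', $offsets), "\n"; // 7, 16, 21

// Assumption: the input is ISO-8859-1, where 0xA0 is a non-breaking space.
$utf8 = mb_check_encoding($subject, 'UTF-8')
    ? $subject
    : mb_convert_encoding($subject, 'UTF-8', 'ISO-8859-1');

var_dump(preg_replace('~\x{2192}~u', '->', $subject));        // NULL
var_dump(preg_replace('~\x{2192}~u', '->', $utf8) === $utf8); // bool(true)
```

After the conversion there is no arrow to replace, so the subject comes back unchanged instead of NULL.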
Mike B
"In your pattern, I don't think the u flag is supported." Please elaborate. How can u not be supported? u simply indicates that the pattern should be treated as a UTF-8 string.
Ree
@Ree Yeah I had a full-retard moment. Disregard that.
Mike B
Yes, it seems PHP has problems reading the string. It was read from an Excel file using a PEAR library...
Ree
A: 

I believe there is also a fault in your regular expression: ~\x{2192}~u

Try replacing it with what I have and see if that works out for you: /\x{2192}/u

Jordan S. Jones
You are wrong. The expression is fine.
Ree
While the expression is fine, \x{2192} is meaningless in PHP; it only accepts hex codes in the range 0-255.
razzed
@razzed: You are wrong. Please read my comment for your answer.
Ree
A: 
  • You are using single quotes, which means the only thing that you can escape is other single quotes. To enable escape sequences (e.g. \x32), use double quotes ("").
  • I am not a UTF-8 expert, but the escape code \x2192 is not correct either. You can use \x21\x92 to get both bytes into your string, but you may want to look at utf8_encode and utf8_decode.
  • Your source string has invalid characters in it, or something. PHP gives: Warning: preg_replace(): Compilation failed: invalid UTF-8 string at offset 0 in test.php on line 7
razzed
Your first point is wrong: quote style has no effect here. Try this: "echo preg_replace('~\x{20AC}~u', 'EUR', '€1000');". You'll get the euro symbol replaced by EUR. Your second point is wrong, too. From the PHP documentation: "In UTF-8 mode, "\x{...}" is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number." As for your third point, I do not get this warning on my machine, but yes, it seems the string is malformed in PHP's view. The string was read from an Excel file using some lib...
Ree
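
For anyone who wants to check the points about delimiters and quoting from the comments above, here is a small sketch. It assumes PHP 7 or later for the "\u{20AC}" string escape; the subject is the euro example from the last comment.

```php
<?php
// "\u{20AC}" is the PHP 7+ escape for the euro sign, stored as UTF-8.
$subject = "\u{20AC}1000";

// The same pattern written with different delimiters and quote styles.
$patterns = [
    '~\x{20AC}~u',   // single quotes, tilde delimiters
    '/\x{20AC}/u',   // single quotes, slash delimiters
    "~\\x{20AC}~u",  // double quotes, backslash escaped explicitly
];

foreach ($patterns as $pattern) {
    // The regex engine interprets \x{20AC} itself, not the PHP string parser.
    echo preg_replace($pattern, 'EUR', $subject), "\n"; // EUR1000
}
```

All three variants behave the same, which suggests that neither the delimiter nor the quote style was the cause of the NULL result.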