Hello!

I have a string like "Welcome to McDonalds®: I'm loving it™"... I want to get rid of ":", "'", ® and ™ symbols, so I do the following:

$string = "Welcome to McDonalds®: I'm loving it™";
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); 

BUT on the output I receive:

"Welcome to McDonaldsreg Im loving ittrade"... so preg_replace somehow converts ® to 'reg' and ™ to 'trade', which is not good for me and I can't understand, why is such conversion happens at all. How do I get rid of this conversion?

SOLVED: Thanks for ideas, guys. I solved the problem:

$string = preg_replace(array('/[^a-zA-Z0-9 -]/', '/&[^\s]*;/'), '', preg_replace(array('/&[^\s]*;/'), '', htmlentities($string)));
+8  A: 

The special characters are probably in entity form in your string, i.e. ® is really &reg;. So the replacement operation never sees the symbol itself: it strips the & and the ; and leaves "reg" behind.

To fix this, you could match the &SOMETHING; substrings and remove them. There might be a built-in function that helps with this, perhaps html_entity_decode.
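A minimal sketch of the decode-first idea, assuming the string really holds the entities &reg; and &trade; rather than the raw symbols (the sample input and variable names are illustrative, not taken from the asker's actual data). Decoding turns each entity into a single character, which the whitelist pattern then strips as a whole:

```php
<?php
// Assumption: the input contains HTML entities, not the raw symbols.
$string = "Welcome to McDonalds&reg;: I'm loving it&trade;";

// Decode entities into real characters first, so "&reg;" becomes "®"
// instead of leaving the letters "reg" behind after filtering.
$decoded = html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Strip everything outside the whitelist; the /u modifier makes the
// pattern treat the subject as UTF-8.
$cleaned = preg_replace('/[^a-zA-Z0-9 -]/u', '', $decoded);

echo $cleaned, PHP_EOL; // Welcome to McDonalds Im loving it
```

If the goal is to drop the entities without decoding them, removing the &SOMETHING; substrings before applying the whitelist should give the same result for this input.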

unwind
Thanks for the idea
Andersson83
A: 
Jonathan Sampson
What about numerical character references like `&#169;` or `&#174;`?
Gumbo
Sorry, Gumbo. I was addressing the specific need, but you're right. I've updated to include other non-alpha variants.
Jonathan Sampson
"I want to get rid of ":", "'", ® and ™ symbols" - I don't think he wants to remove all HTML entities.
Alix Axel
Alix Axel, the OP gave an example using ® and ™. Note the OP's "solution" includes `` That is practically the same thing I provided above.
Jonathan Sampson
+3  A: 

If you are looking to replace only the mentioned characters, use

$cleaned = str_replace(array('&reg;', '&trade;', '®', '™', ':', "'"), '', $string);

Plain string replacement functions are usually faster, and nothing you want to replace in your example needs the pattern-matching power of the regular expression engine.

EDIT due to comments: If you need to replace character patterns (as indicated by the solution you gave yourself), a Regex is indeed more appropriate and practical.
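As a rough illustration of the two cases, here is a sketch that assumes a sample input mixing an entity with raw symbols. The entity pattern is one plausible way to match named and numeric references; it is not the asker's own expression:

```php
<?php
// Assumption: illustrative input with one entity and one raw symbol.
$string = "Welcome to McDonalds&reg;: I'm loving it™";

// Fixed list of literal substrings: no pattern matching required.
$fixed = str_replace(array('&reg;', '&trade;', '®', '™', ':', "'"), '', $string);

// Open-ended set of entities: a pattern covers named (&amp;),
// decimal (&#174;) and hexadecimal (&#xAE;) references at once.
$stripped = preg_replace('/&#?[a-zA-Z0-9]+;/', '', 'A&amp;B&#174;C&#xAE;D');

echo $fixed, PHP_EOL;    // Welcome to McDonalds Im loving it
echo $stripped, PHP_EOL; // ABCD
```

The list version only removes what is spelled out in the array, while the pattern version also catches entities nobody thought to list.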

In addition, I'm sure McD requires both symbols to be in place if that slogan is used on any public website.

Gordon
What's the problem with regex? As for McDonalds, this string is from my imagination ))) Nothing to do with real life.
Andersson83
Why did this get a downvote? The PHP documentation in fact states "If you don't need fancy replacing rules (like regular expressions), you should always use this function [str_replace] instead of ereg_replace() or preg_replace()."
Roman Stolper
Thanks Roman. That's exactly what I am referring to :)
Gordon
I downvoted because it's silly to use an array for every variant of values rather than a regular expression in this case. Do you really want an array of every possible `` variant? Do you know how large that would be? Please don't take offense.
Jonathan Sampson
@Gordon: I didn't downvote you, but when you use `str_replace()` **with an array** it's usually slower than using `preg_replace()`.
Alix Axel
@Jonathan the OP asked for `want to get rid of ":", "'", ® and ™ symbols`. Not the entire range of HTML entities.
Gordon
Gordon, the OP gave us an example. This should be evident by the OP's "solution," `` That removes all variants of ``
Jonathan Sampson
@Alix I cannot confirm that. A quick benchmark I just ran suggests otherwise.
Gordon
@Jonathan the solution wasn't there yet when I wrote the answer and it wasn't there when you downvoted me either ;) But it's okay. If the OP really wanted everything removed, then, of course, a Regex is appropriate.
Gordon
@Gordon, I gathered that the McD's string was an *example*, and as such the specifics of it aren't the issue. If the user only expected to remove ™ and ®, then I wouldn't have downvoted you, but I felt it was evident that it was a general example, and not an exclusive issue. No hard feelings though :)
Jonathan Sampson
@Jonathan To me it wasn't obvious. I've changed some bits in my answer to reflect this now. No hard feelings.
Gordon