Hey All,

I'm sure someone has covered this before, but I didn't find it in a quick search of the site. Right now I'm trying to filter input from a WYSIWYG editor so that it removes characters like ¢©÷µ·¶±€£®§™¥ but keeps HTML characters. I've tried htmlentities and htmlspecialchars, but both still seem to leave those characters intact. Are there any built-in methods for this, or does anybody have a good regex that would handle it? Thanks!

A: 
Billy
Any way that I wouldn't have to specify each character I want removed, instead just replacing what's not a regular character or html tag/entity?
David Savage
I would use htmlspecialchars() or strip_tags() first. If you want to replace the bad characters with nothing, just use the first regex.
Billy
Regex for this? Are you serious?
valya
Wait -- you want to keep HTML entities?
Jason Rhodes
A: 

Have you tried the htmlentities() function? Try like this:

$text = htmlentities($text);

There are some other optional parameters, which you can check out at http://php.net/manual/en/function.htmlentities.php. You might have to set the quote_style and charset ones, at the very least.
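As a rough sketch of what setting those parameters might look like: the sample string below is made up (in the question it would come from $_POST), and the htmlspecialchars_decode() step is my own assumption about how to keep the tags, since htmlentities() escapes < and > as well.

```php
<?php
// Hypothetical sample input, written with byte escapes so the example does
// not depend on the encoding of this file: "\xE2\x82\xAC" is the euro sign
// and "\xC2\xA9" is the copyright sign, both in UTF-8.
$text = "<p>Price: 5 \xE2\x82\xAC \xC2\xA9 2010</p>";

// Convert every character that has a named entity, reading the input as UTF-8.
// The fourth argument (double_encode = false) leaves existing entities alone.
$encoded = htmlentities($text, ENT_NOQUOTES, 'UTF-8', false);

// htmlentities() also turned < > & into entities, so decode only those
// markup characters again to keep the tags usable.
$result = htmlspecialchars_decode($encoded, ENT_NOQUOTES);

echo $result, "\n"; // <p>Price: 5 &euro; &copy; 2010</p>
```

Note that this converts the symbols to entities rather than removing them, so a browser will still display them.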

jboxer
I've tried htmlentities with no luck. Here's what I tried: $ret = htmlentities($_POST[$varname], ENT_NOQUOTES, 'UTF-8', false); I'm still getting the weird characters. Any idea if I'm messing something up there?
David Savage
Oh, for some reason I caught that you tried htmlspecialchars() but missed that you had also tried htmlentities(). My bad. Anyway, I'd try the ISO charset options listed in my link, just in case.
jboxer
+2  A: 

If you are using PHP 5.2.0 or later, the Filter extension could be helpful.
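One possible way to use it, as a minimal sketch: the choice of FILTER_UNSAFE_RAW with FILTER_FLAG_STRIP_HIGH is my assumption about which filter fits the question, and the sample input is made up. The flag strips every byte above 127, which covers all of the symbols listed in the question when the input is UTF-8.

```php
<?php
// Hypothetical sample input: "\xC2\xA3" is the pound sign and "\xC2\xA9"
// is the copyright sign, both encoded as UTF-8.
$input = "<b>Total:</b> 10 \xC2\xA3 \xC2\xA9 &amp; more";

// FILTER_UNSAFE_RAW leaves the string alone except for what the flags ask
// for; FILTER_FLAG_STRIP_HIGH removes all bytes with a value above 127.
$clean = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

// Tags and existing entities are plain ASCII, so they survive untouched.
echo $clean, "\n";
```

Keep in mind that this also drops accented letters and any other non-ASCII text, so it only suits input that is expected to be plain ASCII plus markup.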

Dolfa
Awesome, filter worked great. I'm still tinkering with the options, so maybe I can avoid string replacement. However, this is what I'm doing now: $ret = str_replace(" ", " ", $_POST[$varname]); $ret = str_replace("/", "", $ret); $ret = filter_var($ret, FILTER_SANITIZE_URL); $ret = str_replace(" "," ", $ret); $ret = str_replace("", "/", $ret);
David Savage
A: 

htmlentities() and htmlspecialchars() aren't going to work for you if you want to remove those characters completely, rather than just converting them to HTML entities.

EDIT

I just noticed that at one point you said you want to preserve HTML entities. If that's the case, use htmlentities()! It will convert all those symbols into their HTML entity equivalents. If you echo the result in a browser, you'll still see the characters you tried to remove, but if you view the source, you'll see the &name; formatted entity instead.


You may need to use a regex for this, as sad as that is. Most PHP functions try to preserve those characters for you in one format or another. It's surprising that there isn't a function to remove them, at least not one that I know of!
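If you do go the regex route, something along these lines might work. It's only a sketch under the assumption that "regular characters" means printable ASCII plus tabs and line breaks; the sample input is made up.

```php
<?php
// Hypothetical sample input with UTF-8 bytes for an accented e ("\xC3\xA9"),
// the trademark sign ("\xE2\x84\xA2") and the plus-minus sign ("\xC2\xB1").
$input = "<p>Caf\xC3\xA9 \xE2\x84\xA2 &copy; 5 \xC2\xB1 2</p>";

// Remove every byte that is not printable ASCII (0x20-0x7E), a tab, or a
// line break. Without the /u modifier the pattern works byte by byte, so
// each multi-byte UTF-8 character is removed completely.
$clean = preg_replace('/[^\x20-\x7E\t\r\n]/', '', $input);

// Tags and named entities such as &copy; are plain ASCII and are kept.
echo $clean, "\n";
```

This way you don't have to list each unwanted character, but it is blunt: legitimate non-ASCII text such as accented letters gets removed too.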

Jason Rhodes