My main problem is that some output appears on the page with a space character written as "&nbsp;". I want to replace it with a plain space. I tried str_replace("&nbsp"," ",$mystr) and even preg_replace("/(&nbsp;)/", " ", $mystr), but to no avail. How do I do this? More generally, if other HTML entities such as "&amp;" appear in the output, is there a way to replace them with the actual characters instead of the HTML codes?

Edit: Let me clarify a few things. I don't want people to enter <script> tags in the source of an editable page. To prevent that, we need some mechanism to escape special characters. The problem is that some valid characters are escaped as well. I want to unescape them, but I also want to make sure that no security hole is opened.
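The escaping described in the edit can be sketched with PHP's built-in htmlspecialchars() and htmlentities(); the input string is made up and UTF-8 is assumed. The sketch also shows one possible (unconfirmed) source of the &nbsp; in the output: htmlentities() turns a no-break space character (U+00A0) into &nbsp;, while htmlspecialchars() leaves it alone.

```php
<?php
// Made-up untrusted input: a script tag plus a real U+00A0 no-break space.
$userInput = "<script>alert(1)</script> Fish\u{00A0}& chips";

// htmlspecialchars() only converts & < > " (and ' with ENT_QUOTES).
$special = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');

// htmlentities() converts every character that has a named entity,
// so U+00A0 comes out as &nbsp;.
$entities = htmlentities($userInput, ENT_QUOTES, 'UTF-8');

echo $special, "\n";
echo $entities, "\n";
```

Both outputs neutralise the tag; escaping with htmlspecialchars() avoids producing &nbsp; in the first place.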

+2  A: 

I think you're looking for html_entity_decode.

Dominic Rodger
That would turn it into a non-breaking space character, not a space.
David Dorward
I am, in a way, looking for just that, but I'm worried that some function actually calls htmlentities() for me before returning the output. Isn't it a security issue to run html_entity_decode on a string? I'm also interested in doing it with regular-expression matching.
-1 It would convert any character reference, not just `&nbsp;`.
Gumbo
@Gumbo - I just re-read the question, and I still think that's what the OP asked for (in the original question, at least). Maybe I'm being thick, though.
Dominic Rodger
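The two objections in the comments above can be checked with a small sketch (made-up input, UTF-8 assumed): html_entity_decode() decodes every entity, &nbsp; becomes U+00A0 rather than U+0020, and an escaped tag turns back into real markup, which is the security concern the OP raises.

```php
<?php
// Sample input: escaped no-break spaces, an escaped ampersand and an escaped tag.
$input = "Fish&nbsp;&amp;&nbsp;chips &lt;b&gt;bold&lt;/b&gt;";

// Decode every entity, assuming the page is UTF-8.
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// &nbsp; became U+00A0, whose UTF-8 bytes are C2 A0, not an ordinary space.
echo bin2hex("\u{00A0}"), "\n"; // c2a0

// &lt;b&gt; became a real <b> tag again, so the decoded string must not be
// written into a page without escaping or filtering it first.
echo $decoded, "\n";
```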
+2  A: 

Take a look at the html_entity_decode function.

Kuroki Kaze
That would turn it into a non-breaking space character, not a space.
David Dorward
You can run over the string and replace U+00A0 with U+0020 afterwards.
Joey
This was an answer to the second question, about other entities :)
Kuroki Kaze
-1 It would convert any character reference, not just `&nbsp;`.
Gumbo
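Joey's suggestion (decode, then replace U+00A0 with U+0020) could look like this minimal sketch; the input string is made up and UTF-8 is assumed.

```php
<?php
$text = "Hello,&nbsp;world &amp; friends";

// Step 1: decode all entities; &nbsp; becomes U+00A0 NO-BREAK SPACE.
$decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

// Step 2: replace U+00A0 with an ordinary U+0020 SPACE.
$plain = str_replace("\u{00A0}", ' ', $decoded);

echo $plain, "\n"; // Hello, world & friends
```

Because str_replace() works on bytes, this relies on the string really being UTF-8; with another encoding the no-break space has a different byte sequence.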
+1  A: 

str_replace should replace that part of the text, since it doesn't take regular expressions into account, so I guess there is some other problem.

dusoft
A: 

I believe the function you're looking for is urldecode: http://us2.php.net/manual/en/function.urldecode.php

Rob
The string is encoded with an HTML entity, not URL encoding. And he asked for a space, not the decoded version of the non-breaking space entity.
David Dorward
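The difference David points out is easy to see in a small sketch with made-up strings: urldecode() handles %XX escapes (and '+'), and leaves HTML entities untouched.

```php
<?php
$html = "Hello,&nbsp;world"; // HTML entity
$url  = "Hello%2C%20world";  // URL (percent) encoding

// urldecode() has nothing to decode in the HTML-encoded string.
$fromHtml = urldecode($html);
// It only decodes the percent-encoded one.
$fromUrl = urldecode($url);

echo $fromHtml, "\n"; // Hello,&nbsp;world
echo $fromUrl, "\n";  // Hello, world
```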
+1  A: 
<?php
   // Replace the non-breaking space entity with a plain space.
   $string = "<p>Hello,&nbsp;world</p>";
   $string = str_replace("&nbsp;", " ", $string);
   print $string;
?>

This works for me. Perhaps you were expecting it to modify the string in place instead of returning the modified version?

David Dorward
I tried fixing the source but failed; it appears Stack Overflow has a bug!
Dominic Rodger
Lepidosteus
@Lepidosteus - That works for normal text but not for code blocks: in a code block, &amp;nbsp; is rendered literally as &amp;nbsp;, but &nbsp; is rendered as a non-breaking space.
David Dorward
+4  A: 

Are you just doing this?

str_replace("&nbsp", " ", $mystr);

Or do you do this?

$mystr = str_replace("&nbsp", " ", $mystr);

Both str_replace and preg_replace return a value, they don't change the string in-place.

Aistina
No, I was doing what you wrote in the second example, that is, collecting the returned value.
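Since the result is already being assigned, one way to find out why str_replace appears to do nothing is its optional fourth argument, which receives the number of replacements made. In this sketch both inputs are hypothetical; the second assumes the string already holds a decoded U+00A0 character (which looks like a space but is not the text "&nbsp;"), something the thread does not confirm for the OP's data.

```php
<?php
// Hypothetical inputs: the entity as text, and an already-decoded U+00A0.
$inputs = ["Hello,&nbsp;world", "Hello,\u{00A0}world"];

$counts = [];
foreach ($inputs as $mystr) {
    // The fourth argument is set to the number of replacements performed.
    $result = str_replace("&nbsp;", " ", $mystr, $count);
    $counts[] = $count;
    // bin2hex() makes invisible characters such as C2 A0 visible.
    echo $count, " replacement(s): ", bin2hex($result), "\n";
}
```

A count of 0 suggests the string contains something other than the literal entity text.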
A: 

Have you tried:

$text = html_entity_decode(str_replace('&nbsp;', ' ', $text));

It'll swap the non-breaking spaces for normal spaces and then decode any remaining HTML entities.

Richy C.
A: 

What you actually need is an HTML filter based on a proper HTML parser so you can let only specified bits and pieces of HTML be passed through by your script.

Sinan Ünür
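A rough sketch of the parser-based whitelist idea, using PHP's built-in DOMDocument. The helper names filterHtml and sanitizeChildren are hypothetical, and the policy (keep listed tags without any attributes, unwrap everything else, drop script/style and comments) is an assumption chosen for illustration. Hand-rolled filters are easy to get wrong, so a maintained library such as HTML Purifier from Gordon's answer is the safer choice for real use.

```php
<?php
// Hypothetical helper: keeps whitelisted elements (stripped of attributes),
// unwraps other elements, and removes script/style elements and comments.
function sanitizeChildren(DOMNode $node, array $allowed): void
{
    // Copy the child list first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            continue; // text is kept; saveHTML() re-escapes it on output
        }
        if (!($child instanceof DOMElement)) {
            $node->removeChild($child); // comments, processing instructions
            continue;
        }
        $tag = strtolower($child->tagName);
        if ($tag === 'script' || $tag === 'style') {
            $node->removeChild($child); // drop the element and its content
            continue;
        }
        sanitizeChildren($child, $allowed);
        if (in_array($tag, $allowed, true)) {
            // Strip every attribute (onclick, style, href="javascript:...").
            $names = [];
            foreach ($child->attributes as $attr) {
                $names[] = $attr->nodeName;
            }
            foreach ($names as $name) {
                $child->removeAttribute($name);
            }
        } else {
            // Unwrap: move the children up, then remove the element itself.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
        }
    }
}

// Hypothetical entry point: $allowed lists lowercase tag names to keep.
function filterHtml(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed markup
    // The meta tag tells libxml that the fragment is UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    sanitizeChildren($body, $allowed);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = filterHtml('<p onclick="steal()">Hi <b>there</b><script>alert(1)</script><i>!</i></p>', ['p', 'b']);
echo $clean, "\n";
```

Working on a parsed tree rather than on the raw string means escaped text such as &lt;script&gt; stays text, while real tags are judged against the whitelist.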
A: 

Look at HTML Purifier. Give it a whitelist of allowed tags/attributes, and it will filter everything for you.

Gordon
A: 

Since the trailing semicolon may be omitted, you might want to consider using a regular expression:

preg_replace("/&nbsp[;]?/", " ", $str)

You can replace [;]? by ;?. But Stack Overflow seems to replace &nbsp‍; (this is written with a ZERO WIDTH JOINER U+200D) so I used [;]?.

Gumbo
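Gumbo's regular expression, written with the plain ;? form, can be wrapped in a small function; the name nbspToSpace is hypothetical.

```php
<?php
// Replace "&nbsp" with a space, with or without its trailing semicolon.
function nbspToSpace(string $str): string
{
    return preg_replace('/&nbsp;?/', ' ', $str);
}

echo nbspToSpace("Hello,&nbsp;world"), "\n"; // Hello, world
echo nbspToSpace("Hello,&nbspworld"), "\n";  // Hello, world
```

An escaped literal such as &amp;nbsp; does not contain the substring "&nbsp", so it is left unchanged.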