I need to clean a string that is copy/pasted from various office suite applications (Excel, Access, Word), each with its own encoding.

I'm using json_encode for debugging purposes, so that I can see every single encoded character.

I've been able to strip everything I've found so far (\r, \n) with str_replace, but I have no luck with \u00a0.

$string = '[email protected]\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0;[email protected]'; //this is the output from json_encode

$clean = str_replace("\u00a0", "",$string);

returns:

[email protected]\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0;[email protected]

That is exactly the same string; it completely ignores \u00a0.

Is there a way around this? Also, I feel like I'm reinventing the wheel: is there any function/class that completely strips EVERY possible char of EVERY possible encoding?

Thank you for your time.

_EDIT_

After the first two replies I need to clarify that my example DOES work, because it's the output from json_encode, not the actual string!

_EDIT_

+1  A: 

Works for me, when I copy/paste your code. Try replacing the double quotes in your str_replace() with single quotes, or escaping the backslash ("\\u00a0").

Adam Backstrom
In your example it works because you are using the output from json_encode, not the actual string! If I copy/paste my code, it works perfectly for me too.
0plus1
What happens if you replace `\xa0` rather than `\u00a0`?
Adam Backstrom
This happens: it does delete the instances of \u00a0, and when printed via json_encode it looks OK. However, if I echo the string without json_encode, I get a � where the \u00a0 used to be. At this point I can't understand what's going on. Please give me an explanation! :-)
0plus1
I found the solution: just assign the json_encode output to a variable and then str_replace like there's no tomorrow. I would still love to understand the gimmick with \xa0, if you don't mind.
0plus1
That may be a null character… The escaped character `\u00a0` says "unicode character with hex value 00a0." My original suggestion would have only stripped out the a0 segment. Try replacing \x00a0 with a blank string.
Adam Backstrom
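
The � that shows up after replacing `\xa0` is consistent with the pasted text being UTF-8, where U+00A0 (no-break space, not a null character) is stored as the two bytes `C2 A0`. Removing only `\xA0` would leave a stray `\xC2`, which is not valid UTF-8 on its own. A minimal sketch, assuming UTF-8 input; the `$raw` sample is made up for illustration:

```php
<?php
// Assumption: the pasted text is UTF-8, so U+00A0 is the two bytes C2 A0.
$raw = "a\xC2\xA0b";

// Removing only \xA0 leaves a lone \xC2 byte, which is invalid UTF-8
// and is typically rendered as the replacement character.
$broken = str_replace("\xA0", "", $raw);
var_dump(bin2hex($broken)); // string(6) "61c262"

// Removing the full two-byte sequence leaves clean text.
$clean = str_replace("\xC2\xA0", "", $raw);
var_dump($clean); // string(2) "ab"
```

Note that in a PHP double-quoted string, `"\x00a0"` is a NUL byte followed by the literal characters `a0`, because `\x` consumes at most two hex digits, so it would not match the no-break space either.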
+1  A: 

You have to do this with single quotes, like this:

str_replace('\u00a0', "",$string);

Or, if you prefer double quotes, you have to escape the backslash, which would look like this:

str_replace("\\u00a0", "",$string);
oezi
Still won't work
0plus1
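
Quoting does not help here because the real string never contains the six characters `\u00a0`; json_encode only prints the character that way. One possible approach is to match the actual code point with a UTF-8 aware regex. This is a rough sketch, assuming PHP 7+ and valid UTF-8 input; `strip_pasted_whitespace` and the sample addresses are hypothetical names for illustration:

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function strip_pasted_whitespace(string $text): string
{
    // \x{00A0} is the no-break space. The /u modifier makes PCRE
    // treat both the pattern and the subject as UTF-8.
    $result = preg_replace('/[\x{00A0}\r\n]+/u', '', $text);

    // preg_replace returns null on error (for example, when the input
    // is not valid UTF-8); fall back to the untouched input then.
    return $result ?? $text;
}

// Sample input with two no-break spaces and a trailing CRLF.
$pasted = "first@example.com\xC2\xA0\xC2\xA0;second@example.com\r\n";
echo strip_pasted_whitespace($pasted); // first@example.com;second@example.com
```

If the source might be in another encoding (say Windows-1252 from an older Office export), converting it to UTF-8 first, for instance with `mb_convert_encoding`, would be needed before this pattern can match.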