I am scraping some information from a 10-year-old website that was built in ASP, originally with FrontPage and more recently with Dreamweaver. I am using PHP.

I am getting back strings that contain whitespace other than ordinary spaces. PHP's trim() function removes some of it, but not all.

original string: string(47) "  School Calendar"
trimmed string: string(34) " School Calendar"
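For reference, trim() with no second argument strips only six characters: space, tab, newline, carriage return, the NUL byte and vertical tab. Anything else, such as a UTF-8 non-breaking space or a literal &nbsp; entity, is left in place. The sample string below is hypothetical; the real page may contain different characters.

```php
<?php
// Hypothetical sample: the six characters trim() strips by default
// (space, tab, LF, CR, NUL, vertical tab) followed by a UTF-8 non-breaking space.
$sample = " \t\n\r\0\x0B\xC2\xA0School Calendar";
$trimmed = trim($sample);

// The ASCII whitespace is gone, but the two NBSP bytes (C2 A0) survive.
echo strlen($sample), ' -> ', strlen($trimmed), "\n"; // 23 -> 17
echo bin2hex(substr($trimmed, 0, 2)), "\n";           // c2a0
```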

How do I figure out which whitespace characters these are so I can remove them?

My page showing var_dumps of the original and trimmed strings is here.

+1  A: 

echo ord($trimmed_string); will tell you the character code of the leftover whitespace in that example. (ord() returns the value of the first byte of the string only.)
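Because ord() only reports the first byte, a multibyte character such as a UTF-8 non-breaking space shows up as just 194 (0xC2). A minimal sketch that dumps every byte instead; dumpBytes() is a hypothetical helper, not a PHP built-in. If the leftover is a literal &nbsp; entity, it would show up as 26 6e 62 73 70 3b.

```php
<?php
// Print every byte of a string as a two-digit hex code, so invisible
// characters show up: 09 = tab, 0a = LF, 0d = CR, c2 a0 = UTF-8 NBSP.
function dumpBytes(string $s): string
{
    $codes = [];
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $codes[] = sprintf('%02x', ord($s[$i]));
    }
    return implode(' ', $codes);
}

// Hypothetical sample: a UTF-8 non-breaking space, a tab and a space.
echo dumpBytes("\xC2\xA0\t School"), "\n";
// c2 a0 09 20 53 63 68 6f 6f 6c
```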

chaos
+2  A: 

If you view the source of your page, it looks like your string contains &nbsp; "spaces" that PHP's trim() function doesn't remove.

The best option is probably to replace these in advance, by calling str_replace prior to trim:

$stringToTrim = str_replace("&nbsp;", " ", $original);

$trimmed = trim($stringToTrim);

(Not using standard code formatting because it wasn't handling the &nbsp; correctly.)
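A self-contained version of the str_replace/trim approach, assuming the scraped string contains literal &nbsp; entities (the sample value is made up):

```php
<?php
// Hypothetical scraped value: the page source contains literal "&nbsp;" entities.
$original = "&nbsp;&nbsp;School Calendar&nbsp;";

// Turn each "&nbsp;" entity into an ordinary space, then let trim()
// remove the leading and trailing spaces.
$stringToTrim = str_replace("&nbsp;", " ", $original);
$trimmed = trim($stringToTrim);

var_dump($trimmed); // string(15) "School Calendar"
```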

Reed Copsey
This won't work if the string actually contains a non-breaking space character (rather than the HTML entity for one).
Kip
Agreed, but in his case he showed us exactly what was in there.
Reed Copsey
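To cover both cases, one option (a sketch, not taken from the thread) is to decode entities first and then replace the actual U+00A0 character. It assumes the scraped text is UTF-8; a Windows-1252 page would encode a non-breaking space as the single byte A0 instead. normalizeNbsp() is a hypothetical helper name, and note that html_entity_decode() also decodes every other entity, such as &amp;.

```php
<?php
// Assumes UTF-8 input, where a literal non-breaking space is the
// two-byte sequence C2 A0.
function normalizeNbsp(string $s): string
{
    // Decode entities first: "&nbsp;" becomes a literal U+00A0 character.
    $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Replace every U+00A0 with an ordinary space.
    $s = str_replace("\xC2\xA0", ' ', $s);
    // trim() now only sees ordinary whitespace at the ends.
    return trim($s);
}

var_dump(normalizeNbsp("&nbsp;\xC2\xA0School Calendar&nbsp;")); // string(15) "School Calendar"
```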
+1  A: 

Unicode has plenty of different space characters: http://en.wikipedia.org/wiki/Space%5F%28punctuation%29#Table%5Fof%5Fspaces

http://www.brunildo.org/test/space-chars.html

trim() doesn't know about all of them. If you need to remove them all, use a regular expression.
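A minimal sketch of the regex approach, assuming UTF-8 input; unicodeTrim() is a hypothetical helper name. \p{Z} matches Unicode separator characters (including U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE), while \s covers tabs and line breaks. Zero-width characters such as U+200B and U+FEFF are format characters rather than separators, so they would have to be added to the character class explicitly.

```php
<?php
// Strip leading/trailing whitespace, including Unicode space separators.
// Assumes valid UTF-8: with the /u modifier, preg_replace() returns null
// for malformed UTF-8, so that case is reported as an error.
function unicodeTrim(string $s): string
{
    // \s: ASCII whitespace (space, tab, CR, LF, ...); \p{Z}: Unicode separators.
    $result = preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $s);
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// NBSP, tab and ideographic space in front; em space and newline behind.
var_dump(unicodeTrim("\xC2\xA0\t\xE3\x80\x80School Calendar\xE2\x80\x83\n"));
// string(15) "School Calendar"
```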

FractalizeR