views:

1307

answers:

8

I have a bizarre problem: somewhere in my HTML/PHP code there's a hidden, invisible character that I can't seem to get rid of. By copying it from Firebug and converting it, I identified it as U+FEFF, the 'zero width no-break space'. It shows up as a non-empty text node in my website and is causing a serious layout problem.

The problem is, I can't get rid of it. I can't see it in my files even when turning Invisibles on (d'uh). I can't seem to find it, no search tool seems to pick up on it. I rewrote my code around where it could be, but it seems to be somewhere deeper in one of the framework files.

Any good tools to find characters by charcode across files or something like that? (Mac OS X)

+1  A: 

Use Notepad++. It has an option to show all characters.

Umair Ahmed
As stated, I'm more looking for a Mac OS X (or UNIX) tool.
deceze
Yep, I missed that. I think I saw somewhere that it can be run using CrossOver. Not a pretty solution, though.
Umair Ahmed
Btw: Notepad++ has an option to save Unicode files without BOM. Just in case you're gonna switch to Windows ;-)
Boldewyn
I run Notepad++ on Ubuntu using Wine. I don't know if Wine runs on OS X. Notepad++ is awesome, though.
the0ther
+1  A: 

vi or vim will show up any non-EOL characters.

Matthew Scharley
Anything that can search across files? I already skimmed all the places I suspected it in.
deceze
grep could probably do it... but with unicode characters, it's a little hard, because you never know what encoding the file is in, and hence what to pass to grep.
Matthew Scharley
Don't have too much experience with grep, would this be the correct usage? kk:trunk deceze$ grep -R '/\xFEFF/' .
deceze
A: 

I'm pretty sure TextWrangler will do it.

EDIT: VersionTracker link as Bare Bones site seems to be down again.

da5id
+2  A: 

I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.

See "Comparison of hex editors" on Wikipedia.

Craig McQueen
+5  A: 

You don't see the character in your editor because editors that understand it simply don't display it. U+FEFF is the so-called byte-order mark (BOM); U+FFFE is what it looks like when the two bytes are read in the wrong order. It is defined by the Unicode standard (Windows tools are especially fond of writing it) and tells the reader of a Unicode file in which order multi-byte values are stored.

To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or cut the bytes off yourself, e.g. with a hex editor that lets you see what the file really contains.

After some googling, it seems that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:

:set nobomb

and save the file. Presto!

The BOM is always the very first thing in a text file. Editors that support the BOM will not, as I mentioned, show it to you at all.
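To make the byte-order mark a bit more concrete, here is a minimal Python sketch (Python and the helper name leading_bom are my own choices for illustration, not part of the answer). It lists the bytes U+FEFF turns into in a few encodings and reports which one, if any, a chunk of data starts with. UTF-32 is left out to keep it short.

```python
import codecs

# Byte sequences that U+FEFF becomes at the start of a file.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def leading_bom(data: bytes):
    """Return the encoding name whose BOM starts `data`, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    for name, bom in BOMS.items():
        print(name, bom.hex())
    # Hypothetical file contents: a PHP file saved with a UTF-8 BOM.
    print(leading_bom(b"\xef\xbb\xbf<?php echo 'hi';"))
```

Reading the first few bytes of a suspect file in binary mode and passing them to leading_bom is enough to tell whether the editor saved a BOM.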

Cheers,

Boldewyn
Now that would explain it, but no insult necessary. Weird that it translates into a proper character in the browser. I'll look for that...
deceze
Sorry, the 'you don't get it' was no insult, it should terminate in a comma. My apologies!
Boldewyn
Yep, that was indeed it. I wonder where that came from, as my editor (TextMate) doesn't save BOMs...
deceze
I saw that before, but it usually rendered as garbage on top of the page. Seems it's harder to find when it's in the middle of a page...? Anyway, thanks! :)
deceze
It can occur in the middle of a page when you use PHP's include statement on a file that starts with a BOM. Otherwise it should usually not appear (although it _is_ a standard Unicode character and can be used as such).
Boldewyn
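A rough illustration of the include case, as a Python sketch rather than real PHP: the two byte strings are hypothetical stand-ins for a page and an included file saved with a UTF-8 BOM, and plain concatenation stands in for what include does with bytes that sit outside the PHP tags.

```python
# Hypothetical stand-ins for two source files; the second starts with a BOM.
page = "<div>header</div>".encode("utf-8")
included = "\ufeff<p>included</p>".encode("utf-8")

# Bytes outside the PHP tags are sent to the browser as-is, so the BOM of
# the included file ends up wherever the include happens.
output = (page + included).decode("utf-8")

position = output.find("\ufeff")
print("U+FEFF found at character index", position)
```

The character is no longer at the start of the output, so the browser treats it as ordinary text: an invisible but non-empty text node.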
If you're editing your HTML/PHP code with Altova XMLSpy then the option to turn off BOM is found at menu "Tools/Options", tabpage "Encoding". XMLSpy can preserve BOM if it finds it, or add it to a file when it doesn't exist yet. It has no option to remove BOM.
Workshop Alex
Oh, oops. I somehow doubt that you're using XMLSpy on Mac OS X, although it can be installed there by using "Parallels for Mac" virtualization.
Workshop Alex
Just filed another question on using awk: http://stackoverflow.com/questions/1068650/using-awk-to-remove-the-byte-order-mark
Boldewyn
+3  A: 

It's a byte-order mark. Under Mac OS X: open a terminal window, go to your sources and type the following, which searches for the UTF-8 encoding of the BOM (the bytes EF BB BF):

grep -rn $'\xEF\xBB\xBF' *

It will show you the filenames and line numbers containing a BOM.
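If a script is easier to adapt than grep, the same search can be sketched in Python. This is only an illustration under a few assumptions: the files are UTF-8, they are small enough to read into memory, and find_bom is a name I made up.

```python
import os

BOM_UTF8 = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def find_bom(root):
    """Yield (path, byte_offset) for every UTF-8 BOM in files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            offset = data.find(BOM_UTF8)
            while offset != -1:
                yield path, offset
                offset = data.find(BOM_UTF8, offset + 1)


# Example use from the top of the source tree:
#     for path, offset in find_bom("."):
#         print(f"{path}: BOM at byte offset {offset}")
```

An offset of 0 means the file itself was saved with a BOM; any other offset means the character was pasted or generated somewhere inside the file.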

Vexatus
Since it is almost certainly at the very start of the file (two bytes in UTF-16, three in UTF-8), the problem is getting rid of it. I'm not very experienced with awk, but it should be a one-liner to strip those leading bytes from a file.
Boldewyn
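In place of the awk one-liner, here is a hedged Python sketch of the same idea: drop the BOM only if the file really starts with one. It assumes a UTF-8 file, where the BOM is the three bytes EF BB BF. strip_utf8_bom is a hypothetical helper name, and it rewrites the file in place, so try it on a copy first.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was removed, False if the file had none.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as fh:
        fh.write(data[len(codecs.BOM_UTF8):])
    return True
```

Checking for the BOM before cutting matters: blindly removing the first bytes of every file would damage the files that never had one.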
A: 

If you are using TextMate and the problem is in a UTF-8 file:

  1. Open the file
  2. File > Re-open with encoding > ISO-8859-1 (Latin1)
  3. You should be able to see and remove the BOM at the start of the file (it shows up as the three characters ï»¿)
  4. File > Save
  5. File > Re-open with encoding > UTF8
  6. File > Save

It works for me every time.
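Why the re-open trick makes the character visible can be shown with a tiny Python sketch (my own illustration, not something TextMate does internally): the same three bytes decode to one invisible character as UTF-8 but to three ordinary, visible characters as ISO-8859-1.

```python
import codecs

bom_bytes = codecs.BOM_UTF8  # the bytes EF BB BF

as_utf8 = bom_bytes.decode("utf-8")         # one character, U+FEFF
as_latin1 = bom_bytes.decode("iso-8859-1")  # one character per byte

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), as_latin1)
```

Deleting those three Latin-1 characters and saving removes exactly the BOM bytes, which is why the file is clean when it is re-opened as UTF-8.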

Mirko
+1  A: 

Thanks,

This worked for me with gedit: File > Save As > ISO-8859-15

dmbart