ansaurus

Question

Clean source code files of invisible characters

Answer 1

+1 A:

Use Notepad++. There is an option to show all characters.

Umair Ahmed 2009-07-01 07:33:23

As stated, I'm more looking for a Mac OS X (or UNIX) tool.

deceze 2009-07-01 07:34:58

Yep, I missed that... I think I saw somewhere that it can be run using CrossOver. Not a pretty solution, though.

Umair Ahmed 2009-07-01 07:37:11

Btw: Notepad++ has an option to save Unicode files without BOM. Just in case you're gonna switch to Windows ;-)

Boldewyn 2009-07-01 11:39:47

I run Notepad++ on Ubuntu using Wine. I don't know if Wine runs on OS X. Notepad++ is awesome, though.

the0ther 2010-08-13 18:31:50

Answer 2

+1 A:

vi or vim will show up any non-EOL characters.

Matthew Scharley 2009-07-01 07:35:50

Anything that can search across files? I already skimmed all the places I suspected it in.

deceze 2009-07-01 07:40:17

grep could probably do it... but with unicode characters, it's a little hard, because you never know what encoding the file is in, and hence what to pass to grep.

Matthew Scharley 2009-07-01 07:42:02

Don't have too much experience with grep, would this be the correct usage? kk:trunk deceze$ grep -R '/\xFEFF/' .

deceze 2009-07-01 07:52:13

Answer 3

A:

I'm pretty sure Textwrangler will do it.

EDIT: VersionTracker link as Bare Bones site seems to be down again.

da5id 2009-07-01 07:45:09

Answer 4

+2 A:

I'm not a Mac user, but my general advice would be: when all else fails, use a hex editor. Very useful in such cases.

See "Comparison of hex editors" on Wikipedia.

Craig McQueen 2009-07-01 07:55:12

Answer 5

+5 A:

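You don't get the character in the editor, because you can't find it in text editors. #FEFF is the so-called byte-order mark (BOM); #FFFE is what it looks like when its two bytes are read in the opposite order. It is defined in the Unicode standard and tells, in a Unicode file, in which order the bytes of multi-byte characters are stored. In UTF-8, where byte order is not an issue, it shows up as the three bytes EF BB BF.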

To get rid of it, tell your editor to save the file either as ANSI/ISO-8859 or as Unicode without BOM. If your editor can't do so, you'll either have to switch editors (sadly) or use some kind of truncation tool like, e.g., a hex editor that allows you to see how the file really looks.

From a quick Google search, it seems that TextWrangler has a "UTF-8, no BOM" mode. Otherwise, if you're comfortable with the terminal, you can use Vim:

:set nobomb

and save the file. Presto!

The characters are always the very first in a text file. Editors with support for the BOM will not, as I mentioned, show it to you at all.

Cheers,

Boldewyn 2009-07-01 07:59:28
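To make the "invisible first character" concrete, here is a minimal Python sketch (not part of the original answer). It assumes only the standard `codecs` module; the helper name `detect_bom` and the sample PHP string are made up for illustration, and UTF-32 is deliberately left out to keep it short.

```python
import codecs

# Byte sequences that encode U+FEFF at the start of a file, per encoding.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # b'\xef\xbb\xbf'
    "utf-16-le": codecs.BOM_UTF16_LE,  # b'\xff\xfe'
    "utf-16-be": codecs.BOM_UTF16_BE,  # b'\xfe\xff'
}

def detect_bom(data: bytes):
    """Return the encoding name whose BOM starts `data`, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None

# Hypothetical file content saved with a UTF-8 BOM.
sample = codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>"

# A plain UTF-8 decode keeps the BOM as an invisible first character ...
plain = sample.decode("utf-8")
# ... while the "utf-8-sig" codec drops it, much like a BOM-aware editor.
stripped = sample.decode("utf-8-sig")
```

Here `plain` starts with U+FEFF while `stripped` does not, which is roughly the difference between what the browser receives and what a BOM-aware editor displays.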

Now that would explain it, but no insult necessary. Weird that it translates into a proper character in the browser. I'll look for that...

deceze 2009-07-01 08:02:45

Sorry, the 'you don't get it' was no insult, it should terminate in a comma. My apologies!

Boldewyn 2009-07-01 08:07:38

Yep, that was indeed it. I wonder where that came from, as my editor (TextMate) doesn't save BOMs...

deceze 2009-07-01 08:08:29

I saw that before, but it usually rendered as garbage on top of the page. Seems it's harder to find when it's in the middle of a page...? Anyway, thanks! :)

deceze 2009-07-01 08:12:57

It can occur in the middle of a page, when you use PHP's include statement with a BOM-started file to include. Otherwise it should usually not appear (although it _is_ a standard Unicode character and can be used as such).

Boldewyn 2009-07-01 08:25:40
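The include case can be illustrated with a small Python sketch. It only simulates the effect with byte strings; the page fragments are hypothetical, and the assumption is that an include copies the included file's bytes verbatim into the output.

```python
import codecs

# Hypothetical page fragments; the included file was saved with a UTF-8 BOM.
page_head = b"<html><body><h1>Title</h1>"
included = codecs.BOM_UTF8 + b"<p>included content</p>"
page_tail = b"</body></html>"

# Copying the included bytes verbatim puts the BOM wherever the include
# happens, here right after the heading.
output = page_head + included + page_tail
text = output.decode("utf-8")

# Index of the stray U+FEFF inside the rendered page.
bom_position = text.find("\ufeff")
```

`bom_position` ends up equal to the length of `page_head`, i.e. the mark sits in the middle of the page rather than at the top.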

If you're editing your HTML/PHP code with Altova XMLSpy then the option to turn off BOM is found at menu "Tools/Options", tabpage "Encoding". XMLSpy can preserve BOM if it finds it, or add it to a file when it doesn't exist yet. It has no option to remove BOM.

Workshop Alex 2009-07-01 08:42:41

Oh, oops. I somehow doubt that you're using XMLSpy on Mac OS X, although it can be installed there by using "Parallels for Mac" virtualization.

Workshop Alex 2009-07-01 08:46:14

Just filed another question on using awk: http://stackoverflow.com/questions/1068650/using-awk-to-remove-the-byte-order-mark

Boldewyn 2009-07-01 12:23:11

Answer 6

+3 A:

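It's a byte-order mark. Under Mac OS X: open a terminal window, go to your sources and type:

grep -rn $'\xEF\xBB\xBF' *

It will show you the filenames and line numbers containing a UTF-8 BOM, which is stored as the bytes EF BB BF. (Bash reads at most two hex digits after \x, so $'\xFEFF' would search for the byte FE followed by the letters FF.)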

Vexatus 2009-07-01 08:00:15
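For searching a whole source tree without relying on shell quoting, a rough Python equivalent might look like the following. It is a sketch, not a drop-in tool: the function name `files_with_utf8_bom` is invented here, and it only checks for a UTF-8 BOM at the very start of each file.

```python
import codecs
import os

def files_with_utf8_bom(root):
    """Yield paths under `root` whose first three bytes are EF BB BF."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(len(codecs.BOM_UTF8))
            except OSError:
                continue  # unreadable file: skip it
            if head == codecs.BOM_UTF8:
                yield path
```

Calling `list(files_with_utf8_bom("."))` from the project root would list the affected files; reading in binary mode sidesteps the question of which encoding each file is in.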

Since it is almost certainly the first few bytes of the file (two in UTF-16, three in UTF-8), the problem is how to get rid of it. I'm not very experienced with awk, but removing those leading bytes should be a one-liner with it.

Boldewyn 2009-07-01 08:27:06
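As an alternative to awk, removing the leading bytes can be sketched in a few lines of Python. This assumes the file is UTF-8 and small enough to read into memory; `strip_utf8_bom` is a hypothetical helper name, and it rewrites the file in place, so try it on a copy first.

```python
import codecs

def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from `path` in place.

    Returns True if a BOM was removed, False if the file had none.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, leave the file untouched
    with open(path, "wb") as handle:
        handle.write(data[len(codecs.BOM_UTF8):])
    return True
```

Files without a BOM are never rewritten, so running it over a whole tree only touches the affected files.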

Answer 7

A:

If you are using TextMate and the problem is in a UTF-8 file:

Open the file
File > Re-open with encoding > ISO-8859-1 (Latin1)
You should be able to see and remove the first characters in the file (in Latin1 the UTF-8 BOM shows up as ï»¿)
File > Save
File > Re-open with encoding > UTF-8
File > Save

It works for me every time.

Mirko 2010-06-21 14:52:09

Answer 8

+1 A:

Thanks,

This worked for me with gedit > File > Save As > ISO-8859-15

dmbart 2010-08-13 09:07:40
