
A recent problem* left me wondering whether there is a text editor out there that lets you see every single character in a file, even the invisible ones. Specifically, I'm not looking for hex-editing capabilities; I want a text editor that shows me all of the invisible characters (not just the common whitespace and line-break characters). The BOM is just one example; others are the mathematical invisibles or characters the font may not support.

I'm not looking for a text editor that simply supports a wide variety of text encodings or conversions between them. Every text editor I've come across treats invisible characters "correctly", i.e. leaves them invisible (or, as with the BOM, silently removes them during conversion).
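For comparison, the behavior being asked for can be approximated outside any editor. Below is a minimal Python sketch of a filter that prints every invisible code point as a visible <U+XXXX> tag. Which Unicode general categories count as "invisible" (control, format, separators, unassigned, private use) is an assumption, and plain spaces, tabs and newlines are deliberately left alone.

```python
import sys
import unicodedata

# Whitespace we still want rendered normally.
ORDINARY = {"\n", "\t", " "}

# Assumed "invisible" categories: Cc control, Cf format (BOM, ZWSP,
# INVISIBLE TIMES, ...), Zs/Zl/Zp separators, Cn unassigned, Co private use.
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp", "Cn", "Co"}

def reveal(text: str) -> str:
    """Replace each invisible code point with a <U+XXXX> tag."""
    out = []
    for ch in text:
        if ch not in ORDINARY and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        # encoding="utf-8" (not "utf-8-sig") keeps a BOM as U+FEFF;
        # newline="" stops Python from silently turning \r\n into \n.
        with open(path, encoding="utf-8", newline="") as f:
            sys.stdout.write(reveal(f.read()))
```

Run it as `python reveal.py script.pl` (reveal.py being a hypothetical file name); a BOM then appears as <U+FEFF> and a stray carriage return as <U+000D>.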

I'm asking mostly out of academic interest, so I'm not particular about the OS. I can easily test Linux and OS X solutions, but if you recommend a Windows editor, I would appreciate it if you described how the editor handles invisibles other than whitespace and line breaks.

EDIT: I'm becoming fairly sure that the behavior I want can be implemented in Emacs or Vim, either through custom highlighting or by modifying the font itself. A solution of this type would also be acceptable.

EDIT2: After looking at several options, I found TextMate, which at least shows a blank space where an invisible UTF-8 character sits in the file. I'm slightly disappointed with SO's ability to answer my question. The bounty goes to Vim, because that is the direction in which the solution most likely lies.


*The incident that led me to this question: I wrote a Perl script in TextWrangler and managed to change its encoding to UTF-8 with BOM, which inserts the BOM at the start of the file. Perl (or rather the operating system) then fails to see the #! line, and mayhem ensues. It took me the better part of an afternoon to figure this out, since most text editors do not show the BOM even with their various "show invisibles" options turned on. Now I've learned my lesson and will reach for less immediately :-).
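The failure comes down to bytes: a UTF-8 BOM is the three bytes EF BB BF, so a file saved that way starts with those bytes instead of #!, and the kernel (which only honors #! as the very first two bytes) no longer treats the file as a script. A minimal Python sketch that detects and strips the BOM, assuming that rewriting the file in place is acceptable:

```python
import codecs
import sys

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def has_shebang(data: bytes) -> bool:
    """True only if '#!' are the very first two bytes of the file."""
    return data.startswith(b"#!")

if __name__ == "__main__":
    # Rewrite each named file in place, minus its BOM.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            original = f.read()
        cleaned = strip_utf8_bom(original)
        if cleaned != original:
            with open(path, "wb") as f:
                f.write(cleaned)
            print(f"{path}: removed UTF-8 BOM")
```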

+4  A: 

Notepad++ rocks:

[Screenshot: Notepad++]

Coronatus
Can you verify what a UTF-8 BOM file looks like in Notepad++? Specifically, does it show <U+FEFF> as the first character?
Timo
+2  A: 

In Visual Studio's Open File dialog, the Open button has a down arrow next to it that lets you choose Open With...; one of the options in the resulting dialog is Binary Editor.

I've used this now and then to spot an invisible character or to resolve a line-ending issue.

Scott Smith
+1  A: 

I prefer UltraEdit, even though it is not free. It is very good at showing hidden characters and includes a robust hex viewing mode. (I am not affiliated with the publisher, IDM.)

JYelton
+3  A: 

Vim (in either terminal or GUI mode) can show all control characters if you :set list. The BOM is a special case, controlled by the 'bomb' option (:set bomb or :set nobomb); :set bomb? tells you whether the current file was read with one.

Alex Martelli
Yeah, I actually checked Vim first when I thought of this problem. It is possible to insert invisibles, e.g. "i CTRL-V u2062" for the invisible math "times" (U+2062), but there is no way to make Vim display this character. If you switch encodings you do see something, but you also mangle the character you inserted.
Timo
There you go. Not the answer I was looking for, but the best one nonetheless.
Timo
A: 

I'm not sure, as I haven't used it in a while, but I remember SciTE being a good one that showed me "too much information" for my needs.

Programmer's Notepad on Windows might work.

TextPad (It's nagware, runs on Windows)

I'm not sure which of these will show hidden characters out of the box, but they're all made for "nerdy" stuff, so I assume they would work, at least with a little tweaking. I can verify that Programmer's Notepad does show "hidden" characters.

Moshe
+1  A: 

I've encountered the same limitations. My specific issue is the need to display characters like U+200B, the zero-width space, and U+200C, the zero-width non-joiner. (They are used in electronic texts in languages such as Khmer, which otherwise do not separate words with spaces.) Unlike you, I can't say "platform doesn't matter": I need an editor with Windows and Linux versions, and a Mac version is desirable too.

I haven't found any text editor that will display them on-screen, although some (many?) will let you enter them and will properly treat them as characters that can be cut and pasted and whose presence is revealed by cursor movement. (That is, if the screen shows "if" and there are three ZWSPs between the "i" and the "f", you have to press the arrow key four times to move from "i" to "f".)
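The cursor behavior follows from each zero-width space being a separate code point that simply has no glyph. A small Python illustration, assuming the editor moves the cursor by one code point per arrow-key press:

```python
import unicodedata

# "i", three ZERO WIDTH SPACEs, then "f": renders as just "if".
s = "i" + "\u200b" * 3 + "f"

def names(text: str) -> list[str]:
    """Unicode name of every code point, falling back to U+XXXX."""
    return [unicodedata.name(c, f"U+{ord(c):04X}") for c in text]

print(len(s))                       # 5 code points, only 2 of them visible
print(s.index("f") - s.index("i"))  # 4 steps from "i" to "f"
print(names(s))
```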

TextPad 4.7.3 is otherwise my text editor of choice, but it is very limited in the scripts it accepts, and TextPad 5 definitely does not show these invisibles.

I have often resorted to opening my files in OpenOffice.org Writer, which shows a gray slash at these characters' locations when invisibles are turned on, or in Microsoft Word, which displays a double-box (box within a box) character for such invisibles. This double box takes up width and therefore changes the on-screen line breaks, a non-trivial effect I haven't seen in any other editor.

Roger_S
Wow, ++ for presenting a real-world use case! I was just poking around out of curiosity :-). Anyway, I've come to the conclusion that the easiest way to implement this is to make a custom Unicode font that includes special visible glyphs for all the desired zero-width characters. This should also make it cross-platform; you just need to figure out how to make your favorite editor use the custom font.
Timo
A: 

Open the file in Emacs and run M-x hexl-mode. You'll get a display that looks like this:

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef                               
00000000: 2320 2020 2020 2020 2020 2020 2020 2020  #               
00000010: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000020: 2020 2020 2020 2020 2020 2020 2020 2020                  
00000030: 2d2a 2d20 4175 746f 636f 6e66 202d 2a2d  -*- Autoconf -*-
00000040: 0a23 2050 726f 6365 7373 2074 6869 7320  .# Process this 
00000050: 6669 6c65 2077 6974 6820 6175 746f 636f  file with autoco
00000060: 6e66 2074 6f20 7072 6f64 7563 6520 6120  nf to produce a 
00000070: 636f 6e66 6967 7572 6520 7363 7269 7074  configure script
00000080: 2e0a 2320 4f72 6465 7220 6973 206c 6172  ..# Order is lar
00000090: 6765 6c79 2069 7272 6576 656c 6c61 6e74  gely irrevellant
000000a0: 2c20 616c 7468 6f75 6768 2069 7420 6d75  , although it mu
000000b0: 7374 2073 7461 7274 2077 6974 6820 4143  st start with AC
000000c0: 5f49 4e49 5420 616e 6420 656e 6420 7769  _INIT and end wi
000000d0: 7468 2041 435f 4f55 5450 5554 0a23 2053  th AC_OUTPUT.# S
000000e0: 6565 2068 7474 703a 2f2f 6175 746f 746f  ee http://autoto
000000f0: 6f6c 7365 742e 736f 7572 6365 666f 7267  olset.sourceforg
00000100: 652e 6e65 742f 7475 746f 7269 616c 2e68  e.net/tutorial.h
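The hexl layout is simple enough to reproduce when Emacs isn't at hand: an 8-digit hex offset, eight 2-byte hex groups per 16-byte row, then the printable-ASCII rendering with every other byte shown as ".". The following is a rough Python approximation, not hexl-mode's actual implementation (the padding of a short final row in particular is a guess); the header line is copied from the display above. A UTF-8 BOM shows up as efbb bf at offset 0.

```python
def hexl_dump(data: bytes) -> str:
    """Format bytes roughly the way Emacs hexl-mode displays them."""
    lines = ["87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef"]
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_digits = chunk.hex()
        # Split into 4-hex-digit words (2 bytes each); pad a short last row
        # to the full width of 8 words plus 7 separating spaces.
        words = [hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4)]
        hex_col = " ".join(words).ljust(39)
        # Printable ASCII as-is, everything else (including BOM bytes) as '.'.
        ascii_col = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_col}  {ascii_col}")
    return "\n".join(lines)

print(hexl_dump(b"\xef\xbb\xbf#!/usr/bin/perl\n"))
```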
vy32