tags:

views:

215

answers:

3

There are a few "how do I invert a regexp" questions here on Stack Overflow, but I can't find one for Vim (if one does exist, my google-fu is lacking today).

In essence I want to match all non-printable characters and delete them. I could write a short script, or drop to a shell and use tr or something similar to delete, but a vim solution would be dandy :-)

Vim has the atom \p to match printable characters; however, trying :s/[^\p]//g to match the inverse failed and just left me with every 'p' in the file. I've seen the (?!xxx) sequence in other questions, but Vim doesn't seem to recognise it. I've not found an atom for non-printable chars.

In the interim, I'm going to drop to external tools, but if anyone's got any trick up their sleeve to do this, it'd be welcome :-)

Ta!

+1  A: 

I'm also a little puzzled why you can't use the \p. But, [:print:] works fine:

:s/[^[:print:]]//g
dsummersl
This does not support unicode: `echo "Å"=~'[[:print:]]' "Å"=~'\p'` results in `0 1`.
ZyX
@ZyX: Good catch. I wonder why `[:print:]` doesn't include printable unicode characters?
Jefromi
A: 

If you want to keep only printable Unicode characters in a file (this only works if fileencoding=utf-8), you can do it in three steps: mark every printable character with a UTF-8 symbol that is not used in the file (for example, nr2char(0xFFFF)), delete every character that is not followed by this symbol, and finally delete the symbol itself:

%s/\p\@<=/<ffff>/g
%s/[^<ffff>]<ffff>\@!//g
%s/<ffff>//g

Here you must replace <ffff> with the actual character (if you type this, instead of <ffff> type <C-r>=nr2char(0xFFFF)<CR>).
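
If typing the marker by hand is a nuisance, the same three steps can probably be driven with :execute so the character is built from nr2char() directly. This is only a sketch under the same utf-8 assumption; the variable name marker is my own and not part of the answer above:

```vim
" Hypothetical helper variable: holds the unused marker character U+FFFF.
let marker = nr2char(0xFFFF)
" Step 1: insert the marker after every printable character.
execute '%s/\p\@<=/' . marker . '/g'
" Step 2: delete every non-marker character that is not followed by the marker.
execute '%s/[^' . marker . ']' . marker . '\@!//g'
" Step 3: remove the markers again.
execute '%s/' . marker . '//g'
```

The comments sit on their own lines because :execute treats a trailing " as part of its argument rather than as a comment.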

If you are not working with Unicode, use dsummersl's answer.

ZyX
+6  A: 

Unfortunately you can't put \p in character classes, although that would be a nice feature. However you can use the negative-lookahead feature \@! to build your search:

/\p\@!.

The \p\@! part first asserts that the next character is not a \p character, so the . that follows can only match a non-printable character.
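
To delete the matches rather than just search for them, the same pattern should drop straight into a substitute. A minimal sketch, assuming you want to process the whole buffer (the % range and the g flag are my additions to the search shown above):

```vim
" Delete every character in the buffer that \p does not consider printable.
" '.' does not match end-of-line, so line breaks are left alone.
:%s/\p\@!.//g
```

What \p counts as printable follows Vim's 'isprint' option, so the exact set of deleted characters may vary with your settings.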

too much php
Top stuff -- that did the job, cheers :-)
Chris J