On Linux I have a directory with lots of files. Some of them have non-ASCII characters in their names, but they are all valid UTF-8. One programme has a bug that prevents it from working with non-ASCII filenames, and I have to find out how many files are affected. I was going to do this with find, then use grep to print the names that contain non-ASCII characters, and then pipe that to wc -l to get the number. It doesn't have to be grep; I can use any standard Unix regex tool, like perl, sed, awk, etc.
However, I'm not sure if there is a regex for "any character that's not an ASCII character". Is there?
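One possible sketch of the find | grep | wc -l pipeline described above uses the negated class [^\x00-\x7F], which matches any character (or, in the C locale, any byte) outside the 7-bit ASCII range. Forcing LC_ALL=C is safe for UTF-8 names because every non-ASCII character is encoded using only bytes 0x80 to 0xFF, and those bytes never appear in ASCII characters. This assumes GNU find (for -mindepth and -printf), GNU grep built with PCRE support (for -P), and filenames without embedded newlines. The function name count_non_ascii_names is hypothetical.

```shell
# Hypothetical helper: count entries under a directory (default ".")
# whose own name contains at least one non-ASCII character.
# Assumes GNU find (-mindepth, -printf) and GNU grep with PCRE (-P).
# Assumes no filename contains a newline, since grep counts lines.
count_non_ascii_names() {
    # -mindepth 1 skips the starting directory itself;
    # %f prints only the basename, so a non-ASCII parent
    # directory does not make every child match too.
    find "${1:-.}" -mindepth 1 -printf '%f\n' |
        # In the C locale grep -P works byte by byte, so the class
        # matches any byte >= 0x80, i.e. any UTF-8 non-ASCII character.
        # grep -c prints 0 but exits 1 when nothing matches;
        # "|| true" keeps the exit status at 0 in that case.
        LC_ALL=C grep -c -P '[^\x00-\x7F]' || true
}

# Usage: count affected names in the current directory tree.
count_non_ascii_names .
```

Perl accepts the same class, so perl -ne 'print if /[^\x00-\x7F]/' could replace the grep stage if PCRE support in grep is unavailable.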