How do I remove non-ASCII characters from a file?
+2
A:
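LC_ALL=C tr -dc '[:print:][:cntrl:]' < input-file > cleaned-file
That assumes you want to keep "control" characters and "printable" characters; [:print:] is used rather than [:graph:] so that spaces survive. The quotes stop the shell from treating the brackets as a glob pattern, and LC_ALL=C restricts the classes to ASCII. Fiddle as required.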
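As one way of fiddling with the set, here is a minimal sketch that deletes only the bytes with the high bit set, so every 7-bit ASCII character (spaces, tabs and newlines included) survives. The function name strip_non_ascii is made up for illustration, and LC_ALL=C is there on the assumption that tr should treat the input as raw bytes.

```shell
# Hypothetical helper: delete every byte in the range 0x80-0xFF (octal 200-377).
# LC_ALL=C makes tr work on single bytes rather than multibyte characters.
strip_non_ascii() {
    LC_ALL=C tr -d '\200-\377'
}

# Demo: the two-byte UTF-8 sequence for "é" (octal 303 251) is removed,
# so this prints "caf au lait".
printf 'caf\303\251 au lait\n' | strip_non_ascii
```

On a real file it would be used as: strip_non_ascii < input-file > cleaned-file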
Carl Smotricz
2010-07-16 12:47:04
A:
My 2 cents might not solve your problem, but may give you some hints.
The file command tells you a file's encoding (e.g. UTF-8 or ASCII), and iconv can convert a file between different encodings.
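If the input is known to be UTF-8 (something file can help confirm), one possible iconv invocation is sketched below. It assumes an iconv that supports the -c option, which omits characters that cannot be converted to the target encoding; the function name to_ascii is made up for illustration.

```shell
# Hypothetical helper: convert UTF-8 on stdin to ASCII on stdout.
# -c asks iconv to drop characters it cannot represent in the target
# encoding instead of stopping at the first one.
to_ascii() {
    iconv -c -f UTF-8 -t ASCII
}

# Usage: to_ascii < input-file > cleaned-file
```

Declaring the real source encoding with -f matters here, because iconv interprets the input bytes according to that setting before deciding what to drop.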
Nikhil S
2010-07-16 12:47:20
Surprisingly, iconv stripped some other stuff from the XML file too. I ran iconv -f ascii -t ascii -c
janar
2010-07-17 07:49:26
+1
A:
What language do you want to use? You can do this in Perl like this:
perl -pi -e 's/[[:^ascii:]]//g' filename
bluesmoon
2010-07-16 12:48:28
+1
A:
You can write a C program like this:
#include <stdio.h>

int main(void)
{
    /* Open both files in binary mode so bytes pass through untouched. */
    FILE *fin = fopen("source_file", "rb");
    FILE *fout = fopen("target_file", "wb");
    int c;

    /* Copy only 7-bit ASCII bytes (0x00-0x7F), newlines and tabs included. */
    while ((c = fgetc(fin)) != EOF) {
        if (c < 0x80)
            fputc(c, fout);
    }

    fclose(fin);
    fclose(fout);
    return 0;
}
Note: error checks are omitted for simplicity.
Compile it with:
$ gcc -W source_code.c -o convert
Run it with:
$ ./convert
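To check the output of the program (or of any other approach in this thread), one option is to count the bytes outside the ASCII range that remain in a file. This is only a sketch: count_non_ascii is a made-up helper name, and the scratch file stands in for real data.

```shell
# Hypothetical helper: print how many bytes >= 0x80 a file contains.
# tr deletes all ASCII bytes (octal 000-177); wc -c counts what is left.
# The final tr strips the padding some wc implementations add.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Demo on a scratch file: "é" is two bytes in UTF-8, so this prints 2.
check=$(mktemp)
printf 'caf\303\251\n' > "$check"
count_non_ascii "$check"
```

After running ./convert, count_non_ascii target_file should print 0.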
Pablo Santa Cruz
2010-07-16 12:52:09