How do I remove non-ASCII characters from a file?

+2  A: 
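LC_ALL=C tr -dc '[:print:][:cntrl:]' < input-file > cleaned-file

That's assuming you want to retain "control" characters (newlines, tabs) and "printable" characters, including spaces, which [:graph:] leaves out. Quoting the classes stops the shell from treating them as glob patterns, and LC_ALL=C limits the classes to ASCII bytes. Fiddle as required.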

Carl Smotricz
A: 

My 2 cents might not solve your problem, but may give you some hints:
the file command reports its best guess at a file's encoding (UTF-8, ASCII, etc.), and iconv can convert a file between different encodings.

Nikhil S
iconv surprisingly stripped some other content from the XML file too. I ran iconv -f ascii -t ascii -c
janar
+1  A: 
perl -pe's/[[:^ascii:]]//g' < input.txt > output.txt
Thomas
This is exactly what I did to fix the issue.
janar
+1  A: 

What language do you want to use? In Perl you can do it like this (-i edits the file in place):

perl -pi -e 's/[[:^ascii:]]//g' filename
bluesmoon
Just did this before seeing your comment.
janar
+1  A: 

You can write a C program like this:

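#include <stdio.h>

int main(void)
{
   FILE *fin = fopen("source_file", "rb");
   FILE *fout = fopen("target_file", "wb");
   int c;

   if (fin == NULL || fout == NULL) {
      perror("fopen");
      return 1;
   }
   /* fgetc returns each byte as a value 0-255, or EOF at end of file */
   while ((c = fgetc(fin)) != EOF) {
      if (c < 128)   /* ASCII bytes only; keeps newlines and tabs, unlike isprint() */
         fputc(c, fout);
   }
   fclose(fin);
   fclose(fout);
   return 0;
}

Note: apart from the fopen check, error checks were avoided for simplicity.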

Compile it with:

$ gcc -W source_code.c -o convert

Run it with:

$ ./convert
Pablo Santa Cruz