ansaurus

Question

Answer 1

+7 A:

Found here:

http://unix.derkeiler.com/pdf/Newsgroups/comp.unix.shell/2008-10/msg00031.pdf

awk '{if(NR==1)sub(/^\xef\xbb\xbf/, "");print}'

Enjoy!

Bartosz 2009-07-01 11:45:59

It seems that the dot in the middle of the sub statement is one too many (at least, my awk complains about it). Besides this, it's exactly what I was looking for, thanks!

Boldewyn 2009-07-01 12:21:54

This solution, however, works **only** for UTF-8 encoded files. For others, like UTF-16, see Wikipedia for the corresponding BOM representation: http://en.wikipedia.org/wiki/Byte_order_mark

Boldewyn 2009-07-01 12:36:33
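As a rough illustration of the point about other encodings, here is a small sketch that guesses which BOM (if any) a file starts with by looking at its first four bytes. The function name `detect_bom` is made up for this example, and `head -c` is assumed to be available (it is common but not strictly POSIX). The byte signatures are the standard ones listed on the Wikipedia page linked above.

```shell
#!/bin/sh
# detect_bom FILE: hypothetical helper, prints a guess at the encoding
# based on the leading bytes of FILE.
detect_bom() {
    # Hex-dump the first 4 bytes as one lowercase string, e.g. "efbbbf68".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $sig in
        efbbbf*)  echo "UTF-8" ;;
        # UTF-32LE must be tested before UTF-16LE: both start with ff fe.
        fffe0000) echo "UTF-32LE" ;;
        0000feff) echo "UTF-32BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        feff*)    echo "UTF-16BE" ;;
        *)        echo "none" ;;
    esac
}
```

Note that this is only a heuristic: a UTF-16LE file whose first character after the BOM is NUL would be reported as UTF-32LE.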

I agree with the earlier comment; the dot does not belong in the middle of this statement and makes this otherwise great little snippet an example of an awk syntax error.

Brandon Craig Rhodes 2009-12-08 14:37:46

So: `awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' INFILE > OUTFILE` and make sure INFILE and OUTFILE are different!

mrclay 2010-02-12 20:30:43
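The reason INFILE and OUTFILE have to differ is that the shell truncates the output file before awk gets to read it. A minimal sketch of a wrapper that works around this with a temporary file follows. The function name `strip_utf8_bom` is hypothetical. It is also a variation on the one-liner rather than the original: instead of `\xef\xbb\xbf` inside the regex (hex escapes are not guaranteed in every awk), it passes the BOM bytes in through a variable and runs awk in the C locale so that everything is counted in bytes.

```shell
#!/bin/sh
# strip_utf8_bom FILE: hypothetical helper, not from the thread.
# Removes a leading UTF-8 BOM from FILE, writing via a temporary file.
strip_utf8_bom() {
    infile=$1
    tmpfile=$(mktemp) || return 1
    # The three BOM bytes, written as octal escapes for portability.
    bom=$(printf '\357\273\277')
    # On line 1 only: if the line starts with the BOM, cut it off.
    if LC_ALL=C awk -v bom="$bom" '
        NR == 1 && index($0, bom) == 1 { $0 = substr($0, length(bom) + 1) }
        { print }
    ' "$infile" > "$tmpfile"; then
        # Copy back with cat so the original file keeps its permissions.
        cat "$tmpfile" > "$infile"
        rm -f "$tmpfile"
    else
        rm -f "$tmpfile"
        return 1
    fi
}
```

Usage would be `strip_utf8_bom somefile.txt`. Files without a BOM pass through unchanged, apart from awk adding a final newline if one was missing.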

Answer 2

+2 A:

Not awk, but simpler (note that this drops the first three bytes whether or not they are a BOM):

tail -c +4 UTF8 > UTF8.nobom

To check for a BOM:

hd -n 3 UTF8

If a BOM is present, you'll see: 00000000 ef bb bf ...

mrclay 2010-02-15 20:07:07
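The check and the `tail` command can be combined so that the three bytes are only dropped when they really are a BOM. This is a sketch under a few assumptions: the function name `strip_bom_tail` is invented here, `head -c` is assumed to exist, and `od` is used in place of `hd` because `hd` is not installed everywhere.

```shell
#!/bin/sh
# strip_bom_tail SRC DST: hypothetical helper.
# Copies SRC to DST, skipping the first 3 bytes only if they are ef bb bf.
strip_bom_tail() {
    src=$1
    dst=$2
    # Hex-dump the first 3 bytes as a single string such as "efbbbf".
    first3=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        tail -c +4 "$src" > "$dst"   # start output at byte 4
    else
        cp "$src" "$dst"             # no BOM: plain copy
    fi
}
```

For example, `strip_bom_tail UTF8 UTF8.nobom` gives the same result as the command above when a BOM is present, and an unmodified copy otherwise.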

The tail trick is cool. Thanks!

Boldewyn 2010-02-16 21:02:21

Answer 3

+2 A:

Using sed:

# Removing BOM from all text files in current directory:
sed -i '1 s/^\xef\xbb\xbf//' *.txt

Advantage of using GNU sed: the -i option means "in place", and updates the files without the need for redirections or other tricks.

Denilson Sá 2010-09-01 21:06:02
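Since in-place editing overwrites the originals, a more cautious variation might keep backups. The sketch below is an assumption-laden variant rather than the command from the answer: the function name `strip_bom_sed` is hypothetical, `-i.bak` (suffix attached directly to `-i`) is used because that spelling is understood by both GNU and BSD sed, and the BOM bytes are generated with `printf` rather than `\x` escapes, which are a GNU extension in sed regexes.

```shell
#!/bin/sh
# strip_bom_sed FILE...: hypothetical helper.
# Removes a leading UTF-8 BOM from each FILE in place, leaving FILE.bak behind.
strip_bom_sed() {
    # The three BOM bytes, written as octal escapes.
    bom=$(printf '\357\273\277')
    for f in "$@"; do
        # LC_ALL=C makes sed treat the pattern as raw bytes.
        LC_ALL=C sed -i.bak "1s/^${bom}//" "$f"
    done
}
```

Called as `strip_bom_sed *.txt`, this should behave like the one-liner above, except that each original is preserved with a `.bak` suffix.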

That's nice, too. Thanks!

Boldewyn 2010-09-06 07:37:01

Using awk to remove the Byte-order mark