Hello,

I'm searching (without success) for a script that would work as a batch tool and let me prepend a BOM to a UTF-8 text file if it doesn't already have one.

Neither the language it is written in (perl, python, c, bash) nor the OS it runs on matters to me. I have access to a wide range of computers.

I've found a lot of scripts that do the reverse (strip the BOM), which seems kind of silly to me, as many Windows programs have trouble reading UTF-8 text files that lack a BOM.

Did I miss the obvious? Thanks!

+1  A: 

I find it pretty simple. Assuming the file is always UTF-8 (you're not detecting the encoding, you know the encoding):

Read the first three bytes (or fewer, if the file is shorter). Compare them to the UTF-8 BOM sequence (Wikipedia says it's 0xEF, 0xBB, 0xBF). If they match, write them to the new file and then copy everything else from the original file to the new file. If they differ, first write the BOM, then write the bytes you read, and only then copy everything else from the original file to the new file.

In C, fopen/fclose/fread/fwrite should be enough.
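
The same steps can also be sketched in plain shell, as a rough illustration rather than a definitive implementation. The function name `add_bom` and the temporary file name are made up for this example. It assumes the input is already known to be UTF-8, and it replaces the file through a temporary copy, so the original file's permissions may not be preserved.

```shell
#!/bin/sh
# add_bom FILE (hypothetical helper): prepend the UTF-8 BOM (EF BB BF)
# unless FILE already starts with it. No encoding detection is attempted.
add_bom() {
    file=$1
    # Hex dump of at most the first three bytes, whitespace removed, e.g. "efbbbf".
    head3=$(od -An -tx1 -N3 "$file" | tr -d '[:space:]')
    if [ "$head3" = "efbbbf" ]; then
        return 0    # BOM already present, leave the file untouched
    fi
    tmp="$file.tmp.$$"    # assumed temporary name, next to the original
    # Octal escapes \357\273\277 are the bytes 0xEF 0xBB 0xBF.
    { printf '\357\273\277' && cat "$file"; } > "$tmp" && mv "$tmp" "$file"
}
```

Calling `add_bom notes.txt` twice should leave exactly one BOM at the start of the file, since the second call sees the signature and returns early.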

luiscubal
A: 

I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.

#!/bin/sh

if [ $# -eq 0 ];
then
        echo "usage: $0 files ..." 1>&2
        exit 1
fi

for file in "$@";
do
        echo "# Processing: $file" 1>&2
        if [ ! -f "$file" ];
        then
                echo Not a file: "$file" 1>&2
                exit 1
        fi
        TYPE=`file - < "$file" | cut -d: -f2`
        if echo "$TYPE" | grep -q '(with BOM)';
        then
                echo "# $file already has BOM, skipping." 1>&2
        else
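                # Keep the original as "$file~" and write the converted copy in its place.
                # The error branch uses braces, not a subshell, so "exit 1" stops the whole script.
                ( mv "$file" "$file~" && uconv -f utf-8 -t utf-8 --add-signature < "$file~" > "$file" ) || { echo "Error processing $file" 1>&2 ; exit 1; }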
        fi
done
Steven R. Loomis
Absolutely perfect! A lot better than what I came up with. Many thanks.
Stephane