I have an Ant build that concatenates my JavaScript into one file and then compresses it. The problem is that Visual Studio's default encoding attaches a BOM to every file. How do I configure Ant to strip out the BOMs that would otherwise appear in the middle of the resulting concatenated file?

My Googling turned up this discussion, which describes the exact problem I'm having but doesn't provide a solution: http://marc.info/?l=ant-user&m=118598847927096

+1  A: 

The Unicode byte order mark codepoint is U+FEFF. This concatenation command will strip out all BOM characters when concatenating two files:

<concat encoding="UTF-8" outputencoding="UTF-8" destfile="nobom-concat.txt">
  <filelist dir="." files="bom1.txt,bom2.txt" />
  <filterchain>
    <deletecharacters chars="&#xFEFF;" />
  </filterchain>
</concat>

This form of the concat command tells the task to decode the files as UTF-8 character data. I'm assuming UTF-8 as this is usually where Java/BOM issues occur.

In UTF-8, the BOM is encoded as the bytes EF BB BF. If you needed it to appear at the start of the resultant file, you could use a subsequent concatenation to prefix the output file with a BOM again.
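
A minimal sketch of that second step, assuming the stripped output from the first task is nobom-concat.txt. The destination name withbom-concat.txt is made up for illustration. It uses concat's nested header element to write a single U+FEFF ahead of the file contents, and I haven't tested it against every Ant version:

<!-- Sketch only: withbom-concat.txt is a hypothetical output name. -->
<!-- The header must contain nothing but the character reference; -->
<!-- any whitespace inside it would be written out after the BOM. -->
<concat encoding="UTF-8" outputencoding="UTF-8" destfile="withbom-concat.txt">
  <header filtering="no">&#xFEFF;</header>
  <filelist dir="." files="nobom-concat.txt" />
</concat>

The destination has to be a different file from the source, since the task cannot read and overwrite the same file in one pass.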

In the other UTF encodings, U+FEFF is encoded as FE FF (UTF-16BE), FF FE (UTF-16LE), 00 00 FE FF (UTF-32BE) and FF FE 00 00 (UTF-32LE).

McDowell