and can it be configured not to happen?

I often save the result of a query as a .csv and process it later on my Unix machine. Every character in the file is followed by a null byte, so I have to filter those bytes out, which is a bit of a pain.

So, these are the questions:

  • Why is this so?
  • Can it be disabled somehow? How?

EDIT:

Why: because it outputs in UTF-16 by default. The easiest conversion is then:

iconv -f utf-16 -t utf-8 origFile.csv > newFile.csv
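
Where iconv is not available, the same UTF-16 to UTF-8 conversion can be sketched in Python. This is only a minimal sketch: it assumes the exported file starts with a byte order mark (which Python's utf-16 codec uses to pick the byte order), and the helper name convert_utf16_to_utf8 is made up for illustration.

```python
def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # newline="" disables newline translation, so the original \r\n
    # line endings are copied through unchanged.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Calling convert_utf16_to_utf8("origFile.csv", "newFile.csv") would mirror the iconv command from the question.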

Here's a piece of a hexdump of a file thus generated. Each char is followed by a null char (00):

00000cf0  36 00 36 00 32 00 0d 00  0a 00 36 00 38 00 34 00  |6.6.2.....6.8.4.|
00000d00  30 00 36 00 32 00 31 00  36 00 0d 00 0a 00 36 00  |0.6.2.1.6.....6.|
00000d10  38 00 34 00 30 00 36 00  33 00 36 00 34 00 0d 00  |8.4.0.6.3.6.4...|
00000d20  0a 00 36 00 38 00 34 00  30 00 36 00 38 00 34 00  |..6.8.4.0.6.8.4.|
00000d30  32 00 0d 00 0a 00 36 00  38 00 34 00 30 00 37 00  |2.....6.8.4.0.7.|
00000d40  30 00 32 00 31 00 0d 00  0a 00 36 00 38 00 34 00  |0.2.1.....6.8.4.|
00000d50  30 00 37 00 37 00 39 00  37 00 0d 00 0a 00 36 00  |0.7.7.9.7.....6.|
00000d60  38 00 34 00 30 00 37 00  39 00 32 00 31 00 0d 00  |8.4.0.7.9.2.1...|
00000d70  0a 00 36 00 38 00 34 00  30 00 38 00 32 00 34 00  |..6.8.4.0.8.2.4.|
00000d80  31 00 0d 00 0a 00 36 00  38 00 34 00 30 00 38 00  |1.....6.8.4.0.8.|
00000d90  36 00 36 00 31 00 0d 00  0a 00 36 00 38 00 34 00  |6.6.1.....6.8.4.|
00000da0  30 00 38 00 37 00 35 00  31 00 0d 00 0a 00 36 00  |0.8.7.5.1.....6.|
00000db0  38 00 34 00 31 00 30 00  32 00 35 00 34 00 0d 00  |8.4.1.0.2.5.4...|
00000dc0  0a 00 36 00 38 00 34 00  31 00 30 00 34 00 34 00  |..6.8.4.1.0.4.4.|
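
The pattern in this dump is what little-endian UTF-16 looks like for plain ASCII text. A small Python sketch reproduces it; the sample string is taken from the first bytes of the dump, and the helper name hex_bytes is made up for illustration.

```python
def hex_bytes(data):
    """Format bytes as space-separated hex pairs, like a hexdump row."""
    return " ".join(f"{b:02x}" for b in data)


sample = "662\r\n"  # digits plus CRLF, as at the start of the dump

print(hex_bytes(sample.encode("ascii")))      # 36 36 32 0d 0a
print(hex_bytes(sample.encode("utf-16-le")))  # 36 00 36 00 32 00 0d 00 0a 00
```

The second line matches the dump: each ASCII byte is followed by a 00 byte.
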
+5  A: 

The file is being written in Unicode (UTF-16), not ASCII. UTF-16 uses two bytes for each of these characters, hence the extra 00 bytes.

There might be an option to save as ANSI or ASCII, which should use 8-bit characters.

Ch00k
Yes, I'm so accustomed to UTF-8 I forget UTF-16.
Vinko Vrsalovic
But I see no option to set the encoding
Vinko Vrsalovic
+1  A: 

On Unix, I suggest the use of iconv -futf-16le -tutf-8 to filter your output. :-)

Chris Jester-Young
Yes, I had already done so :). Why le?
Vinko Vrsalovic
Because there's a big endian variety where the NUL byte comes first. :-)
Chris Jester-Young
So on that note, you should revise your post to say that the NUL byte actually follows the "real" byte, not precedes it. :-)
Chris Jester-Young
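
The two byte orders discussed in these comments, and the role of the byte order mark (BOM), can be sketched in Python. The function name detect_utf16_byte_order is made up for illustration; the BOM constants come from the standard codecs module.

```python
import codecs


def detect_utf16_byte_order(data):
    """Return 'le', 'be', or None based on a leading byte order mark."""
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "le"
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "be"
    return None


text = "6"
little = text.encode("utf-16-le")  # b'6\x00': the NUL follows the character
big = text.encode("utf-16-be")     # b'\x006': the NUL comes first

with_bom = codecs.BOM_UTF16_LE + little
print(detect_utf16_byte_order(with_bom))   # le
# The generic codec consumes the BOM; the explicit -le codec keeps it
# as a U+FEFF character at the start of the decoded text.
print(repr(with_bom.decode("utf-16")))     # '6'
print(repr(with_bom.decode("utf-16-le")))  # '\ufeff6'
```

A similar effect may show up with iconv: naming utf-16le explicitly can leave the BOM in the converted output, while plain utf-16 lets the BOM decide the byte order.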
+1  A: 

I know this is an old post...but for new visitors...

When you save data from Microsoft SQL Server Management Studio, you will notice that the 'Save' button has a little arrow next to it. Click the arrow and choose 'Save With Encoding...' to select the encoding you want.

Leigh Shayler
Excellent! thanks.
Vinko Vrsalovic