I'm not very familiar with handling binary data in Ruby. I'm using Mechanize to download a large number of CSV files to my local disk, and I then need to search these files for specific strings.

I use the save_as method in mechanize to save the file (which saves the file as binary). The content type of the file (according to mechanize) is:

application/vnd.ms-excel;charset=x-UTF-16LE-BOM

From here, I'm not sure how to read the file. I've tried reading it in as a normal file in ruby, but I just get the binary data. I've also tried just using standard unix tools (strings/grep) to try and search without any luck.

When I run the 'file' command on one of the files, I get:

foo.csv: Little-endian UTF-16 Unicode Pascal program text, with very long lines, with CRLF, CR, LF line terminators

I can see the data just fine with cat or vi. With vi I also see some control characters.

I've also tried both the csv and fastercsv Ruby libraries, but both raise an 'IllegalFormatError' exception. I've also tried this solution without any luck.

Any help would be greatly appreciated. Thanks.

A: 

Could you please give a link to such a file and the expected result?

Kri-ban
+1  A: 

You can use the 'iconv' command to convert the file to UTF-8:

# iconv -f 'UTF-16LE' -t 'UTF-8' bad_file.csv > good_file.csv

There is also a wrapper for iconv in the standard library; you could use that to convert the file after reading it into your program.
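
For doing the conversion inside the program, here is a minimal sketch. It assumes a Ruby version with built-in encoding support (1.9 or later) and uses String#encode rather than the Iconv wrapper, which was removed from the standard library in Ruby 2.0. The helper names read_utf16le_as_utf8 and lines_matching are made up for illustration, and the sketch assumes each file is small enough to read into memory at once.

```ruby
# Hypothetical helper: read a UTF-16LE file as raw bytes and return UTF-8 text.
def read_utf16le_as_utf8(path)
  raw = File.binread(path)                     # raw bytes, no newline translation
  text = raw.force_encoding(Encoding::UTF_16LE)
            .encode(Encoding::UTF_8)           # transcode UTF-16LE -> UTF-8
  text = text.sub(/\A\uFEFF/, "")              # drop the byte-order mark if present
  text.gsub(/\r\n?/, "\n")                     # normalize CRLF and bare CR to LF
end

# Hypothetical helper: return every line of the file that contains needle.
def lines_matching(path, needle)
  read_utf16le_as_utf8(path)
    .each_line
    .map(&:chomp)
    .select { |line| line.include?(needle) }
end
```

Calling lines_matching("foo.csv", "some string") would then return the matching lines as UTF-8 strings. Once the text is UTF-8 with consistent line endings, it could also be handed to a CSV parser instead of being searched line by line.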

W Devauld
A: 

Did you ever figure this out? I'm trying to do something similar.