Hi,

I have Ruby code that parses data in an Excel file using the Parseexcel gem. I need to save two columns from that file into a Hash. Here is my code:

parts = {}  # collects column B (key) => column A (value)
worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip

    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}

However, it still saves duplicate keys into the hash: "020098-10". I checked the Excel file at the specified rows and found that the two values are " 020098-10" and "020098-10": the first one has a leading space while the second doesn't. I don't understand this. Isn't it true that .strip already removes all leading and trailing whitespace?

Also, when I tried to print key.length, it gave me these weird numbers:

020098-10 length 18
020098-10 length 17

when both should be 9.

A:

If you inspect the strings you receive, you will probably see something like:

" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"

This happens because of the strings' encoding. Excel works with Unicode (the bytes above are UTF-16LE, two bytes per character), while Ruby uses ISO-8859-1 by default. The encodings will differ on various platforms.
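A quick way to check this diagnosis, assuming Ruby 2.0 or later (for `String#encode` and `String#b`): encode the two cell texts as UTF-16LE yourself and look at the raw bytes. The first one reproduces the inspected output above, and the byte counts are 20 and 18 rather than 10 and 9. The question's lengths of 18 and 17 are consistent with `strip` having removed only the leading space and one trailing NUL byte, which would leave a NUL at the front of the first key, so the two keys never compare equal.

```ruby
# Text as it appears in the two Excel cells.
with_space    = " 020098-10"
without_space = "020098-10"

# Encode as UTF-16LE and take a binary copy (String#b) to see the raw bytes,
# roughly what you get when no conversion is done on the Excel data.
raw_with    = with_space.encode("UTF-16LE").b
raw_without = without_space.encode("UTF-16LE").b

puts raw_with.inspect      # every character is followed by a \x00 byte
puts raw_with.bytesize     # two bytes per character
puts raw_without.bytesize
```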

You need to convert the data you receive from Excel to a printable encoding. However, you should not convert strings created in Ruby, as you will end up with garbage.
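A minimal sketch of the conversion step, assuming the cell text arrives as raw UTF-16LE bytes like the inspected string above. `force_encoding` relabels the bytes without changing them, `encode` converts them to UTF-8, and only then can `strip` see the leading space. The helper name `decode_cell_text` and the simulated cell values are hypothetical; the last line shows the "garbage" case of decoding a string that was never UTF-16LE.

```ruby
# Hypothetical helper: treat the bytes as UTF-16LE, convert to UTF-8, then strip.
def decode_cell_text(raw)
  raw.dup.force_encoding("UTF-16LE").encode("UTF-8").strip
end

# Simulated raw cell contents (UTF-16LE bytes held in binary strings).
raw_a = " 020098-10".encode("UTF-16LE").b
raw_b = "020098-10".encode("UTF-16LE").b

key_a = decode_cell_text(raw_a)
key_b = decode_cell_text(raw_b)

# Same first-wins rule as in the question.
parts = {}
parts[key_a] = "first row" unless parts.key?(key_a)
parts[key_b] = "second row" unless parts.key?(key_b)

p key_a          # "020098-10"
p key_a.length   # 9
p parts.size     # 1: both rows now map to the same key

# A Ruby-created string decoded as UTF-16LE: 4 bytes become 2 unrelated characters.
p decode_cell_text("ABCD".b).length  # 2
```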

Consider this code:

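@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")

# Numeric cells are returned as text; text cells hold UTF-16LE bytes that must
# be converted to UTF-8 before strip can remove the surrounding whitespace.
def convert(cell)
  return "" if cell.nil?  # empty cells come back as nil
  if cell.numeric
    cell.value.to_s       # to_s so key.empty? below also works for numbers
  else
    @enc.convert(cell.value).strip
  end
end

parts = {}
worksheet.each do |row|
  next unless row         # Ruby has no `continue`; `next` skips to the next row

  key = convert row.at(1)
  value = convert row.at(0)

  parts[key] = value unless parts.has_key?(key) || key.empty?
end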

You may want to change the encodings to different ones.

Yossi
A: 

The newer spreadsheet gem handles charset conversion for you automatically (to UTF-8 as standard, I think, but you can change it), so I'd recommend using it instead.
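A sketch of the question's deduplication once rows already contain decoded strings, which is what the spreadsheet gem is said to provide. It assumes the gem's `Spreadsheet.client_encoding`, `Spreadsheet.open` and `Workbook#worksheet` API; to keep the block runnable without the gem or an .xls file, the rows here are plain arrays (the gem's rows are `Array` subclasses). `build_parts` and `parts.xls` are hypothetical names.

```ruby
# Hypothetical helper: map column B (index 1) to column A (index 0),
# keeping the first value seen for each key and skipping blank keys.
def build_parts(rows)
  parts = {}
  rows.each do |row|
    next if row.nil?
    key   = row[1].to_s.strip
    value = row[0].to_s.strip
    next if key.empty? || parts.key?(key)
    parts[key] = value
  end
  parts
end

# Stand-in rows. With the gem they might come from something like:
#   Spreadsheet.client_encoding = "UTF-8"
#   sheet = Spreadsheet.open("parts.xls").worksheet(0)
#   build_parts(sheet)
rows = [
  ["A-1", " 020098-10"],
  ["A-2", "020098-10"],
  ["A-3", nil],
  nil,
]
parts = build_parts(rows)
p parts  # a single entry for "020098-10", holding "A-1"
```

Because the strings are already UTF-8, plain `strip` is enough and no manual `Encoding::Converter` step is needed.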

ba